Understanding
Programming Language
Part 3

IDE Development Course

Andrew Vasilyev

Today's Agenda

  • Syntax Directed Translation
  • Introduction to Type Systems and Type Checking

Syntax Directed Translation

Syntax Directed Translation

Syntax-directed translation (SDT) is a method of analyzing code or translating it from one language to another (for example, to machine code) by attaching a specific action to each grammar rule. When the parser applies a rule, the attached action is executed.

The actions can run during parsing itself, but it is usually easier to build an Abstract Syntax Tree (AST) first and then traverse it, executing the action associated with each node according to the node's type.

Converting a Concrete Syntax Tree (CST) into an AST uses the same method.
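To make "an action per grammar rule" concrete, here is a minimal sketch of translation during parsing. The grammar, the `Parser` class and its method names are assumptions made for illustration; they do not come from the course material. Each rule is a function, and the action attached to the rule is the code that computes the returned value.

```kotlin
// Hypothetical mini-grammar (an assumption for this sketch):
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER
class Parser(private val input: String) {
    private var pos = 0

    // Action for `expr`: add up the values produced by each `term`.
    fun expr(): Int {
        var value = term()
        while (peek() == '+') {
            pos++
            value += term()
        }
        return value
    }

    // Action for `term`: multiply the values produced by each `factor`.
    private fun term(): Int {
        var value = factor()
        while (peek() == '*') {
            pos++
            value *= factor()
        }
        return value
    }

    // Action for `factor`: convert the matched digits to an Int.
    private fun factor(): Int {
        skipSpaces()
        val start = pos
        while (pos < input.length && input[pos].isDigit()) pos++
        require(pos > start) { "Expected a number at position $start" }
        return input.substring(start, pos).toInt()
    }

    // Returns the next non-space character, or null at end of input.
    private fun peek(): Char? {
        skipSpaces()
        return input.getOrNull(pos)
    }

    private fun skipSpaces() {
        while (pos < input.length && input[pos] == ' ') pos++
    }
}

fun main() {
    println(Parser("3 + 4 * 5").expr()) // 23
}
```

Replacing the arithmetic in each action with node construction would turn the same skeleton into an AST builder.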

Attributes

Attributes are the properties assigned to the nodes of the syntax tree, used in Syntax Directed Translation for semantic analysis.

Annotated AST = AST + Attributes

Inherited Attributes

Inherited attributes are computed from the attribute values of their parent and/or sibling nodes. (top and siblings -> bottom)

Inherited Attributes

							
								int x = 10;
								{
									int y = x + 5;
								}
							
						
								%%{
									init: {
										'theme': 'base',
										'themeVariables': {
											'fontSize': '24px',
											'darkmode': true,
											'lineColor': '#F8B229'
										}
									}
								}%%

								graph TD;
								A["Program (Global)"]
								B["Declaration: int x = 10"]
								C["Block (Block 1)"]
								D["Declaration: int y = x + 5"]
								E["Expression: x + 5"]
								F["Variable: x"]
								G["Constant: 5"]
							
								A --> B
							
								%% Inherited Attributes
								B -->|Scope: Global| C
								C -->|Scope: Block 1| D
								D -->|Scope: Block 1| E
								E -->|Scope: Block 1| F
								E -->|Scope: Block 1| G
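The scope flow in the diagram can be sketched in a few lines of Kotlin. The node classes and the `scopesOf` helper are hypothetical names chosen for this sketch. The point is only that the scope is handed from parent to child as a function argument, which is what makes it an inherited attribute.

```kotlin
sealed class Node
data class Block(val name: String, val children: List<Node>) : Node()
data class Declaration(val variable: String) : Node()

// `scope` is the inherited attribute: every node receives it from its parent.
fun annotate(node: Node, scope: String, out: MutableList<String>) {
    when (node) {
        is Declaration -> out += "${node.variable} @ $scope"
        is Block -> {
            // A block replaces the scope value before passing it to its children.
            node.children.forEach { annotate(it, node.name, out) }
        }
    }
}

// Annotates a whole program, starting in the "Global" scope.
fun scopesOf(program: List<Node>): List<String> {
    val out = mutableListOf<String>()
    program.forEach { annotate(it, "Global", out) }
    return out
}

fun main() {
    val program = listOf(
        Declaration("x"),
        Block("Block 1", listOf(Declaration("y")))
    )
    println(scopesOf(program)) // [x @ Global, y @ Block 1]
}
```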
							

Synthesized Attributes

Synthesized attributes are computed from the attribute values of their child nodes (bottom -> top).

Synthesized Attributes

							
								3 + 4 * 5
							
						
								%%{
									init: {
										'theme': 'base',
										'themeVariables': {
											'fontSize': '24px',
											'darkmode': true,
											'lineColor': '#F8B229'
										}
									}
								}%%
								graph TD;
								A["Expression"]
								B["Addition"]
								C["3 (int)"]
								D["Multiplication"]
								E["4 (int)"]
								F["5 (int)"]
														
								%% Synthesized Attributes
								C -->|"Type: int"| B
								E -->|"Type: int"| D
								F -->|"Type: int"| D
								D -->|"Type: int"| B
								B -->|"Type: int"| A
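As a rough sketch of the same idea in code, the type of an expression can be computed purely from the types of its children. The `Ty` enum, the node classes and the rule "DOUBLE wins over INT" are assumptions added for illustration; the slide itself only uses `int`.

```kotlin
enum class Ty { INT, DOUBLE }

sealed class Expr
data class IntLit(val value: Int) : Expr()
data class DoubleLit(val value: Double) : Expr()
data class Add(val left: Expr, val right: Expr) : Expr()
data class Mul(val left: Expr, val right: Expr) : Expr()

// Assumed rule for this sketch: the result is DOUBLE if either operand is DOUBLE.
private fun join(a: Ty, b: Ty): Ty =
    if (a == Ty.DOUBLE || b == Ty.DOUBLE) Ty.DOUBLE else Ty.INT

// The type is a synthesized attribute: it depends only on the children.
fun typeOf(e: Expr): Ty = when (e) {
    is IntLit -> Ty.INT
    is DoubleLit -> Ty.DOUBLE
    is Add -> join(typeOf(e.left), typeOf(e.right))
    is Mul -> join(typeOf(e.left), typeOf(e.right))
}

fun main() {
    // 3 + 4 * 5
    println(typeOf(Add(IntLit(3), Mul(IntLit(4), IntLit(5)))))      // INT
    // 3 + 4.0 * 5
    println(typeOf(Add(IntLit(3), Mul(DoubleLit(4.0), IntLit(5))))) // DOUBLE
}
```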
							

How to implement SDT?

In essence, SDT is a depth-first traversal of the AST.

Compute and store inherited attributes on the way down (before visiting a node's children).

Compute and store synthesized attributes on the way back up (after visiting a node's children).
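A minimal sketch of such a traversal, assuming a hypothetical `TreeNode` class with one inherited attribute (`depth`) and one synthesized attribute (`size`), could look like this:

```kotlin
class TreeNode(val label: String, val children: List<TreeNode> = emptyList()) {
    var depth = 0 // inherited: assigned from the parent on the way down
    var size = 1  // synthesized: computed from the children on the way up
}

fun annotate(node: TreeNode, depth: Int = 0) {
    // Pre-order step: the inherited attribute is known before the children are visited.
    node.depth = depth
    node.children.forEach { annotate(it, depth + 1) }
    // Post-order step: the synthesized attribute needs the children's results.
    node.size = 1 + node.children.sumOf { it.size }
}

fun main() {
    // Tree for 3 + 4 * 5
    val mul = TreeNode("*", listOf(TreeNode("4"), TreeNode("5")))
    val root = TreeNode("+", listOf(TreeNode("3"), mul))
    annotate(root)
    println("${root.label}: depth=${root.depth}, size=${root.size}") // +: depth=0, size=5
    println("${mul.label}: depth=${mul.depth}, size=${mul.size}")    // *: depth=1, size=3
}
```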

Better approach - "Visitor" pattern

The Visitor Pattern separates an algorithm from an object structure on which it operates, allowing for the execution of operations on objects without changing their classes.

By implementing a visitor, you can define new operations on the AST nodes, which is crucial for SDT. Each node in the AST can accept a visitor, which carries out the operation defined for that node's type.

Advantages of Using Visitor Pattern
  • Enhances flexibility by allowing new operations on AST nodes without modifying them.
  • Separates concerns, keeping the AST structure and operations on it decoupled.
  • Centralizes the operation logic, making the code easier to understand and maintain.
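To illustrate the first advantage, the sketch below defines two unrelated operations over the same node classes without touching them. The generic `Visitor<R>` interface and the `Num`, `Add`, `Evaluator` and `Printer` names are assumptions made for this sketch, not the course's reference implementation.

```kotlin
// A visitor parameterized by its result type, so each operation can return what it needs.
interface Visitor<R> {
    fun visit(node: Num): R
    fun visit(node: Add): R
}

sealed class Node {
    abstract fun <R> accept(visitor: Visitor<R>): R
}

data class Num(val value: Int) : Node() {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visit(this)
}

data class Add(val left: Node, val right: Node) : Node() {
    override fun <R> accept(visitor: Visitor<R>): R = visitor.visit(this)
}

// Operation 1: evaluate the expression.
class Evaluator : Visitor<Int> {
    override fun visit(node: Num): Int = node.value
    override fun visit(node: Add): Int = node.left.accept(this) + node.right.accept(this)
}

// Operation 2: pretty-print the expression. Adding it required no change to the nodes.
class Printer : Visitor<String> {
    override fun visit(node: Num): String = node.value.toString()
    override fun visit(node: Add): String =
        "(${node.left.accept(this)} + ${node.right.accept(this)})"
}

fun main() {
    val expr = Add(Num(3), Add(Num(4), Num(5)))
    println(expr.accept(Evaluator())) // 12
    println(expr.accept(Printer()))   // (3 + (4 + 5))
}
```

The trade-off is that adding a new node type requires touching every visitor, so the pattern fits best when the set of node types is stable and the set of operations grows.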

Example: Simple Interpreter

					
						sealed class Node {
							abstract fun accept(visitor: Visitor): Int
						}
						
						data class IntegerNode(val value: Int) : Node() {
							override fun accept(visitor: Visitor): Int {
								return visitor.visit(this)
							}
						}
						
						data class AdditionNode(val left: Node, val right: Node) : Node() {
							override fun accept(visitor: Visitor): Int {
								return visitor.visit(this)
							}
						}
						
						data class MultiplicationNode(val left: Node, val right: Node) : Node() {
							override fun accept(visitor: Visitor): Int {
								return visitor.visit(this)
							}
						}
						
						interface Visitor {
							fun visit(node: IntegerNode): Int
							fun visit(node: AdditionNode): Int
							fun visit(node: MultiplicationNode): Int
						}
						
						class EvaluationVisitor : Visitor {
							override fun visit(node: IntegerNode): Int {
								return node.value
							}
						
							override fun visit(node: AdditionNode): Int {
								return node.left.accept(this) + node.right.accept(this)
							}
						
							override fun visit(node: MultiplicationNode): Int {
								return node.left.accept(this) * node.right.accept(this)
							}
						}
						
						fun main() {
							val expression = AdditionNode(
								IntegerNode(3),
								MultiplicationNode(
									IntegerNode(4),
									IntegerNode(5)
								)
							)
						
							val visitor = EvaluationVisitor()
							val result = expression.accept(visitor)
							println(result)  // Output: 23
						}
					
				

Example: Interpreter with Name and Scope Resolution

					
						data class SymbolTable(val table: MutableMap<String, Int> = mutableMapOf(), val parent: SymbolTable? = null) {
							fun nest(): SymbolTable {
								return SymbolTable(table.toMutableMap(), this)
							}
						
							fun set(name: String, value: Int) {
								table[name] = value
							}
						
							operator fun get(name: String): Int? {
								return table[name] ?: parent?.get(name)
							}
						}
						
						sealed class Node {
							abstract fun accept(visitor: Visitor, symbolTable: SymbolTable): Int
						}
						
						data class BlockNode(val statements: List<Node>) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Int {
								return visitor.visit(this, symbolTable)
							}
						}
						
						data class IntegerNode(val value: Int) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Int {
								return visitor.visit(this, symbolTable)
							}
						}
						
						data class VariableNode(val name: String) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Int {
								return visitor.visit(this, symbolTable)
							}
						}
						
						data class DeclarationNode(val name: String, val expression: Node) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Int {
								return visitor.visit(this, symbolTable)
							}
						}
						
						data class AdditionNode(val left: Node, val right: Node) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Int {
								return visitor.visit(this, symbolTable)
							}
						}
						
						data class MultiplicationNode(val left: Node, val right: Node) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Int {
								return visitor.visit(this, symbolTable)
							}
						}
						
						interface Visitor {
							fun visit(node: BlockNode, symbolTable: SymbolTable): Int
							fun visit(node: IntegerNode, symbolTable: SymbolTable): Int
							fun visit(node: VariableNode, symbolTable: SymbolTable): Int
							fun visit(node: DeclarationNode, symbolTable: SymbolTable): Int
							fun visit(node: AdditionNode, symbolTable: SymbolTable): Int
							fun visit(node: MultiplicationNode, symbolTable: SymbolTable): Int
						}
						
						class EvaluationVisitor : Visitor {
							override fun visit(node: BlockNode, symbolTable: SymbolTable): Int {
								var lastResult = 0
								val nestedSymbolTable = symbolTable.nest()
								for (statement in node.statements) {
									lastResult = statement.accept(this, nestedSymbolTable)
								}
								return lastResult
							}
						
							override fun visit(node: IntegerNode, symbolTable: SymbolTable): Int {
								return node.value
							}
						
							override fun visit(node: VariableNode, symbolTable: SymbolTable): Int {
								return symbolTable[node.name] ?: error("Undefined variable: ${node.name}")
							}
						
							override fun visit(node: DeclarationNode, symbolTable: SymbolTable): Int {
								val value = node.expression.accept(this, symbolTable)
								symbolTable.set(node.name, value)
								return value
							}
						
							override fun visit(node: AdditionNode, symbolTable: SymbolTable): Int {
								return node.left.accept(this, symbolTable) + node.right.accept(this, symbolTable)
							}
						
							override fun visit(node: MultiplicationNode, symbolTable: SymbolTable): Int {
								return node.left.accept(this, symbolTable) * node.right.accept(this, symbolTable)
							}
						}
						
						fun main() {
							val program = BlockNode(
								listOf(
									DeclarationNode("x", IntegerNode(3)),
									BlockNode(
										listOf(
											DeclarationNode("y", IntegerNode(4)),
											BlockNode(
												listOf(
													DeclarationNode("z", AdditionNode(VariableNode("x"), VariableNode("y")))
												)
											)
										)
									)
								)
							)
						
							val visitor = EvaluationVisitor()
							val result = program.accept(visitor, SymbolTable())
							println(result)  // Output: 7
						}
					
				

Type Systems and Type Checking

What is Type?

Type is:
A set of values
A set of operations defined on these values

A type error occurs when an operation is performed on a value for which that operation is not defined.

Type Systems

A type system is a collection of rules that assign types to values and expressions and determine which operations may be applied to them. It plays a pivotal role in ensuring that the program behaves as expected by restricting the operations that can be performed on different types of data.

Type Systems

Type checking is the procedure by which the type system verifies a program: it checks that every operation is applied only to values of types for which it is allowed. When performed statically, it catches type errors before the program is run, which aids debugging and improves the program's reliability.

Type safety is a characteristic of a programming language that ensures type errors are either prevented or detected, providing a layer of reliability and predictability in the code. It ensures that operations performed are semantically correct according to the type system rules.

Type Systems

Memory safety is a feature that prevents programs from accessing memory outside of their allocated space, which could lead to unpredictable behavior or security vulnerabilities. A strong type system can aid in achieving memory safety by enforcing strict rules on data operations.

Polymorphism allows a value to be treated as an instance of more than one type. Different types can share the same interface, which gives a uniform way of working with a variety of data types, simplifies code, and promotes reusability and flexibility.
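A short sketch of both common flavors, using hypothetical `Shape`, `Circle` and `Rect` types: `totalArea` relies on subtype polymorphism (one interface, many implementations), and `firstOr` on parametric polymorphism (one function, any element type).

```kotlin
interface Shape {
    fun area(): Double
}

class Circle(val radius: Double) : Shape {
    override fun area(): Double = Math.PI * radius * radius
}

class Rect(val width: Double, val height: Double) : Shape {
    override fun area(): Double = width * height
}

// Subtype polymorphism: works for any current or future Shape implementation.
fun totalArea(shapes: List<Shape>): Double = shapes.sumOf { it.area() }

// Parametric polymorphism: works for lists of any element type T.
fun <T> firstOr(items: List<T>, default: T): T = items.firstOrNull() ?: default

fun main() {
    println(totalArea(listOf(Rect(2.0, 3.0), Rect(1.0, 1.0)))) // 7.0
    println(firstOr(emptyList(), "none"))                      // none
    println(firstOr(listOf(1, 2, 3), 0))                       // 1
}
```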

Kinds of Typing

Typeless: Types are not checked.
Static Typing: Types are checked at compile time.
Dynamic Typing: Types are checked at runtime.

Static vs Dynamic Typing

Aspect          | Typeless           | Static Typing                   | Dynamic Typing
Error Detection | No error detection | Early detection at compile-time | Late detection at run-time
Debugging       | Very hard          | Easy                            | Harder
Performance     | Best               | Better                          | May be slower due to runtime checks
Code Verbosity  | Depends            | More verbose                    | Less verbose
Static analysis | Very hard          | Easy                            | Very hard

Type Coercion

Type coercion is the implicit (automatic) conversion of a value from one type to another.

					
						var a = "42";
						var b = a * 1; // "42" is coerced to 42
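For contrast, Kotlin performs no such implicit conversion, so the equivalent code has to spell the conversion out. The sketch below uses only standard-library calls; the `scale` and `scaleOrNull` helper names are made up for illustration.

```kotlin
// Kotlin does not coerce a String to a number; the conversion is explicit.
fun scale(text: String, factor: Int): Int = text.toInt() * factor

// toIntOrNull() reports a failed conversion as null instead of throwing.
fun scaleOrNull(text: String, factor: Int): Int? = text.toIntOrNull()?.times(factor)

fun main() {
    println(scale("42", 1))        // 42
    println(scaleOrNull("abc", 1)) // null

    val i: Int = 42
    val l: Long = i.toLong()       // even Int -> Long widening must be written out
    println(l)                     // 42
}
```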
					
				

Type Punning

Type punning is reinterpreting the underlying bit representation of a value as a value of a different type.

					
						union {
							float f;
							int i;
						} pun;

						pun.f = 3.14;
						int pi_approx = pun.i; // Type punning
					
				

Strong vs Weak Typing

Strong and weak typing refer to how strictly types are enforced in a programming language.

Strong Typing:
Types are enforced strictly.
Type errors are reported (at compile time or at runtime) rather than silently coerced away.
Examples: Java, C++, Rust.

Weak Typing:
Types are enforced more loosely.
Type coercion allows operations between mismatched types.
Examples: JavaScript, PHP.

Static vs Runtime Type Checking

Static and runtime type checking differ in when the type rules are verified.

Static Type Checking:
Performed at compile-time.
Catches type errors before program runs.

Dynamic Type Checking:
Performed as program executes.
Type errors detected at runtime, can lead to runtime exceptions.
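Both kinds of checking can be seen side by side in Kotlin. In this sketch (the `describe` function is a hypothetical helper), the compiler statically rejects the commented-out line, while the cast on a value of type `Any` is only checked when the program runs.

```kotlin
fun describe(value: Any): String =
    try {
        // Checked at runtime: throws ClassCastException if value is not an Int.
        val n = value as Int
        "int ${n + 1}"
    } catch (e: ClassCastException) {
        "runtime type error: ${value.javaClass.simpleName} is not an Int"
    }

fun main() {
    // val n: Int = "hello"   // rejected statically: this line would not compile
    println(describe(41))      // int 42
    println(describe("hello")) // runtime type error: String is not an Int
}
```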

Null, the Billion Dollar Mistake

Null pointers and null references are common concepts in programming, representing a lack of value or reference. However, they can lead to runtime errors if not handled properly. Tony Hoare, who introduced null references, later termed it a "billion dollar mistake" due to the myriad of bugs it led to.

Questions:

What are the semantics of this "value"? Why do different types allow the assignment of the same "value"?
What operations are applicable to this "value"?
How can we statically check whether an operation is applicable to a "value"?

Null, the Billion Dollar Mistake

Workarounds:

Null Object Pattern: Utilize a null object that encapsulates the absence of a value or object, yet still conforms to the expected interface, thereby avoiding null reference errors.

Optional Types (Optional<>): Introduce optional types that clearly indicate the possibility of absence of a value, making the code more self-explanatory and safe.

Nullable Reference Types (as seen in C# and Kotlin): Employ nullable reference types that must be explicitly declared, making it clear when a reference could be null and requiring developers to handle the null case.
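Two of these workarounds can be sketched in Kotlin. The `User`, `Logger` and `NoOpLogger` names and the lookup table are assumptions made for the example. `findUser` returns a nullable type that the compiler forces callers to handle, and `NoOpLogger` is a null object that satisfies the interface while doing nothing.

```kotlin
data class User(val name: String)

private val users = mapOf(1 to User("Ada"))

// Nullable reference type: the `?` makes the possible absence part of the signature.
fun findUser(id: Int): User? = users[id]

// The compiler rejects `findUser(id).name`; the null case must be handled explicitly.
fun greeting(id: Int): String =
    findUser(id)?.let { "Hello, ${it.name}" } ?: "Hello, guest"

// Null Object pattern: a do-nothing implementation replaces a null reference.
interface Logger {
    fun log(message: String)
}

object NoOpLogger : Logger {
    override fun log(message: String) {}
}

fun process(id: Int, logger: Logger = NoOpLogger): String {
    logger.log("processing $id") // always safe to call, no null check needed
    return greeting(id)
}

fun main() {
    println(process(1)) // Hello, Ada
    println(process(2)) // Hello, guest
}
```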

Type-Safe Systems

Type-safe systems help prevent type errors, making code more reliable and easier to maintain. They enforce rules about how different types of data can be used, catching mistakes before they cause problems.

  • Error Prevention: Catch type errors early, reducing bugs.
  • Code Clarity: Make code more readable and understandable by enforcing clear rules on data usage.
  • Performance Optimizations: Enable better compiler optimizations for improved runtime performance.
  • Examples: Strongly typed languages like Java, C++, and Rust implement type safety, while TypeScript brings type safety to JavaScript.

Implementing Static Type Checking

  • Parsing: Create Abstract Syntax Tree (AST) from source code.
  • Symbol Table: Track identifiers and their types.
  • Semantic Analysis: Traverse AST, ensure operations are performed on compatible types.
  • Type Inference: Deduce types based on context (if applicable).
  • Error Reporting: Report type errors with meaningful messages.
  • Type Annotation: Annotate AST with type information for later compilation stages.
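The steps above can be sketched as a small checker that computes types without evaluating anything. All names here (`TypeChecker`, `typeOf`, the node classes) are assumptions for illustration, parsing is skipped by building the AST by hand, and scopes are flattened into a single symbol table to keep the sketch short.

```kotlin
enum class Type { INT, STRING }

sealed class Node
data class IntLit(val value: Int) : Node()
data class StrLit(val value: String) : Node()
data class Var(val name: String) : Node()
data class Add(val left: Node, val right: Node) : Node()
data class Decl(val name: String, val declared: Type, val init: Node) : Node()

class TypeChecker {
    val errors = mutableListOf<String>()               // error reporting
    private val symbols = mutableMapOf<String, Type>() // symbol table: name -> type

    // Returns the node's type, or null if an error was reported for it.
    fun typeOf(node: Node): Type? = when (node) {
        is IntLit -> Type.INT
        is StrLit -> Type.STRING
        is Var -> symbols[node.name] ?: fail("Undefined variable '${node.name}'")
        is Add -> {
            val l = typeOf(node.left)
            val r = typeOf(node.right)
            when {
                l == null || r == null -> null // already reported further down
                l == r -> l
                else -> fail("Cannot add $l and $r")
            }
        }
        is Decl -> {
            val actual = typeOf(node.init)
            if (actual != null && actual != node.declared) {
                fail("'${node.name}' declared as ${node.declared} but initialized with $actual")
            }
            symbols[node.name] = node.declared
            node.declared
        }
    }

    private fun fail(message: String): Type? {
        errors += message
        return null
    }
}

fun main() {
    val program = listOf(
        Decl("x", Type.INT, IntLit(3)),
        Decl("y", Type.STRING, Add(Var("x"), StrLit("hi")))
    )
    val checker = TypeChecker()
    program.forEach { checker.typeOf(it) }
    checker.errors.forEach(::println) // Cannot add INT and STRING
}
```

Collecting errors in a list instead of throwing on the first one lets a single pass report every problem, which is what an IDE usually wants.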

Example: Interpreter with Type Checking

					
						sealed class Type
						object IntType : Type()
						object StringType : Type()

						data class SymbolTable(
							val table: MutableMap<String, Pair<Type, Any>> = mutableMapOf(),
							val parent: SymbolTable? = null
						) {
							fun nest(): SymbolTable {
								return SymbolTable(table.toMutableMap(), this)
							}

							fun set(name: String, type: Type, value: Any) {
								table[name] = Pair(type, value)
							}

							operator fun get(name: String): Pair<Type, Any>? {
								return table[name] ?: parent?.get(name)
							}
						}

						sealed class Node {
							abstract fun accept(visitor: Visitor, symbolTable: SymbolTable): Any
						}

						data class BlockNode(val statements: List<Node>) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Any {
								return visitor.visit(this, symbolTable)
							}
						}

						data class IntegerNode(val value: Int) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Any {
								return visitor.visit(this, symbolTable)
							}
						}

						data class StringNode(val value: String) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Any {
								return visitor.visit(this, symbolTable)
							}
						}

						data class VariableNode(val name: String) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Any {
								return visitor.visit(this, symbolTable)
							}
						}

						data class DeclarationNode(val name: String, val expression: Node, val type: Type) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Any {
								return visitor.visit(this, symbolTable)
							}
						}

						data class AdditionNode(val left: Node, val right: Node) : Node() {
							override fun accept(visitor: Visitor, symbolTable: SymbolTable): Any {
								return visitor.visit(this, symbolTable)
							}
						}

						interface Visitor {
							fun visit(node: BlockNode, symbolTable: SymbolTable): Any
							fun visit(node: IntegerNode, symbolTable: SymbolTable): Any
							fun visit(node: StringNode, symbolTable: SymbolTable): Any
							fun visit(node: VariableNode, symbolTable: SymbolTable): Any
							fun visit(node: DeclarationNode, symbolTable: SymbolTable): Any
							fun visit(node: AdditionNode, symbolTable: SymbolTable): Any
						}

						class TypeCheckingVisitor : Visitor {
							override fun visit(node: BlockNode, symbolTable: SymbolTable): Any {
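								var lastResult: Any = 0
								var lastType: Type = IntType
								val nestedSymbolTable = symbolTable.nest()
								for (statement in node.statements) {
									// Kotlin cannot destructure into existing variables,
									// so destructure into fresh vals and copy them over.
									val (value, type) = statement.accept(this, nestedSymbolTable) as Pair<Any, Type>
									lastResult = value
									lastType = type
								}
								return Pair(lastResult, lastType)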
							}

							override fun visit(node: IntegerNode, symbolTable: SymbolTable): Any {
								return Pair(node.value, IntType)
							}

							override fun visit(node: StringNode, symbolTable: SymbolTable): Any {
								return Pair(node.value, StringType)
							}

							override fun visit(node: VariableNode, symbolTable: SymbolTable): Any {
								val (type, value) = symbolTable[node.name] ?: error("Undefined variable: ${node.name}")
								return Pair(value, type)
							}

							override fun visit(node: DeclarationNode, symbolTable: SymbolTable): Any {
								val (value, type) = node.expression.accept(this, symbolTable) as Pair<Any, Type>
								// The initializer's type must match the declared type.
								if (type != node.type) {
									error("Type mismatch: declared ${node.type::class.simpleName}, got ${type::class.simpleName}")
								}
								symbolTable.set(node.name, node.type, value)
								return Pair(value, type)
							}

							override fun visit(node: AdditionNode, symbolTable: SymbolTable): Any {
								val (leftValue, leftType) = node.left.accept(this, symbolTable) as Pair<Any, Type>
								val (rightValue, rightType) = node.right.accept(this, symbolTable) as Pair<Any, Type>
								return when {
									leftType is IntType && rightType is IntType -> Pair((leftValue as Int) + (rightValue as Int), IntType)
									leftType is StringType && rightType is StringType -> Pair((leftValue as String) + (rightValue as String), StringType)
									else -> error("Type mismatch: cannot add $leftValue and $rightValue")
								}
							}
						}

						fun main() {
							val program = BlockNode(
								listOf(
									DeclarationNode("x", IntegerNode(3), IntType),
									DeclarationNode("y", StringNode("hello"), StringType),
									BlockNode(
										listOf(
											DeclarationNode("z", AdditionNode(VariableNode("x"), IntegerNode(5)), IntType),
											DeclarationNode("w", AdditionNode(VariableNode("y"), StringNode(" world")), StringType)
										)
									)
								)
							)
						
							val typeChecker = TypeCheckingVisitor()
							val (result, _) = program.accept(typeChecker, SymbolTable()) as Pair<Any, Type>
							println(result)  // Output: hello world
						}
				
				

Conclusion

Programming Language Processing in IDE - Updated

						%%{
							init: {
								'theme': 'base',
								'themeVariables': {
									'fontSize': '30px',
									'darkmode': true,
									'lineColor': '#F8B229'
								}
							}
						}%%
						graph LR
						A[Code] --> B[Lexical Analysis]
						B --> C[Tokens]
						C --> D[Syntax Analysis]
						D --> E[AST]
						E --> F[Semantic Analysis]
						F --> G[Annotated and checked AST]
						G --> I[???]
						I --> M[PSI]
					

What to read?

Types and Programming Languages
by Benjamin C. Pierce

Next: Understanding Programming Language Part 4

Dive into advanced topics to further enhance the robustness and correctness of your code:

  • Type Inference: Discover how compilers deduce types of variables and expressions, reducing the need for explicit type annotations, while maintaining type safety.
  • Formal Verification: Explore methods to mathematically prove the correctness of your code, ensuring it behaves as expected under all conditions.
  • Static Analysis: Learn about tools and techniques to analyze your code's behavior without executing it, identifying potential bugs and vulnerabilities early in the development process.

Questions & Answers

Thank you for your attention!

I'm now open to any questions you might have.