Understanding Programming Language Part 4

IDE Development Course

Andrew Vasilyev

Today's Agenda

Polymorphism
Type Inference
Introduction to Static Analysis

Polymorphism and Type Systems

Polymorphism

Polymorphism is a concept in programming that refers to the ability of different classes to respond to the same function call or method invocation, each in their own class-specific way. This enables a single interface to interact with objects of different types.

The interaction between polymorphism and type systems involves how polymorphic features are handled within type hierarchies, and how overloading, overriding, and templating are managed.

Static vs Dynamic Polymorphism

Static polymorphism, also known as compile-time polymorphism, resolves method calls at compile time. Dynamic polymorphism, or runtime polymorphism, resolves method calls at runtime.


			// Static polymorphism example in Java
			class MathOperations {
				public int multiply(int a, int b) {
					return a * b;
				}

				// Overloaded method with different signature
				public double multiply(double a, double b) {
					return a * b;
				}
			}

			// Dynamic polymorphism example in Java
			class Animal {
				public void sound() {
					System.out.println("Animal makes a sound");
				}
			}

			class Dog extends Animal {
				@Override
				public void sound() {
					System.out.println("Dog barks");
				}
			}

			Animal animal = new Dog();
			animal.sound(); // Outputs: Dog barks

Ad-hoc Polymorphism

Ad-hoc polymorphism allows a single function symbol to have multiple implementations, with the compiler choosing the appropriate implementation based on the types of the arguments.


			// Ad-hoc polymorphism example in Java (Method Overloading)
			class MathOperations {
				public int add(int a, int b) {
					return a + b;
				}

				public double add(double a, double b) {
					return a + b;
				}

				public String add(String a, String b) {
					return a + b;
				}
			}

Parametric Polymorphism

Parametric polymorphism enables the use of the same code with different types, commonly implemented using templates or generics.


			// Parametric polymorphism example in Java (Generics)
			class Box<T> {
				private T t;

				public void set(T t) {
					this.t = t;
				}

				public T get() {
					return t;
				}
			}

			Box<Integer> integerBox = new Box<>();
			Box<String> stringBox = new Box<>();

Coercion Polymorphism

Coercion polymorphism refers to the automatic conversion of a value from one data type to another. It allows different types to be treated as the same type in certain contexts, often occurring implicitly in many programming languages.


			// Coercion polymorphism example in Java
			public class CoercionExample {
				public static void main(String[] args) {
					int integerNumber = 42;
					// Automatic conversion from int to double
					double doubleNumber = integerNumber;

					System.out.println(doubleNumber); // Outputs: 42.0
				}
			}

In this example, the integer value is automatically converted (coerced) to a double when assigned to a variable of type double.

Type Inference

What is Type Inference?

Type inference is the automatic detection of the data type of an expression in a programming language without explicit type annotations by the programmer.

Type checking verifies that an expression has an expected type, while type inference deduces the type of an expression that carries no type annotation.

Type inference supports polymorphism by deducing the most general type, allowing the same code to be used with different data types and promoting code reuse.

Type Inference in Action

Type inference involves deducing the type of an expression from the context in which it appears.

					
						public void Foo(double value)
						{
							Console.WriteLine("double");
						}

						public void Foo(int value)
						{
							Console.WriteLine("int");
						}

						public void Boo()
						{
							// C# uses the 'var' keyword to infer the type of 'total'
							var total = 10 + 5; // 'total' is inferred to be of type int

							Foo(total);        // int
							Foo(total + 2);    // int
							Foo(total + 2.1f); // double (the argument is a float, which converts implicitly to double)
						}

Type Inference Algorithm

1. Initialization

The analyzer (a compiler or an IDE) starts with no assumptions about types and assigns types to literals based on their form, such as inferring `int` for an integer literal like `10`.

2. Type Variables

For expressions whose type is not immediately apparent, the analyzer introduces type variables as placeholders during the inference process.

3. Gathering Type Constraints

The compiler collects constraints for type variables based on how they are used in the code, influencing the final inferred type.

4. Unification Process

The unification step attempts to resolve type variables into concrete types by satisfying all gathered constraints without conflicts.

5. Type Checking

Post-unification, the compiler checks for type consistency. Conflicts or unsatisfiable constraints result in type errors.

6. Finalization

Successful type checking leads to the final assignment of concrete types to previously unresolved type variables, completing the inference.
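
The six steps above can be sketched as a very small constraint solver. This is only an illustration, assuming Java 17+ and a hypothetical miniature type language; the names `Type`, `Con`, `TypeVar`, and `Unifier` are invented here, and real compilers use considerably richer algorithms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical miniature type language: constructors (int, Tuple<...>) and type variables.
interface Type {}

record Con(String name, List<Type> args) implements Type {
    Con(String name) { this(name, List.of()); }
}

record TypeVar(String name) implements Type {}

class Unifier {
    // Substitution built during unification: type variable name -> type.
    private final Map<String, Type> subst = new HashMap<>();

    // Follow bindings until reaching a constructor or an unbound variable.
    Type resolve(Type t) {
        while (t instanceof TypeVar v && subst.containsKey(v.name())) {
            t = subst.get(v.name());
        }
        return t;
    }

    // Step 4: make two types equal, or report a conflict (step 5).
    void unify(Type a, Type b) {
        a = resolve(a);
        b = resolve(b);
        if (a.equals(b)) return;
        if (a instanceof TypeVar va) { bind(va, b); return; }
        if (b instanceof TypeVar vb) { bind(vb, a); return; }
        Con ca = (Con) a;
        Con cb = (Con) b;
        if (!ca.name().equals(cb.name()) || ca.args().size() != cb.args().size()) {
            throw new IllegalStateException("Type mismatch: " + show(a) + " vs " + show(b));
        }
        for (int i = 0; i < ca.args().size(); i++) {
            unify(ca.args().get(i), cb.args().get(i));
        }
    }

    private void bind(TypeVar v, Type t) {
        // Occurs check: refuse bindings such as T = List<T>.
        if (occurs(v, t)) throw new IllegalStateException("Infinite type for " + v.name());
        subst.put(v.name(), t);
    }

    private boolean occurs(TypeVar v, Type t) {
        t = resolve(t);
        if (t instanceof TypeVar other) return other.equals(v);
        for (Type arg : ((Con) t).args()) {
            if (occurs(v, arg)) return true;
        }
        return false;
    }

    // Step 6: replace every solved variable by its concrete type.
    Type apply(Type t) {
        t = resolve(t);
        if (t instanceof Con c) {
            return new Con(c.name(), c.args().stream().map(this::apply).toList());
        }
        return t;
    }

    static String show(Type t) {
        if (t instanceof TypeVar v) return v.name();
        Con c = (Con) t;
        if (c.args().isEmpty()) return c.name();
        return c.args().stream().map(Unifier::show)
                .collect(Collectors.joining(", ", c.name() + "<", ">"));
    }

    public static void main(String[] args) {
        Unifier u = new Unifier();
        Type t1 = new TypeVar("T1"), t2 = new TypeVar("T2"), result = new TypeVar("R");
        u.unify(result, new Con("Tuple", List.of(t1, t2))); // constraint from the return type
        u.unify(t1, new Con("int"));                        // constraint from the first argument
        u.unify(t2, new Con("string"));                     // constraint from the second argument
        System.out.println(show(u.apply(result)));          // prints Tuple<int, string>
    }
}
```

Constraints can be added in any order; a conflicting constraint (for example, unifying `T1` with `bool` afterwards) surfaces as an exception, which plays the role of the type error in step 5.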

Type Inference Example


			public static (T1, T2) Combine<T1, T2>(T1 first, T2 second)
			{
				return (first, second);
			}

			var combinedIntString = Combine(1, "apple");
			var combinedBoolDouble = Combine(true, 3.14);

Type Variables: The compiler treats the type parameters T1 and T2 as unknowns to be solved separately for each call.
Type Constraints: Infers T1 as int, T2 as string for combinedIntString; and T1 as bool, T2 as double for combinedBoolDouble.
Unification and Type Checking: Validates that the inferred types are consistent with the method's parameters.
Finalization: Completes type inference, confirming the types of the method calls.
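
For comparison, the same inference can be observed in Java. The sketch below is a rough analogue of the C# `Combine` method, assuming Java 10+ for `var`; `GenericInferenceDemo` is a hypothetical name, and `Map.Entry` stands in for the C# tuple.

```java
import java.util.Map;

class GenericInferenceDemo {
    // Java analogue of Combine<T1, T2>; Map.Entry stands in for the C# tuple.
    static <T1, T2> Map.Entry<T1, T2> combine(T1 first, T2 second) {
        return Map.entry(first, second);
    }

    public static void main(String[] args) {
        // No type arguments are written: T1 = Integer and T2 = String are inferred.
        var combinedIntString = combine(1, "apple");
        // Here T1 = Boolean and T2 = Double are inferred.
        var combinedBoolDouble = combine(true, 3.14);

        // These assignments compile only because the inferred types match.
        Integer first = combinedIntString.getKey();
        String second = combinedIntString.getValue();
        Boolean flag = combinedBoolDouble.getKey();
        Double number = combinedBoolDouble.getValue();
        System.out.println(first + " " + second + " " + flag + " " + number);
    }
}
```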

How to implement type inference

Combine with the type-check stage.
Save known types and type constraints in the symbol table.
Collect types and constraints while traversing the AST from top to bottom.
Resolve types and check that the constraints hold while returning from the bottom of the AST to the top.
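
The steps above can be sketched on a toy expression language. This is a minimal illustration assuming Java 17+; the AST records, the `Ty` enum, and `TinyInference` are hypothetical names, and the only typing rule is that `int + double` yields `double`.

```java
import java.util.HashMap;
import java.util.Map;

enum Ty { INT, DOUBLE }

// Hypothetical AST for a toy expression language.
interface Expr {}
record IntLit(int value) implements Expr {}
record DoubleLit(double value) implements Expr {}
record VarRef(String name) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}

class TinyInference {
    // Symbol table: variable types that are already known.
    private final Map<String, Ty> symbols = new HashMap<>();

    // Models 'var name = init;': infer the initializer, then record the result.
    Ty declare(String name, Expr init) {
        Ty ty = infer(init);
        symbols.put(name, ty);
        return ty;
    }

    // The recursion descends from the root; each type is computed on the way back up.
    Ty infer(Expr e) {
        if (e instanceof IntLit) return Ty.INT;
        if (e instanceof DoubleLit) return Ty.DOUBLE;
        if (e instanceof VarRef v) {
            Ty ty = symbols.get(v.name());
            if (ty == null) throw new IllegalStateException("Undeclared variable: " + v.name());
            return ty;
        }
        if (e instanceof Add a) {
            Ty left = infer(a.left());
            Ty right = infer(a.right());
            // Constraint for '+': the result is DOUBLE if either operand is DOUBLE.
            return (left == Ty.DOUBLE || right == Ty.DOUBLE) ? Ty.DOUBLE : Ty.INT;
        }
        throw new IllegalArgumentException("Unknown node: " + e);
    }

    public static void main(String[] args) {
        TinyInference inference = new TinyInference();
        System.out.println(inference.declare("total", new Add(new IntLit(10), new IntLit(5))));
        System.out.println(inference.infer(new Add(new VarRef("total"), new DoubleLit(2.1))));
    }
}
```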

Error Handling and Recovery

In type systems, when an error is encountered, a type of "unknown" can be introduced.

Introducing 'Unknown' Type: To continue type checking past an error, an 'unknown' type placeholder is used.
Expression Involving 'Unknown' Types: Any expression that involves an 'unknown' type is also considered 'unknown'.
Error Propagation: Because only the original error is reported, this approach suppresses cascading follow-up errors and allows the IDE to continue processing the rest of the code.
Recovery: Later, when more information is available, the 'unknown' types can be replaced with specific types, if possible.
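
A minimal sketch of the first three points, assuming a hypothetical checker with a single operator: the first mismatch is reported once, and every expression built on top of the failed one silently becomes `UNKNOWN`. The names `CheckedType` and `RecoveringChecker` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

enum CheckedType { INT, STRING, UNKNOWN }

class RecoveringChecker {
    final List<String> errors = new ArrayList<>();

    // Type of 'left - right', where subtraction is only defined for INT operands.
    CheckedType minus(CheckedType left, CheckedType right) {
        // An UNKNOWN operand was already reported where it originated: stay silent.
        if (left == CheckedType.UNKNOWN || right == CheckedType.UNKNOWN) {
            return CheckedType.UNKNOWN;
        }
        if (left != CheckedType.INT || right != CheckedType.INT) {
            errors.add("Cannot subtract " + right + " from " + left);
            return CheckedType.UNKNOWN;
        }
        return CheckedType.INT;
    }

    public static void main(String[] args) {
        RecoveringChecker checker = new RecoveringChecker();
        // Models (1 - "a") - 2: the inner expression fails, the outer one does not add noise.
        CheckedType inner = checker.minus(CheckedType.INT, CheckedType.STRING);
        CheckedType outer = checker.minus(inner, CheckedType.INT);
        System.out.println(outer + " with " + checker.errors.size() + " reported error(s)");
    }
}
```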

What to learn?

Hindley-Milner Type System

Algorithm W

Formal verification

Dependent types

Automated theorem proving

Introduction to Static Analysis

What is Static Analysis?

Static analysis is a process of examining code without running it to find errors, code smells, and security vulnerabilities.

AST Verification

AST verification checks the syntax tree of code for structural and logical correctness. For example, it can flag a potential error if an assignment is made in a conditional statement.


			// Incorrect use of assignment in a conditional
			if (a = 10) { ... } // Possible error
			// Correct use of comparison
			if (a == 10) { ... } // Expected check
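
Such a check could be written as a small AST visitor. The sketch below assumes Java 17+ and a hypothetical, already-parsed AST; the node records and `AssignmentInConditionCheck` are invented for illustration and do not correspond to any particular IDE API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST nodes; a real parser would produce a much richer tree.
interface Node {}
record Name(String id) implements Node {}
record Num(int value) implements Node {}
record Assign(Node target, Node value) implements Node {}
record Equals(Node left, Node right) implements Node {}
record If(Node condition, List<Node> body) implements Node {}

class AssignmentInConditionCheck {
    static List<String> check(List<Node> statements) {
        List<String> warnings = new ArrayList<>();
        for (Node statement : statements) visit(statement, warnings);
        return warnings;
    }

    private static void visit(Node node, List<String> warnings) {
        if (node instanceof If ifNode) {
            // The structural rule: an assignment directly in condition position is suspicious.
            if (ifNode.condition() instanceof Assign) {
                warnings.add("Assignment used as a condition; did you mean '=='?");
            }
            for (Node child : ifNode.body()) visit(child, warnings);
        }
    }

    public static void main(String[] args) {
        Node suspicious = new If(new Assign(new Name("a"), new Num(10)), List.of());
        Node expected = new If(new Equals(new Name("a"), new Num(10)), List.of());
        System.out.println(check(List.of(suspicious, expected)));
    }
}
```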

Control Flow Analysis

Control flow analysis (CFA) identifies unreachable code and other flow issues. For example, it detects code after a return statement that will never execute.


			function example() {
				return;
				console.log('This will never be called');
			}
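
The simplest form of this check does not even need a full graph: within one block, everything after a `return` is unreachable. A minimal sketch, assuming Java 17+ and hypothetical statement records (`Return`, `Call`, `Block`):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical statement nodes for a toy language.
interface Stmt {}
record Return() implements Stmt {}
record Call(String text) implements Stmt {}
record Block(List<Stmt> statements) implements Stmt {}

class UnreachableCodeCheck {
    // Returns the statements that can never execute because they follow a return.
    // Simplification: a return inside a nested block is not treated as ending the outer block.
    static List<Stmt> findUnreachable(Block block) {
        List<Stmt> unreachable = new ArrayList<>();
        boolean terminated = false;
        for (Stmt statement : block.statements()) {
            if (terminated) {
                unreachable.add(statement);
                continue;
            }
            if (statement instanceof Return) terminated = true;
            if (statement instanceof Block nested) unreachable.addAll(findUnreachable(nested));
        }
        return unreachable;
    }

    public static void main(String[] args) {
        Block example = new Block(List.of(
                new Return(),
                new Call("console.log('This will never be called')")));
        System.out.println(findUnreachable(example));
    }
}
```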

%%{ init: { 'theme': 'base', 'themeVariables': { 'fontSize': '15px', 'darkmode': true, 'lineColor': '#F8B229' } } }%%
graph TD
A[Start] --> B{Condition}
B -- Yes --> C[Block A]
C --> D{Another Condition}
D -- Yes --> E[Block B]
D -- No --> F[Block C]
E --> G[End]
F --> G
B -- No --> G

A control flow graph (CFG) is a directed graph that represents all paths that might be traversed through a program during its execution: nodes are blocks of code, and edges are possible transfers of control between them.

Building a CFG from an AST

Traverse AST for control flow elements (if-else, loops).
Create a node for each control flow element and sequential block.
Connect nodes with edges to show code flow.
Add one outgoing edge per branch at each decision point to reflect the possible execution paths.
Add a back edge from the end of each loop body to the loop's condition.
Mark entry and exit points of the CFG clearly.
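
The steps above might look as follows for a toy language with simple statements, `if`, and `while`. This is a sketch assuming Java 17+; `Simple`, `IfS`, `WhileS`, `CfgNode`, and `CfgBuilder` are hypothetical names, and each statement gets its own node rather than being merged into basic blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AST for a toy language.
interface S {}
record Simple(String label) implements S {}
record IfS(String cond, List<S> thenBranch, List<S> elseBranch) implements S {}
record WhileS(String cond, List<S> body) implements S {}

class CfgNode {
    final String label;
    final List<CfgNode> successors = new ArrayList<>();
    CfgNode(String label) { this.label = label; }
}

class CfgBuilder {
    final CfgNode entry = new CfgNode("ENTRY");
    final CfgNode exit = new CfgNode("EXIT");

    static CfgBuilder build(List<S> program) {
        CfgBuilder builder = new CfgBuilder();
        for (CfgNode open : builder.sequence(program, List.of(builder.entry))) {
            link(open, builder.exit);
        }
        return builder;
    }

    // Wires the statements after the dangling nodes in 'preds'; returns the new dangling nodes.
    private List<CfgNode> sequence(List<S> statements, List<CfgNode> preds) {
        for (S statement : statements) preds = statement(statement, preds);
        return preds;
    }

    private List<CfgNode> statement(S s, List<CfgNode> preds) {
        if (s instanceof Simple simple) {
            return List.of(connect(preds, simple.label()));
        }
        if (s instanceof IfS branch) {
            CfgNode cond = connect(preds, "if " + branch.cond());
            List<CfgNode> out = new ArrayList<>(sequence(branch.thenBranch(), List.of(cond)));
            out.addAll(sequence(branch.elseBranch(), List.of(cond)));
            return out; // both branches continue at whatever follows the if
        }
        if (s instanceof WhileS loop) {
            CfgNode cond = connect(preds, "while " + loop.cond());
            for (CfgNode end : sequence(loop.body(), List.of(cond))) link(end, cond); // back edge
            return List.of(cond); // the loop is left when the condition is false
        }
        throw new IllegalArgumentException("Unknown statement: " + s);
    }

    private CfgNode connect(List<CfgNode> preds, String label) {
        CfgNode node = new CfgNode(label);
        for (CfgNode pred : preds) link(pred, node);
        return node;
    }

    private static void link(CfgNode from, CfgNode to) {
        if (!from.successors.contains(to)) from.successors.add(to);
    }

    // Lists every edge reachable from ENTRY as "from -> to".
    List<String> edges() {
        List<String> result = new ArrayList<>();
        List<CfgNode> seen = new ArrayList<>(List.of(entry));
        for (int i = 0; i < seen.size(); i++) {
            CfgNode node = seen.get(i);
            for (CfgNode succ : node.successors) {
                result.add(node.label + " -> " + succ.label);
                if (!seen.contains(succ)) seen.add(succ);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<S> program = List.of(new IfS("Condition",
                List.of(new Simple("Block A"),
                        new IfS("Another Condition",
                                List.of(new Simple("Block B")),
                                List.of(new Simple("Block C")))),
                List.of()));
        build(program).edges().forEach(System.out::println);
    }
}
```

The program in `main` mirrors the diagram above, with `ENTRY` and `EXIT` playing the roles of Start and End.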

Symbolic Execution and Abstract Interpretation

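Symbolic execution explores program paths using symbolic inputs instead of concrete values, identifying potential errors such as division by zero. Abstract interpretation takes a related approach: it approximates the set of values each variable can hold (for example, as an interval or a sign) so that all paths can be checked at once.

A symbolic input is a placeholder that represents a range of possible values rather than a specific one.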


			function divide(x, y) {
				return x / y; // Potential division by zero if y is symbolic and can be zero
			}
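
One simple way to reason about "y can be zero" is interval-based abstract interpretation: track a range of possible values for `y` and narrow it when a branch condition is known to hold. The sketch below assumes Java 17+; `Interval` and `DivisionCheck` are hypothetical names, and a real analyzer would support many more operations and conditions.

```java
// Abstract value: every integer between lo and hi (inclusive) is considered possible.
record Interval(long lo, long hi) {
    boolean contains(long value) {
        return lo <= value && value <= hi;
    }

    // Narrowing for the true branch of 'if (y > bound)': smaller values are excluded.
    // If the result has lo > hi, the branch is unreachable and contains() is always false.
    Interval assumeGreaterThan(long bound) {
        return new Interval(Math.max(lo, bound + 1), hi);
    }
}

class DivisionCheck {
    // 'x / y' is flagged whenever zero is among the possible values of y.
    static boolean mayDivideByZero(Interval divisor) {
        return divisor.contains(0);
    }

    public static void main(String[] args) {
        Interval y = new Interval(-5, 5);                // nothing is known beyond the range
        System.out.println(mayDivideByZero(y));          // true: y could be 0
        Interval guarded = y.assumeGreaterThan(0);       // inside 'if (y > 0) { ... }'
        System.out.println(mayDivideByZero(guarded));    // false: y is in [1, 5]
    }
}
```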

Dataflow Analysis

Dataflow analysis tracks how values are assigned to and read from variables along each execution path, which lets it detect issues such as the use of uninitialized variables.


			let x;
			console.log(x); // Uninitialized variable usage
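
A classic instance is definite-assignment analysis: carry the set of variables that are certainly assigned through the code, and intersect the sets where branches join. A minimal sketch, assuming Java 17+ and hypothetical operation records (`AssignVar`, `UseVar`, `Branch`):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical operations of a toy language.
interface Op {}
record AssignVar(String name) implements Op {}
record UseVar(String name) implements Op {}
record Branch(List<Op> thenOps, List<Op> elseOps) implements Op {}

class DefiniteAssignment {
    final List<String> warnings = new ArrayList<>();

    // Returns the variables definitely assigned after 'ops', given those assigned before.
    Set<String> analyze(List<Op> ops, Set<String> assignedBefore) {
        Set<String> assigned = new HashSet<>(assignedBefore);
        for (Op op : ops) {
            if (op instanceof AssignVar assign) {
                assigned.add(assign.name());
            } else if (op instanceof UseVar use) {
                if (!assigned.contains(use.name())) {
                    warnings.add("'" + use.name() + "' may be used before assignment");
                }
            } else if (op instanceof Branch branch) {
                Set<String> afterThen = analyze(branch.thenOps(), assigned);
                Set<String> afterElse = analyze(branch.elseOps(), assigned);
                afterThen.retainAll(afterElse); // join: keep only what BOTH paths assign
                assigned = afterThen;
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        DefiniteAssignment analysis = new DefiniteAssignment();
        // Models: let x; if (...) { x = 1 } console.log(x);
        analysis.analyze(List.of(
                new Branch(List.of(new AssignVar("x")), List.of()),
                new UseVar("x")), Set.of());
        System.out.println(analysis.warnings);
    }
}
```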

Interprocedural Analysis

Interprocedural analysis checks interactions across functions, alerting to side effects or unintended usages.


					// Command handler for creating an order
					public class OrderCommandHandler {
						private readonly Database db;

						public OrderCommandHandler(Database database) {
							db = database;
						}

						public void HandleCreateOrder(CreateOrderCommand command) {
							var order = new Order(command.Data);
							db.AddOrder(order);
						}
					}

					// Query handler for fetching order details
					public class OrderQueryHandler {
						private readonly Database db;

						public OrderQueryHandler(Database database) {
							db = database;
						}

						public OrderDetails GetOrderDetails(Guid orderId) {
							return db.GetOrderById(orderId);
						}
					}

					// Interprocedural analysis would validate that OrderCommandHandler only modifies state and does not perform any queries.
					// It would also check that OrderQueryHandler only reads state and does not modify it.
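
One way such a check could work is to propagate a "writes state" effect from callees to callers until nothing changes. The sketch below is written in Java for illustration and assumes the call relationships and the set of directly writing methods were already extracted; `EffectAnalysis` is a hypothetical name, and the method names are taken from the example above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EffectAnalysis {
    // Propagates "writes state" from callees to callers until a fixpoint is reached.
    static Set<String> mayWrite(Map<String, List<String>> calls, Set<String> directWriters) {
        Set<String> writers = new HashSet<>(directWriters);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<String>> entry : calls.entrySet()) {
                boolean callsWriter = entry.getValue().stream().anyMatch(writers::contains);
                if (callsWriter && writers.add(entry.getKey())) changed = true;
            }
        }
        return writers;
    }

    public static void main(String[] args) {
        // Assumed input: which method calls which, and which methods modify state directly.
        Map<String, List<String>> calls = Map.of(
                "HandleCreateOrder", List.of("AddOrder"),
                "GetOrderDetails", List.of("GetOrderById"),
                "AddOrder", List.of(),
                "GetOrderById", List.of());
        Set<String> writers = mayWrite(calls, Set.of("AddOrder"));
        // A query handler must not end up in the set of writers.
        System.out.println("GetOrderDetails writes state: " + writers.contains("GetOrderDetails"));
    }
}
```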

Call Graph

A call graph illustrates the calling relationships between different subroutines in a program.

%%{ init: { 'theme': 'base', 'themeVariables': { 'fontSize': '30px', 'darkmode': true, 'lineColor': '#F8B229' } } }%%
graph LR
A[CreateOrder] -->|calls| B[ValidateOrder]
A -->|calls| C[SaveOrder]
B -->|calls| D[CheckInventory]
B -->|calls| E[VerifyPayment]
C -->|calls| F[UpdateDatabase]
C -->|calls| G[SendConfirmation]
D -->|calls| H[ReorderStock]
E -->|calls| I[ProcessPayment]
F -->|calls| J[LogTransaction]
G -->|calls| K[EmailCustomer]
I -->|calls| L[UpdatePaymentStatus]
J -->|calls| M[ArchiveOrder]

How to build a Call Graph

Traverse the AST to identify function declarations and add them to the symbol table.
Create a node for each function, and record each function call found during the traversal.
Connect nodes with edges from callers to callees using the symbol table.
Recursively process each function to map out all calls and complete the graph.
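
The steps above can be sketched in two passes over already-parsed declarations. This is an illustration assuming Java 17+; `FunctionDecl` and `CallGraph` are hypothetical names, and resolving calls by plain name ignores overloads, methods on objects, and dynamic dispatch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, already-parsed function declaration: its name and the names it calls.
record FunctionDecl(String name, List<String> callNames) {}

class CallGraph {
    private final Map<String, Set<String>> edges = new LinkedHashMap<>();

    static CallGraph build(List<FunctionDecl> declarations) {
        CallGraph graph = new CallGraph();
        // Pass 1: register every declaration (this map doubles as the symbol table).
        for (FunctionDecl decl : declarations) {
            graph.edges.put(decl.name(), new LinkedHashSet<>());
        }
        // Pass 2: add a caller -> callee edge for each call that resolves to a declaration.
        for (FunctionDecl decl : declarations) {
            for (String callee : decl.callNames()) {
                if (graph.edges.containsKey(callee)) graph.edges.get(decl.name()).add(callee);
            }
        }
        return graph;
    }

    Set<String> callees(String name) {
        return edges.getOrDefault(name, Set.of());
    }

    // All functions transitively reachable from 'root', including root itself.
    Set<String> reachableFrom(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(root));
        while (!work.isEmpty()) {
            String function = work.pop();
            // The 'seen' check also keeps recursive functions from looping forever.
            if (seen.add(function)) work.addAll(callees(function));
        }
        return seen;
    }

    public static void main(String[] args) {
        CallGraph graph = build(List.of(
                new FunctionDecl("CreateOrder", List.of("ValidateOrder", "SaveOrder")),
                new FunctionDecl("ValidateOrder", List.of("CheckInventory")),
                new FunctionDecl("SaveOrder", List.of()),
                new FunctionDecl("CheckInventory", List.of()),
                new FunctionDecl("UnusedHelper", List.of())));
        System.out.println(graph.reachableFrom("CreateOrder"));
    }
}
```

A function that never appears in `reachableFrom` for any entry point, such as the hypothetical `UnusedHelper` here, is a candidate for an "unused code" inspection.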

Conclusion

Programming Language Processing in IDE - Updated

						%%{
							init: {
								'theme': 'base',
								'themeVariables': {
									'fontSize': '30px',
									'darkmode': true,
									'lineColor': '#F8B229'
								}
							}
						}%%
						graph LR
						A[Code] --> B[Lexical Analysis]
						B --> C[Tokens]
						C --> D[Syntax Analysis]
						D --> E[AST]
						E --> G[Semantic Analysis]
						G --> I[PSI]
						I --> K[Type System]
						I --> L[Annotated AST]
						I --> M[Symbol Tables]
						I --> N[Declared Elements]
						I --> O[Resolved References]
						I --> P[Control Flow Graphs]
						I --> R[Call Graph]
						I --> S[...]
						K --> Z[Static Analysis]
						L --> Z
						M --> Z
						N --> Z
						O --> Z
						P --> Z
						R --> Z
						S --> Z

What to learn about compilers?

Operational Semantics: The rules that govern the behavior of program statements during execution.
Stack Machines: A model for executing code that uses a stack for storing data.
Code Generation: The process of translating a program's high-level or intermediate representation into machine code.
Intermediate Code: A machine-independent representation between source code and target code, used for analysis and optimization before final code generation.
Memory Management: Techniques for efficient allocation and management of memory in programs.
Register Allocation: The strategy for using CPU registers for variable storage to enhance performance.
Code Optimization: Methods for improving code to run faster and use fewer resources.

What to read?

"Types and Programming Languages" by Benjamin C. Pierce
"Programming Language Pragmatics" by Michael L. Scott
"Software Verification and Validation: An Engineering and Scientific Approach" by Marcus S. Fisher

Next: Dynamic Analysis

The upcoming lecture will cover Dynamic Analysis, detailing its integral role in IDE development. We'll examine dynamic analysis tools, their application in real-time code evaluation, and their contribution to performance profiling.

Questions & Answers

Thank you for your attention!

I'm now open to any questions you might have.