Understanding
Programming Language
Part 2

IDE Development Course

Andrew Vasilyev

Today's Agenda

  • Introduction to Semantic Analysis
  • Understanding Symbols and Name Resolution
  • Exploring Scopes, Lifetimes, and Nesting
  • Delving into Symbol Tables
  • IDE-specific Usages of Symbol Tables
  • Challenges and Advanced Topics in Symbol Resolution

Semantic Analysis

Semantic Analysis

Ensuring the source code not only "sounds right" but also "makes sense" within the context of the language's rules.

Purpose of Semantic Analysis
  • Understands the meaning of syntactically correct code.
  • Determines valid meaning within the language's rules.
  • Operates on the abstract syntax tree (AST) or other intermediate representations.

Key Operations in Semantic Analysis

  • Name and Reference Resolution: Mapping symbolic names to their actual definitions or declarations.
  • Scope Rules Enforcement: Valid access of variables and functions.
  • Type Checking: Compatibility of data types in operations.
  • Variable Binding: Confirmation of variable declaration and initialization.
  • Function/Method Binding: Validity of function or method calls.

Errors Detected by Semantic Analysis

While syntax errors are about code structure, semantic errors concern meaning: the code is well-formed but violates the language's rules.

  • Using an undefined variable.
  • Out-of-bounds array access (when the index is known statically).
  • Incorrect function call arguments.

int result = "hello" + 5;

Syntactically correct but semantically problematic in many statically-typed languages.
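
To make the check concrete, here is a minimal sketch of how a type checker could reject the declaration above. The node classes (`Literal`, `BinaryAdd`, `VarDecl`) and the rule that `+` with a string operand yields a string are assumptions made for illustration, not part of any particular compiler.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    value: object  # an int or a str in this sketch


@dataclass
class BinaryAdd:
    left: object
    right: object


@dataclass
class VarDecl:
    declared_type: str  # "int" or "string"
    name: str
    init: object


def type_of(expr):
    """Return the static type name of an expression."""
    if isinstance(expr, Literal):
        return "string" if isinstance(expr.value, str) else "int"
    if isinstance(expr, BinaryAdd):
        left, right = type_of(expr.left), type_of(expr.right)
        # Assumed rule (as in C# or Java): if either side is a string, + concatenates.
        return "string" if "string" in (left, right) else "int"
    raise TypeError(f"unknown expression {expr!r}")


def check_decl(decl):
    """Return a list of diagnostics for one variable declaration."""
    actual = type_of(decl.init)
    if actual != decl.declared_type:
        return [f"cannot assign {actual} to {decl.declared_type} '{decl.name}'"]
    return []


# int result = "hello" + 5;
decl = VarDecl("int", "result", BinaryAdd(Literal("hello"), Literal(5)))
print(check_decl(decl))  # ["cannot assign string to int 'result'"]
```

The parser accepts the declaration; only the comparison of the declared type with the computed type reveals the problem.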

Symbols and Name Resolution

Symbols and References

Symbols represent named entities in programming languages.

In the context of a symbol table, a symbol can refer to variables, functions, classes, modules, or any other identifiable construct in code. The symbol table stores information about each symbol, such as its location, type, and scope.

A reference to a symbol is a specific occurrence in the code where the symbol is used, as opposed to the place where it is defined or declared.

Name Resolution (aka "Symbol Resolution" or simply "Resolve")

Symbol resolution, often referred to as "name resolution" or "binding", is the process by which a compiler, interpreter or IDE maps a symbol in the source code to its corresponding declaration or definition. This is essential for ensuring that the correct piece of data or functionality is accessed when a symbol is referenced.

Name Resolution

The IDE evaluates the scoping rules of the language to deduce potential locations where the symbol's definition might reside. This can range from the enclosing block for local variables to module or namespace levels for global ones.

If a symbol isn't found in the nearest scope, the IDE explores enclosing or parent scopes. For instance, in object-oriented languages, a base class will be inspected if a method is absent in a derived class.
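
The walk from the nearest scope to its parents can be sketched as follows. The `Scope` class and its `declare` and `resolve` methods are hypothetical names chosen for this illustration; a real IDE would store richer declaration objects than plain type strings.

```python
class Scope:
    """A scope holding its own symbols plus a link to the enclosing scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}

    def declare(self, symbol, info):
        self.symbols[symbol] = info

    def resolve(self, symbol):
        """Search this scope first, then each enclosing scope in turn."""
        scope = self
        while scope is not None:
            if symbol in scope.symbols:
                return scope.name, scope.symbols[symbol]
            scope = scope.parent
        return None  # not found in any accessible scope


global_scope = Scope("global")
global_scope.declare("counter", "int")
method_scope = Scope("method", parent=global_scope)
method_scope.declare("tmp", "string")
block_scope = Scope("block", parent=method_scope)

print(block_scope.resolve("tmp"))      # ('method', 'string')
print(block_scope.resolve("counter"))  # ('global', 'int')
print(block_scope.resolve("missing"))  # None
```

A base-class lookup follows the same pattern, with the derived class's member table pointing to the base class's table instead of to a lexical parent.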

For languages with support for function or operator overloading, the resolution process takes into account the symbol's usage context, like argument types and quantity, to determine the correct overload.
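
As a rough illustration, overload selection by argument count and types might look like the sketch below. The `OVERLOADS` table and `resolve_overload` function are hypothetical, and the sketch matches signatures exactly; real languages also consider implicit conversions and ranking rules.

```python
# Hypothetical overload set: one name, several parameter-type signatures.
OVERLOADS = {
    "print": [
        ("int",),
        ("string",),
        ("string", "int"),
    ],
}


def resolve_overload(name, arg_types):
    """Return the signature that matches the argument types exactly, else None."""
    wanted = tuple(arg_types)
    matches = [sig for sig in OVERLOADS.get(name, []) if sig == wanted]
    return matches[0] if len(matches) == 1 else None


print(resolve_overload("print", ["string"]))         # ('string',)
print(resolve_overload("print", ["string", "int"]))  # ('string', 'int')
print(resolve_overload("print", ["float"]))          # None
```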

Name Resolution

For languages that separate compilation and linking, resolution happens in two phases. During compilation, symbols are checked within the current translation unit; during linking, resolution spans translation units.

Languages or systems that allow dynamic linking can have symbol resolution during runtime. This is typical when libraries or modules are dynamically loaded and linked.
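
Python's module system offers a simple analogy to runtime resolution: a module is loaded by name and a symbol is looked up in it while the program runs. The helper name `resolve_at_runtime` is an assumption for this sketch; `importlib.import_module` and `getattr` are standard.

```python
import importlib


def resolve_at_runtime(module_name, symbol_name):
    """Load a module by name and look up a symbol in it while the program runs."""
    module = importlib.import_module(module_name)
    return getattr(module, symbol_name, None)  # None when the symbol is missing


sqrt = resolve_at_runtime("math", "sqrt")
print(sqrt(16.0))                                    # 4.0
print(resolve_at_runtime("math", "no_such_symbol"))  # None
```

Because both names are ordinary strings, a static tool cannot in general know which declaration the call will reach.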

Resolution Errors

If a symbol remains unresolved, the compiler or interpreter flags a resolution error, signaling that a declaration or definition for the symbol was not located.

An IDE typically marks the symbol as unresolved and continues the analysis, treating the symbol as having no useful meaning.
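
The difference between stopping and continuing can be sketched like this. The `Unresolved` marker class and the `resolve_all` function are hypothetical names; the point is that every reference gets a result, and failures are collected as diagnostics instead of aborting the pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unresolved:
    """Placeholder target for a reference whose declaration was not found."""
    name: str


def resolve_all(references, declarations):
    """Resolve every reference; record a diagnostic and keep going on failure."""
    resolved, diagnostics = {}, []
    for ref in references:
        target = declarations.get(ref)
        if target is None:
            target = Unresolved(ref)
            diagnostics.append(f"unresolved symbol '{ref}'")
        resolved[ref] = target
    return resolved, diagnostics


declarations = {"x": "int", "y": "string"}
resolved, diagnostics = resolve_all(["x", "z", "y"], declarations)
print(diagnostics)    # ["unresolved symbol 'z'"]
print(resolved["y"])  # string
```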

Scopes, Lifetimes, and Nesting

Scopes

Scope determines the region or portion of the code where a symbol can be accessed or modified.

It defines the visibility and accessibility of a symbol. Scopes can be nested, meaning a scope can be contained within another scope. Typical scopes include local (inside a function or method), class (inside a class), and global (accessible throughout the code).

Lifetimes

Lifetime refers to the duration or period during program execution when a symbol exists in memory.

It starts when memory is allocated for the symbol and ends when the memory is deallocated. Lifetime is closely tied to scope in many programming languages. For instance, a variable with local scope usually has a lifetime that lasts as long as the function containing it is executing.

Nesting

Nesting in programming denotes the encapsulation of one scope or structure within another.

This hierarchical organization, evident in constructs like nested loops or functions within functions, gives layered access to symbols. Inner scopes can reference symbols from outer scopes, and can also shadow them by declaring a symbol with the same name.
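
Nesting and shadowing can be observed directly in Python's nested functions. The function and variable names below are illustrative only.

```python
value = "global"


def outer():
    value = "outer"  # shadows the module-level 'value'

    def inner_reads():
        return value  # no local 'value' here: resolves to outer's

    def inner_shadows():
        value = "inner"  # shadows outer's 'value' inside this function only
        return value

    return inner_reads(), inner_shadows(), value


print(outer())  # ('outer', 'inner', 'outer')
print(value)    # global
```

Each assignment creates a symbol in the scope where it appears, so the outer symbols are untouched once the inner function returns.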

Symbol Qualifiers

A symbol qualifier, often simply called a "qualifier", is a prefix to a symbol that provides a more specific context to that symbol.

Examples: namespace, package or module qualifiers, class or object qualifiers, etc.

It's used to disambiguate symbols that might have the same name but reside in different scopes or namespaces.
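
A minimal sketch of how a qualifier narrows the search: the same simple name exists in two namespaces, and the qualifier selects one. The `NAMESPACES` table and both helper functions are hypothetical and support only a single-level qualifier.

```python
# Hypothetical namespaces that both declare a symbol named "Point".
NAMESPACES = {
    "Geometry": {"Point": "class Geometry.Point"},
    "Gui": {"Point": "class Gui.Point", "Window": "class Gui.Window"},
}


def candidates(name):
    """List every namespace that declares the unqualified name."""
    return sorted(q for q, members in NAMESPACES.items() if name in members)


def resolve_qualified(qualified_name):
    """Resolve 'Qualifier.Name'; return None if either part is unknown."""
    qualifier, _, name = qualified_name.rpartition(".")
    return NAMESPACES.get(qualifier, {}).get(name)


print(candidates("Point"))                  # ['Geometry', 'Gui']
print(resolve_qualified("Gui.Point"))       # class Gui.Point
print(resolve_qualified("Geometry.Window")) # None
```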

Example

					
using System;

// C# has no true global variables; a public static field is the closest equivalent.
static class Globals
{
    public static int GlobalVariable = 100;  // Visible program-wide, with a lifetime extending the entire execution of the program
}

class Program
{
    // Class-level scope
    static int classVariable = 50;  // Has a class-level scope and lifetime as long as the containing class is in memory

    static void Main(string[] args)
    {
        // Method-level scope
        int localVariable = 10;  // Has a method-level scope and lifetime limited to the execution of the Main method

        Console.WriteLine(Globals.GlobalVariable);  // Accessible from anywhere; "Globals." and "Console." are class qualifiers
        Console.WriteLine(classVariable);   // Accessible because it's in the class scope

        if (true)
        {
            // Block-level scope
            int blockVariable = 5;  // Has a block-level scope and lifetime limited to this if-block

            Console.WriteLine(localVariable);   // Accessible because the block is inside the method
            Console.WriteLine(blockVariable);   // Accessible within its own block
        }

        // Console.WriteLine(blockVariable);  // This would be an error. blockVariable is out of scope here.
    }

    static void AnotherMethod()
    {
        // Console.WriteLine(localVariable);  // This would be an error. localVariable is out of scope since it's local to Main.
        Console.WriteLine(classVariable);   // Accessible because it's in the class scope
    }
}
					
				

Scope Resolution

Scope resolution refers to the process by which a compiler or interpreter determines which variable, function, or other symbol a reference in the code pertains to, especially when there are multiple symbols with the same name in different scopes. The process involves checking various scopes in a hierarchical manner until the appropriate symbol is found or until it's determined that the symbol does not exist in any accessible scope.

Symbol Tables

Symbol Tables

A symbol table is a data structure used by compilers, interpreters, and IDEs to store information about symbols (such as variable names, function names, class names, etc.) used in the source code of a program. It plays a crucial role in the process of name binding, wherein symbolic names in the source code are associated (or "bound") with specific memory addresses, data types, or functionalities in the compiled or interpreted code.

Role of Symbol Tables

  • Name Binding
  • Data Storage
  • Scoping
  • Error Checking
							
struct SymbolTableEntry {
    string name;
    string type;
    int scopeLevel;
    // ... other attributes ...
};
							
						

Role of Symbol Tables

Associating symbolic names in the source code with specific attributes such as memory addresses, data types, or functionalities in the compiled or interpreted code.

Managing different scopes in a program to ensure names are resolved in the correct context. This includes local vs. global scopes and nested scopes.

Using symbol tables, compilers can detect errors such as use of undeclared variables, type mismatches, and redeclaration of variables.
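
The error checks mentioned above can be sketched with a dictionary-backed table. The `SymbolTable` class and its `declare` and `use` methods are hypothetical names, and the sketch covers a single scope only.

```python
class SymbolTable:
    """Single-scope symbol table that records errors instead of raising."""

    def __init__(self):
        self.entries = {}  # name -> type name
        self.errors = []

    def declare(self, name, type_name):
        if name in self.entries:
            self.errors.append(f"redeclaration of '{name}'")
        else:
            self.entries[name] = type_name

    def use(self, name):
        """Return the declared type, or None (plus an error) if undeclared."""
        if name not in self.entries:
            self.errors.append(f"use of undeclared variable '{name}'")
            return None
        return self.entries[name]


table = SymbolTable()
table.declare("count", "int")
table.declare("count", "string")  # second declaration of the same name
table.use("total")                # never declared
print(table.use("count"))         # int
print(table.errors)
# ["redeclaration of 'count'", "use of undeclared variable 'total'"]
```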

Role of Symbol Tables

Storing attributes for each symbol:

  • Type (int, float, etc.)
  • Scope (local, global)
  • References to PSI (Program Structure Interface) trees and declared elements
  • Any other relevant data

Nested or Chained Symbol Tables

Reflecting the hierarchical structure of the source code, especially in languages that support nested scopes or classes.

%%{
    init: {
        'theme': 'base',
        'themeVariables': {
            'fontSize': '24px',
            'darkMode': true,
            'lineColor': '#F8B229'
        }
    }
}%%
graph TB
Global(Global Scope) --> Table1[Symbol Table: Global]
Table1 --> Local1[Namespace Scope] --> Table2[Symbol Table: Namespace]
Table2 --> Local2[Class Scope 1] --> Table3[Symbol Table: Class1]
Table2 --> Local3[Class Scope 2] --> Table4[Symbol Table: Class2]
Table3 --> Local4[Method Scope 1] --> Table5[Symbol Table: Method1]
Table3 --> Local5[Method Scope 2] --> Table6[Symbol Table: Method2]
Table4 --> Local6[Method Scope 3] --> Table7[Symbol Table: Method3]

style Global fill:#f9d457,stroke:#f08c00
style Local1 fill:#bfe5bf,stroke:#5cd65c
style Local2 fill:#ffa07a,stroke:#ff4500
style Local3 fill:#ffa07a,stroke:#ff4500
style Local4 fill:#dda0dd,stroke:#da70d6
style Local5 fill:#dda0dd,stroke:#da70d6
style Local6 fill:#dda0dd,stroke:#da70d6
							

Common Implementation

  • Hash Tables: Fast lookups, ideal for unordered data.
  • Trees: Hierarchical structure, suited for nested scopes.
  • Linked Lists: Sequential access, simpler but slower.

Spaghetti Stack

A spaghetti stack (also known as a parent-pointer tree) facilitates the construction of a symbol table and scope hierarchy during AST traversal.

push - Introduce a new child node and designate it as the top of the stack.

pop - Revert to the parent node as the new top of the stack.

Spaghetti Stack

During traversal, the compiler encounters constructs like loops, conditionals, and functions. Each potentially introduces a new scope.

As nested constructs are encountered, the spaghetti stack manages these scopes. New scopes link to parent scopes using the stack.

The stack maintains a hierarchy. The top represents the current scope, while deeper elements represent outer scopes. Symbols are added to the current scope's table as encountered.

For symbol references, the compiler starts at the current scope and navigates the stack's hierarchy until the symbol is found or deemed undefined.
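
The push, pop, and lookup operations described above can be sketched as follows. The class names `ScopeNode` and `SpaghettiStack` and the optional `start` parameter are assumptions made for this illustration. Unlike an ordinary stack, a popped node is not destroyed: it stays reachable and keeps its parent link, so its scope can still be queried later.

```python
class ScopeNode:
    def __init__(self, name, parent):
        self.name = name
        self.parent = parent  # link to the enclosing scope
        self.symbols = {}


class SpaghettiStack:
    def __init__(self):
        self.top = ScopeNode("global", None)

    def push(self, name):
        """Introduce a new child scope and make it the top of the stack."""
        self.top = ScopeNode(name, self.top)
        return self.top

    def pop(self):
        """Revert to the parent scope; the popped node itself is kept alive."""
        if self.top.parent is None:
            raise IndexError("cannot pop the global scope")
        popped = self.top
        self.top = popped.parent
        return popped

    def declare(self, symbol, info):
        self.top.symbols[symbol] = info

    def lookup(self, symbol, start=None):
        """Walk parent links from 'start' (default: current top) outward."""
        node = start if start is not None else self.top
        while node is not None:
            if symbol in node.symbols:
                return node.name
            node = node.parent
        return None


stack = SpaghettiStack()
stack.declare("g", "int")
stack.push("Main")
stack.declare("local", "int")
block = stack.push("if-block")
stack.declare("inner", "int")

print(stack.lookup("g"))                   # global
stack.pop()                                # leave the if-block
print(stack.lookup("inner"))               # None
print(stack.lookup("inner", start=block))  # if-block
print(stack.lookup("local", start=block))  # Main
```

Keeping popped nodes is what lets an IDE resolve a reference at any caret position after the traversal has finished.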

IDE-specific usages and challenges

How IDEs Utilize Symbol Tables

Error Checking

Identify unresolved symbols or symbols used out of their intended context.

Type Checking and Inference

Perform type checks, suggest type-specific methods, and infer types using symbol table data.

Syntax and Semantic Highlighting

Differentiate symbols using colors and styles based on their nature and context.

How IDEs Utilize Symbol Tables

Code Navigation

Go to Symbol: Jump to the source of a symbol's declaration using part of the name.
Go to Definition: Jump to the source of a symbol's declaration.
Find References/Usages: List all references of a symbol in the project.
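
The three navigation features map onto simple queries over indexed symbol data. In this sketch the `DECLARATIONS` and `REFERENCES` indexes, the file names, and the line numbers are all made up for illustration; a real IDE builds such indexes from its symbol tables.

```python
# Hypothetical indexes: symbol name -> declaration site, and -> reference sites.
DECLARATIONS = {
    "SymbolTable": ("table.py", 3),
    "SymbolTableEntry": ("table.py", 12),
    "resolve": ("resolver.py", 7),
}
REFERENCES = {
    "resolve": [("main.py", 4), ("table.py", 30)],
}


def go_to_symbol(query):
    """Names containing the query, case-insensitively (partial-name search)."""
    q = query.lower()
    return sorted(name for name in DECLARATIONS if q in name.lower())


def go_to_definition(name):
    """Exact declaration site for a resolved symbol, or None."""
    return DECLARATIONS.get(name)


def find_usages(name):
    """All recorded reference sites of a symbol."""
    return REFERENCES.get(name, [])


print(go_to_symbol("table"))        # ['SymbolTable', 'SymbolTableEntry']
print(go_to_definition("resolve"))  # ('resolver.py', 7)
print(find_usages("resolve"))       # [('main.py', 4), ('table.py', 30)]
```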

Auto-completion

Offer relevant symbol suggestions based on context for faster and error-free coding.
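
A minimal sketch of prefix-based completion over a scope chain, assuming each scope is a plain dictionary and the chain is ordered innermost first. The `complete` function is a hypothetical name; real completion also ranks and filters by expected type.

```python
def complete(prefix, scope_chain):
    """Suggest visible names starting with prefix; inner scopes take priority."""
    seen, suggestions = set(), []
    for scope in scope_chain:  # innermost scope first
        for name in sorted(scope):
            if name.startswith(prefix) and name not in seen:
                seen.add(name)  # an outer symbol with the same name is shadowed
                suggestions.append(name)
    return suggestions


block = {"count": "int"}
method = {"counter": "int", "name": "string"}
global_symbols = {"count": "long", "Console": "class"}

print(complete("co", [block, method, global_symbols]))  # ['count', 'counter']
```

The global `count` is not offered a second time because the block-level `count` shadows it, and `Console` is skipped because this sketch matches case-sensitively.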

Refactorings

Consistently rename symbols throughout the project using the symbol table.
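
The advantage over plain text replacement can be sketched as follows: only the positions recorded as references to the symbol are rewritten. The `rename` function and the `(line_index, column)` occurrence format are assumptions for this illustration.

```python
def rename(source_lines, occurrences, old_name, new_name):
    """Rewrite only the recorded (line_index, column) occurrences of a symbol."""
    lines = list(source_lines)
    # Process right-to-left so earlier columns on the same line stay valid.
    for line_index, column in sorted(occurrences, reverse=True):
        line = lines[line_index]
        if not line.startswith(old_name, column):
            raise ValueError(f"stale occurrence at {line_index}:{column}")
        lines[line_index] = line[:column] + new_name + line[column + len(old_name):]
    return lines


source = [
    "int total = 0;",
    "total = total + 1;",
    'print("total");',
]
# Positions of the symbol 'total' as a symbol table would record them.
occurrences = [(0, 4), (1, 0), (1, 8)]

print(rename(source, occurrences, "total", "sum"))
# ['int sum = 0;', 'sum = sum + 1;', 'print("total");']
```

The string literal on the last line is left alone because it is text, not a reference to the symbol.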

How IDEs Utilize Symbol Tables

Code Documentation and Tooltips

Display symbol metadata, like documentation or type information, as tooltips.

Scope and Lifetime Visualization

Highlight or indicate the scope and lifetime of variables based on symbol table structure.

Debugging

Resolve variable names, display call stacks, and evaluate expressions during debugging sessions.

Code Generation

Generate code templates like "implement interface" or "override method" using the symbol table.

Challenges and Advanced Topics

Incremental Updates

Keep the spaghetti stack up to date as the code is edited: when a region of code changes, replace only the affected scope subtrees so the symbol table stays consistent with the code.

Third-Party Libraries and Dependencies

Handling symbols from external sources can be tricky. IDEs need to integrate symbols from third-party libraries and dependencies without overwhelming the developer or mixing them up with the project's own symbols. This often involves creating separate symbol tables or namespaces and ensuring they're correctly linked and accessible.

Overloading and Templates

Function overloading and templates introduce complexities as a single symbol name can represent multiple entities. The IDE needs to correctly identify and differentiate each overloaded function or template instantiation, ensuring type safety and correct navigation.

Resolving Symbols in Dynamic Languages

Dynamic languages, like Python or JavaScript, pose a unique challenge because types and symbols can change at runtime. IDEs use heuristic methods, type inference, and sometimes even runtime information to make educated guesses about symbol resolution in these contexts.

Next: Understanding Programming Language Part 3

In our next session, we'll explore type systems and their role in programming. We'll also dive into semantic analysis, which helps interpret the meaning of code, and touch on type inference, which lets languages deduce types on their own. Join us for a deeper look into these essential topics.

Questions & Answers

Thank you for your attention!

I'm now open to any questions you might have.