In this phase, you will build upon your lexical analyzer and parser to create a semantic analyzer that performs contextual checks on the Abstract Syntax Tree (AST). Semantic analysis ensures that the program is not just syntactically correct, but semantically meaningful in accordance to the language rules you have defined. This time, no template code will be provided (as we want you to build on your own lexer and parser), however, the documentation will provide you with the guidance and conceptual starter kit to start working on your semantic analyzer.
Semantic analysis is the compiler phase that:
- Checks if the constructs in the program are semantically valid (e.g., ensuring that a variable is declared before it's used: x = 5; would be invalid if x was never declared)
- Enforces language rules beyond syntax (type checking, scope rules, etc.)
- Detects logical errors that can't be caught by the parser (like type mismatches in assignments, accessing variables outside their scope, or using uninitialized variables)
- Builds a symbol table to track variables, their types, and scopes
While a parser ensures code follows the language grammar, it can't verify logical correctness. For example, a parser can verify that x = y; follows correct syntax, but it can't determine:
- Whether
xandyhave been declared - If the types of
xandyare compatible - If
xandyare being used within their scope
phase3/
├── include/
│ ├── tokens.h # Token definitions from Phase 1
│ ├── lexer.h # Lexer interface
│ ├── parser.h # Parser definitions from Phase 2
│ └── semantic.h # Semantic analyzer definitions
├── src/
│ ├── lexer/
│ │ └── lexer.c # Lexer implementation from Phase 1
│ ├── parser/
│ │ └── parser.c # Parser implementation from Phase 2
│ └── semantic/
│ └── semantic.c # Semantic analyzer implementation
└── test/
├── input_valid.txt
├── input_invalid.txt
└── input_semantic_error.txt
Your existing implementation includes:
- Lexical analyzer to generate tokens
- Parser to create an AST
- Error reporting framework
The symbol table is the core data structure for tracking variables and their properties. This table acts as a dictionary that stores information about each identifier (variable, function, etc.) in your program. For each symbol, it maintains essential information like:
- name of the identifier
- type information
- scope information (where the variable is accessible)
- initilization status
// Basic symbol structure
typedef struct Symbol {
char name[100]; // Variable name
int type; // Data type (int, etc.)
int scope_level; // Scope nesting level
int line_declared; // Line where declared
int is_initialized; // Has been assigned a value?
struct Symbol* next; // For linked list implementation
} Symbol;
// Symbol table
typedef struct {
Symbol* head; // First symbol in the table
int current_scope; // Current scope level
} SymbolTable;Create operations for managing the symbol table:
// Initialize a new symbol table
// Creates an empty symbol table structure with scope level set to 0
SymbolTable* init_symbol_table();
// Add a symbol to the table
// Inserts a new variable with given name, type, and line number into the current scope
void add_symbol(SymbolTable* table, const char* name, int type, int line);
// Look up a symbol in the table
// Searches for a variable by name across all accessible scopes
// Returns the symbol if found, NULL otherwise
Symbol* lookup_symbol(SymbolTable* table, const char* name);
// Enter a new scope level
// Increments the current scope level when entering a block (e.g., if, while)
void enter_scope(SymbolTable* table);
// Exit the current scope
// Decrements the current scope level when leaving a block
// Optionally removes symbols that are no longer in scope
void exit_scope(SymbolTable* table);
// Remove symbols from the current scope
// Cleans up symbols that are no longer accessible after leaving a scope
void remove_symbols_in_current_scope(SymbolTable* table);
// Free the symbol table memory
// Releases all allocated memory when the symbol table is no longer needed
void free_symbol_table(SymbolTable* table);The main semantic checking functions:
// Main semantic analysis function
int analyze_semantics(ASTNode* ast);
// Check a variable declaration
int check_declaration(ASTNode* node, SymbolTable* table);
// Check a variable assignment
int check_assignment(ASTNode* node, SymbolTable* table);
// Check an expression for type correctness
int check_expression(ASTNode* node, SymbolTable* table);
// Check a block of statements, handling scope
int check_block(ASTNode* node, SymbolTable* table);
// Check a condition (e.g., in if statements)
int check_condition(ASTNode* node, SymbolTable* table);Define semantic error types and reporting functions:
typedef enum {
SEM_ERROR_NONE,
SEM_ERROR_UNDECLARED_VARIABLE,
SEM_ERROR_REDECLARED_VARIABLE,
SEM_ERROR_TYPE_MISMATCH,
SEM_ERROR_UNINITIALIZED_VARIABLE,
SEM_ERROR_INVALID_OPERATION,
SEM_ERROR_SEMANTIC_ERROR // Generic semantic error
} SemanticErrorType;
// Report semantic errors
void semantic_error(SemanticErrorType error, const char* name, int line);- Implement the symbol table data structure
- Create functions to add, look up, and remove symbols
- Manage scopes (enter/exit)
- Track variable initialization status
- Check variable declarations (no redeclaration in same scope)
- Check variable usage (must be declared before use)
- Track initialization status (warn if used uninitialized)
- Ensure operands have compatible types
- Validate operations based on operand types
- Add implicit type conversions where appropriate
- Implement block scoping rules
- Track variables across nested scopes
- Handle variable shadowing correctly
- Validate conditions in if/while statements
- Check that control statements have valid bodies
- Validate factorial function calls
- Ensure special operations have correct argument types
// Initialize symbol table
SymbolTable* init_symbol_table() {
SymbolTable* table = malloc(sizeof(SymbolTable));
if (table) {
table->head = NULL;
table->current_scope = 0;
}
return table;
}
// Add symbol to table
void add_symbol(SymbolTable* table, const char* name, int type, int line) {
Symbol* symbol = malloc(sizeof(Symbol));
if (symbol) {
strcpy(symbol->name, name);
symbol->type = type;
symbol->scope_level = table->current_scope;
symbol->line_declared = line;
symbol->is_initialized = 0;
// Add to beginning of list
symbol->next = table->head;
table->head = symbol;
}
}
// Look up symbol by name
Symbol* lookup_symbol(SymbolTable* table, const char* name) {
Symbol* current = table->head;
while (current) {
if (strcmp(current->name, name) == 0) {
return current;
}
current = current->next;
}
return NULL;
}
// Look up symbol in current scope only
Symbol* lookup_symbol_current_scope(SymbolTable* table, const char* name) {
Symbol* current = table->head;
while (current) {
if (strcmp(current->name, name) == 0 &&
current->scope_level == table->current_scope) {
return current;
}
current = current->next;
}
return NULL;
}// Analyze AST semantically
int analyze_semantics(ASTNode* ast) {
SymbolTable* table = init_symbol_table();
int result = check_program(ast, table);
free_symbol_table(table);
return result;
}
// Check program node
int check_program(ASTNode* node, SymbolTable* table) {
if (!node) return 1;
int result = 1;
if (node->type == AST_PROGRAM) {
// Check left child (statement)
if (node->left) {
result = check_statement(node->left, table) && result;
}
// Check right child (rest of program)
if (node->right) {
result = check_program(node->right, table) && result;
}
}
return result;
}
// Check declaration node
int check_declaration(ASTNode* node, SymbolTable* table) {
if (node->type != AST_VARDECL) {
return 0;
}
const char* name = node->token.lexeme;
// Check if variable already declared in current scope
Symbol* existing = lookup_symbol_current_scope(table, name);
if (existing) {
semantic_error(SEM_ERROR_REDECLARED_VARIABLE, name, node->token.line);
return 0;
}
// Add to symbol table
add_symbol(table, name, TOKEN_INT, node->token.line);
return 1;
}
// Check assignment node
int check_assignment(ASTNode* node, SymbolTable* table) {
if (node->type != AST_ASSIGN || !node->left || !node->right) {
return 0;
}
const char* name = node->left->token.lexeme;
// Check if variable exists
Symbol* symbol = lookup_symbol(table, name);
if (!symbol) {
semantic_error(SEM_ERROR_UNDECLARED_VARIABLE, name, node->token.line);
return 0;
}
// Check expression
int expr_valid = check_expression(node->right, table);
// Mark as initialized
if (expr_valid) {
symbol->is_initialized = 1;
}
return expr_valid;
}void semantic_error(SemanticErrorType error, const char* name, int line) {
printf("Semantic Error at line %d: ", line);
switch (error) {
case SEM_ERROR_UNDECLARED_VARIABLE:
printf("Undeclared variable '%s'\n", name);
break;
case SEM_ERROR_REDECLARED_VARIABLE:
printf("Variable '%s' already declared in this scope\n", name);
break;
case SEM_ERROR_TYPE_MISMATCH:
printf("Type mismatch involving '%s'\n", name);
break;
case SEM_ERROR_UNINITIALIZED_VARIABLE:
printf("Variable '%s' may be used uninitialized\n", name);
break;
case SEM_ERROR_INVALID_OPERATION:
printf("Invalid operation involving '%s'\n", name);
break;
default:
printf("Unknown semantic error with '%s'\n", name);
}
}int x; // Declaration
int y; // Another declaration
x = 5; // Initialization
y = x + 2; // Using initialized variable
if (x > 0) {
int z; // Declaration in nested scope
z = x + y; // Using outer scope variables
print z;
}// Undeclared variable
x = 5; // Error: x not declared
// Redeclaration in same scope
int x;
int x; // Error: x redeclared
// Uninitialized variable
int x;
int y;
y = x + 5; // Warning: x used before initialization
// Invalid operation
int x;
x = 5 / 0; // Error: division by zero
// Scope errors
if (x > 0) {
int y;
}
y = 5; // Error: y not accessible in this scopeint main() {
const char* input = "int x;\n"
"x = 42;\n"
"if (x > 0) {\n"
" int y;\n"
" y = x + 10;\n"
" print y;\n"
"}\n";
printf("Analyzing input:\n%s\n\n", input);
// Lexical analysis and parsing
parser_init(input);
ASTNode* ast = parse();
printf("AST created. Performing semantic analysis...\n\n");
// Semantic analysis
int result = analyze_semantics(ast);
if (result) {
printf("Semantic analysis successful. No errors found.\n");
} else {
printf("Semantic analysis failed. Errors detected.\n");
}
// Clean up
free_ast(ast);
return 0;
}A well-implemented semantic analyzer should produce:
Analyzing input:
int x;
x = 42;
if (x > 0) {
int y;
y = x + 10;
print y;
}
AST created. Performing semantic analysis...
Symbol Table Contents:
Name: y, Type: int, Scope: 1, Line: 4, Initialized: Yes
Name: x, Type: int, Scope: 0, Line: 1, Initialized: Yes
Semantic analysis successful. No errors found.
Analyzing input:
x = 42;
if (x > 0) {
int y;
y = z + 10;
print y;
}
AST created. Performing semantic analysis...
Semantic Error at line 1: Undeclared variable 'x'
Semantic Error at line 4: Undeclared variable 'z'
Semantic analysis failed. Errors detected.
== SYMBOL TABLE DUMP ==
Total symbols: 2
Symbol[0]:
Name: x
Type: int
Scope Level: 0
Line Declared: 1
Initialized: Yes
Symbol[1]:
Name: y
Type: int
Scope Level: 1
Line Declared: 4
Initialized: Yes
===================
-
Handling nested scopes
- Keep track of scope levels
- Implement scope entry/exit correctly
- Remember that inner scopes can access outer scope variables!!!
-
Type checking expressions
- Recursively check types in complex expressions
- Implement type compatibility rules
- Ensure type consistency across operations
-
Managing memory
- Do not forget to clean up the symbol table properly
- FREE all allocated symbols when done
-
Managing initialization status
- Track whether variables have been assigned values
- Warn about potentially uninitialized variables
- Update initialization status across assignments
- Begin with a simple implementation that handles basic variable checks
- Test incrementally with small programs before tackling complex ones
- Add features one at a time, testing thoroughly after each addition
- Create helper functions for common semantic checks
- Use consistent naming conventions for your semantic functions
- Document your symbol table design and semantic rules
- Complete implementation of your
semantic.candsemantic.h - Test cases demonstrating your semantic analyzer's capabilities
- Documentation of your:
- Symbol table implementation
- Semantic checking rules
- Error handling approach
- Any extensions or enhancements
Good luck with your semantic analyzer implementation! 🚀