
NotPlusPlus Design Document

1. Document Purpose

This document is the authoritative design for NotPlusPlus, a source-code interpreter for a small, explicit, well-defined subset of real C++.

NotPlusPlus is not a new language with C++-like syntax. It is a system that accepts actual C++ source text and interprets programs whose constructs fall entirely within a supported subset of the C++ language. Programs outside that subset are rejected with precise diagnostics.

This document defines:

  • the product goal
  • the supported language subset
  • semantic rules
  • architecture
  • internal representations
  • execution model
  • diagnostic behavior
  • implementation plan
  • testing strategy
  • explicit non-goals
  • future evolution constraints

2. Product Definition

2.1 Product name

NotPlusPlus

2.2 Core objective

NotPlusPlus shall:

  1. accept source code written in genuine C++ syntax
  2. lex and parse it according to a subset-compatible grammar
  3. resolve declarations and names according to subset-compatible C++ rules
  4. type-check the program according to the subset’s semantics
  5. interpret the program directly without producing native machine code
  6. execute a well-formed main function and produce observable program output

2.3 Product positioning

NotPlusPlus is a subset C++ interpreter, not a compiler, transpiler, static analyzer, or language invention exercise.

The correct mental model is:

“Interpret actual C++ source that belongs to a strict, documented subset of ISO C++.”

This distinction is critical. The parser and semantic rules must align with real C++ wherever the subset overlaps the language, rather than inventing alternate rules for convenience.

2.4 Primary use case

A user writes a small C++ program using only supported constructs, for example:

int add(int a, int b) {
    return a + b;
}

int main() {
    int x = add(2, 3);
    if (x > 4) {
        print(x);
    }
    return 0;
}

NotPlusPlus parses and interprets this program according to the documented subset semantics.

2.5 Non-goal framing

NotPlusPlus shall not attempt to “mostly parse C++” or “best-effort emulate unsupported constructs.” Unsupported features are not partially recognized and ignored. They are rejected.

This project succeeds by being:

  • strict
  • explicit
  • deterministic
  • semantically coherent
  • faithful where supported

3. Design Principles

3.1 Real C++ first

If a construct is supported, its spelling and semantics must correspond to real C++ as closely as practical within the subset.

3.2 Explicit subset contract

Every accepted construct must be documented. Everything else is unsupported.

3.3 No silent fallback semantics

Unsupported constructs shall not be reinterpreted under custom rules.

Bad:

  • accepting std::cout << x; and secretly treating it as print(x);

Good:

  • either support real semantics for a narrow form of expression involving std::cout, or reject it

For version 1, the recommended design is to reject std::cout entirely and provide a built-in function print(...) defined as part of the interpreter’s runtime environment, because a normal function call is valid C++ syntax.

3.4 Parseability and semantic tractability

Subset selection must deliberately avoid notorious C++ ambiguities and front-end complexity where possible.

3.5 Deterministic behavior

Given the same source and interpreter build, behavior must be stable and reproducible.

3.6 Diagnostics as part of the product

A rejected program must receive actionable diagnostics with source locations and stable error categories.

3.7 Layered pipeline

The implementation shall be staged:

  1. source management
  2. lexing
  3. preprocessing boundary handling
  4. parsing
  5. AST formation
  6. semantic analysis
  7. interpretation

No phase may embed undocumented behavior from a later phase unless explicitly designed.


4. Scope Summary

4.1 Supported in version 1

The initial supported subset shall include:

translation unit structure

  • a single translation unit
  • zero or more function definitions
  • optional function declarations
  • no separate compilation
  • no headers beyond a limited interpreter-provided prelude model

types

  • int
  • bool
  • void
  • fixed-size one-dimensional arrays of supported element type, optionally deferred to phase 2

expressions

  • integer literals
  • boolean literals true and false
  • identifier references
  • parenthesized expressions
  • unary operators: +, -, !
  • binary arithmetic: +, -, *, /, %
  • binary comparison: <, <=, >, >=, ==, !=
  • logical operators: &&, ||
  • assignment: =
  • compound assignment: +=, -=, *=, /=, %=
  • function call
  • array indexing if arrays are enabled
  • comma operator is unsupported in expressions except where grammar requires comma separators

statements

  • expression statement
  • declaration statement
  • compound statement / block
  • if
  • if / else
  • while
  • for
  • return
  • break
  • continue

declarations

  • local variable declarations
  • function declarations and definitions
  • block scope
  • function parameter declarations
  • optional local array declarations

runtime/library surface

  • built-in function print(int)
  • built-in function print(bool) optional
  • built-in function println(int) optional
  • built-ins shall be ordinary function names in the global namespace from the program's perspective

execution

  • interpret starting from int main()
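  • accept only int main() by default; a further signature such as int main(int, bool) is allowed only as a deliberate design decision (the standard int main(int, char**) form depends on pointers and char, which are excluded)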

4.2 Excluded from version 1

  • preprocessing beyond minimal policy handling
  • macros
  • includes with real header loading
  • namespaces
  • classes
  • structs
  • enums
  • references
  • pointers
  • dynamic allocation
  • strings
  • floating point
  • character types
  • casts
  • overload resolution beyond built-in support model
  • templates
  • exceptions
  • function pointers
  • lambdas
  • recursion limits beyond implementation-defined stack protection
  • user-defined operators
  • declarations requiring full declarator complexity
  • switch
  • do-while
  • const, constexpr, static, extern, volatile, mutable
  • global variables, at least in version 1 baseline

5. Supported Language Definition

This section is normative.

5.1 Source model

The input to NotPlusPlus is a UTF-8 text file treated as a single C++ source file.

The implementation may restrict accepted characters to ASCII plus standard whitespace for version 1.

Line endings:

  • \n: support is mandatory
  • \r\n: support by normalizing to \n is recommended

5.2 Translation units

A program consists of a sequence of top-level declarations. In version 1:

Allowed top-level declarations:

  • function declaration
  • function definition
  • optional built-in declaration injection, performed by interpreter before semantic analysis

Disallowed at top level:

  • variable definitions
  • namespace declarations
  • type definitions
  • using directives
  • class/struct definitions
  • templates
  • include directives unless a special preprocessing policy is adopted

5.3 Keywords

The supported keyword set includes:

  • int
  • bool
  • void
  • if
  • else
  • while
  • for
  • return
  • break
  • continue
  • true
  • false

All other C++ keywords are lexed as keywords if the lexer supports them globally, but any occurrence in syntax outside the supported grammar is rejected as unsupported.

5.4 Comments

Support:

  • // line comment
  • /* block comment */

Nested block comments are not supported, matching C/C++ behavior.

5.5 Literals

Supported literals:

  • decimal integer literals, non-suffixed
  • true
  • false

Unsupported:

  • hexadecimal
  • binary
  • octal
  • digit separators
  • integer suffixes
  • character literals
  • string literals
  • floating literals
  • user-defined literals

The integer literal domain shall be bounded by the interpreter integer representation. Recommended baseline: signed 32-bit two’s-complement semantics.
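A minimal sketch of turning a literal's spelling into a value under the recommended 32-bit baseline. It assumes a C++ host implementation (the document does not fix one), and the name parseDecimalLiteral is hypothetical. A multi-digit spelling with a leading 0 is rejected because C++ would read it as octal. Because the minus sign is a separate unary operator, 2147483648 is rejected even inside -2147483648; this is consistent with C++, where that literal does not fit in int and so has a wider type the subset does not support.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: converts the spelling of a decimal integer literal
// into a 32-bit value. Returns std::nullopt when the spelling is outside the
// supported subset or the value does not fit in a signed 32-bit int.
std::optional<int32_t> parseDecimalLiteral(const std::string& text) {
    if (text.empty()) return std::nullopt;
    // In C++ a multi-digit literal starting with 0 is octal, which the
    // subset does not support, so reject it rather than misread it.
    if (text.size() > 1 && text[0] == '0') return std::nullopt;
    int64_t value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;  // no suffixes, separators, hex
        value = value * 10 + (c - '0');
        if (value > INT32_MAX) return std::nullopt;   // too large for int
    }
    return static_cast<int32_t>(value);
}
```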

5.6 Types

5.6.1 Fundamental types

Supported:

  • int
  • bool
  • void

5.6.2 Arrays

Optional in version 1; add arrays only after the core subset is stable.

Supported form:

  • int a[5];
  • bool flags[10];

Constraints:

  • one-dimensional only
  • size must be a positive integer literal
  • no variable-length arrays
  • no array parameters by special adjustment unless explicitly modeled
  • no decay to pointer semantics
  • no initializer lists in version 1 baseline

5.6.3 Type system rules

  • void may only appear as a function return type
  • variables and parameters may not have type void
  • arrays may not have element type void
  • if arrays are supported, assignment between arrays is disallowed

5.7 Declarators

To avoid full C++ declarator complexity, the subset supports only a reduced set of declarator forms.

5.7.1 Variable declarators

Allowed:

  • int x;
  • int x = 5;
  • bool done = false;
  • int arr[5]; if arrays enabled

Disallowed:

  • multiple declarators in one declaration, e.g. int a, b;
  • pointer declarators
  • reference declarators
  • parenthesized declarators
  • initialized arrays
  • declarators with qualifiers

5.7.2 Function declarators

Allowed:

  • int f(int a, int b);
  • int f(int a, int b) { ... }
  • void g();
  • bool h(bool x) { ... }

Disallowed:

  • default parameters
  • variadic parameters
  • function overloading
  • member functions
  • trailing return types
  • noexcept, attributes, requires clauses
  • cv/ref qualifiers
  • templates

5.8 Statements

5.8.1 Compound statement

Supported:

{
    int x = 1;
    x = x + 1;
}

Each block creates a new lexical scope.

5.8.2 Declaration statement

A declaration statement is a supported local variable declaration followed by ;.

5.8.3 Expression statement

Any supported expression followed by ;.

5.8.4 If statement

Supported:

if (cond) stmt
if (cond) stmt else stmt

Condition must be of type bool, or int if integer-to-bool contextual conversion is allowed by policy. Recommended baseline: allow both, matching C++ contextual conversion to bool.

5.8.5 While statement

Supported:

while (cond) stmt

5.8.6 For statement

Supported:

for (init; cond; step) stmt

Version 1 recommended support:

  • init may be empty, a supported expression, or a single supported variable declaration; in every case the first ; belongs to the for syntax itself
  • cond may be empty or a supported expression
  • step may be empty or a supported expression

Examples:

for (int i = 0; i < 10; i = i + 1) { ... }
for (; x < 10; ) { ... }

5.8.7 Return statement

Supported:

return;
return expr;

Rules:

  • return; only valid in void functions
  • return expr; required for non-void functions
  • expression type must be convertible to function return type according to subset rules

5.9 Expressions

5.9.1 Primary expressions

Supported:

  • identifier
  • integer literal
  • true
  • false
  • parenthesized expression
  • function call
  • array subscript if arrays enabled

5.9.2 Unary expressions

Supported:

  • +expr
  • -expr
  • !expr

Unsupported:

  • ++
  • --
  • *
  • &
  • sizeof
  • new
  • delete
  • static_cast
  • C-style cast
  • ~

5.9.3 Binary expressions

Supported:

  • multiplicative: * / %
  • additive: + -
  • relational: < <= > >=
  • equality: == !=
  • logical and: &&
  • logical or: ||
  • assignment: =
  • compound assignment: += -= *= /= %=

Unsupported:

  • bitwise operators
  • shifts
  • comma operator
  • member access
  • pointer-to-member
  • spaceship operator

5.9.4 Assignment

Assignment is supported only for assignable lvalues:

  • variable reference
  • array element if arrays enabled

Chained assignment:

  • a = b = c follows from the right-associative assignment-expression grammar. Recommended baseline: support it, since it falls out of the normal grammar, but do not advertise it as a primary feature.

5.9.5 Function call

Calls to declared functions are supported.

Rules:

  • exact arity match required
  • argument types must be compatible
  • no overload resolution
  • no implicit function declarations

5.9.6 Short-circuit behavior

&& and || must short-circuit exactly as in C++.

This is semantically important and non-negotiable.


6. Semantic Rules

This section defines the runtime-visible and compile-time-visible semantics of the supported subset.

6.1 Name lookup and scopes

6.1.1 Scope kinds

The interpreter shall model at least:

  • global function scope
  • function parameter scope
  • block scope
  • for-init scope if declaration form is used

6.1.2 Variable lookup

Variables are resolved lexically, innermost scope first.

6.1.3 Shadowing

Shadowing is allowed across nested scopes.

Example:

int main() {
    int x = 1;
    {
        int x = 2;
        print(x); // 2
    }
    print(x); // 1
    return 0;
}

6.1.4 Redeclaration

Two variables with the same name in the same scope are rejected.

Function declarations may be repeated only if identical in signature and kind. Because overloading is unsupported, any differing function signature with the same name is an error.
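A minimal sketch of the lookup, shadowing, and redeclaration rules, assuming a C++ host and a hypothetical ScopeStack class that maps names to symbol ids. Redeclaration is checked only against the innermost scope, which is exactly what makes shadowing legal. One C++ detail the sketch does not cover: a parameter may not be redeclared in the outermost block of its function body (int f(int x) { int x = 0; } is ill-formed), and likewise a for-init variable may not be redeclared in the outermost block of the loop body, so if those are modeled as separate scopes the analyzer needs an extra check.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scope stack for semantic analysis: each block pushes a scope,
// lookup walks innermost-first, and redeclaration is checked only against
// the innermost scope, which is what permits shadowing.
class ScopeStack {
public:
    void push() { scopes_.emplace_back(); }
    void pop() { scopes_.pop_back(); }

    // Returns false if the name is already declared in the current scope.
    bool declare(const std::string& name, int symbolId) {
        return scopes_.back().emplace(name, symbolId).second;
    }

    // Innermost-first lexical lookup.
    std::optional<int> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, int>> scopes_;
};
```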

6.2 Type system

6.2.1 Static typing

The subset is statically typed. Types are determined during semantic analysis.

There is no dynamic typing or value-tag-based operator selection beyond what is already statically determined.

6.2.2 Supported implicit conversions

A policy choice is required here. The recommended baseline is:

Allowed:

  • int to bool in contextual conversions only, such as conditions and logical operators where C++ would require a bool-like condition
  • bool to int for arithmetic contexts only if explicitly aligned with C++ integral promotion semantics

However, to simplify implementation while staying faithful enough, version 1 should adopt:

Recommended v1 rule set

  • int and bool are distinct types
  • arithmetic operators require int
  • comparison operators on int produce bool
  • equality operators support int == int and bool == bool
  • logical operators require operands contextually convertible to bool
  • conditions (if, while, for) require expression contextually convertible to bool
  • assignment requires exact type match, except possibly bool = int and int = bool if a limited conversion matrix is adopted

For design clarity, the strictest consistent version is preferred:

Strict baseline

  • exact-type assignment only
  • contextual bool conversion allowed from bool and int
  • no other implicit conversions

This gives useful C++ fidelity without opening a large conversion lattice.

6.3 Initialization

6.3.1 Local variables

Supported:

  • default initialization without initializer
  • copy initialization with = expr

For simplicity and defined behavior, the interpreter should not mimic uninitialized local scalar UB in version 1. Instead choose one of:

  1. strict C++-style UB model for uninitialized reads, detected dynamically
  2. explicit interpreter rule: reading an uninitialized variable is a runtime error

Recommended:

  • every variable has an initialized flag
  • declarations without initializer create uninitialized storage
  • reading before initialization is a runtime error with source location

This is closer to a practical interpreter and still semantically honest.
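A minimal sketch of the recommended initialized flag, assuming a C++ host. Slot, store, load, and RuntimeError are hypothetical names, and only int storage is shown for brevity; a real runtime would also attach the source span of the offending read.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical storage slot for one scalar local variable.
struct Slot {
    int32_t value = 0;
    bool initialized = false;  // a declaration without initializer leaves this false
};

// Raised when a program reads a variable before any value was stored.
struct RuntimeError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

void store(Slot& slot, int32_t v) {
    slot.value = v;
    slot.initialized = true;
}

int32_t load(const Slot& slot, const std::string& name) {
    if (!slot.initialized) {
        throw RuntimeError("read of uninitialized variable '" + name + "'");
    }
    return slot.value;
}
```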

6.3.2 Arrays

If supported:

  • elements default to uninitialized
  • element read before write is a runtime error
  • zero-initialization syntax is unsupported in version 1

6.4 Operator semantics

6.4.1 Integer arithmetic

Operations are performed on interpreter integers. Recommended baseline semantics:

  • 32-bit signed range
  • overflow is a runtime error, or implementation-defined wraparound

This needs a deliberate choice because real signed overflow in C++ is UB.

Recommended v1 choice:

  • detect overflow and raise runtime error

Rationale:

  • deterministic
  • easier to debug
  • safer
  • acceptable for interpreter-defined handling of UB-like conditions

Document this explicitly:

NotPlusPlus does not reproduce all undefined behavior of full C++. Certain UB-prone operations are trapped deterministically at runtime.

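This is acceptable because NotPlusPlus follows real C++ syntax and semantics only within a constrained executable model: a program whose behavior is defined in C++ behaves identically, and a program that would have undefined behavior is trapped instead. C++ places no requirements on undefined behavior, so trapping it is a permissible outcome and emulating it is not required.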
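One way to implement the recommended overflow trap, sketched under the assumption of a C++ host: widen both operands to 64 bits, compute exactly, and check the result against the 32-bit range. The function names are hypothetical, and std::nullopt stands for "raise a runtime error at this expression".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Any sum, difference, or product of two int32 values fits exactly in int64,
// so the 64-bit computation itself cannot overflow.
std::optional<int32_t> narrow(int64_t wide) {
    if (wide < INT32_MIN || wide > INT32_MAX) return std::nullopt;  // overflow
    return static_cast<int32_t>(wide);
}

std::optional<int32_t> checkedAdd(int32_t a, int32_t b) {
    return narrow(static_cast<int64_t>(a) + b);
}

std::optional<int32_t> checkedSub(int32_t a, int32_t b) {
    return narrow(static_cast<int64_t>(a) - b);
}

std::optional<int32_t> checkedMul(int32_t a, int32_t b) {
    return narrow(static_cast<int64_t>(a) * b);
}

// Unary minus overflows only for INT32_MIN.
std::optional<int32_t> checkedNeg(int32_t a) {
    return narrow(-static_cast<int64_t>(a));
}
```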

6.4.2 Division and modulo

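Division by zero and modulo by zero are runtime errors. Because the quotient INT_MIN / -1 is not representable, INT_MIN / -1 and INT_MIN % -1 are treated as overflow and are also runtime errors. Otherwise, division truncates toward zero and the remainder takes the sign of the dividend, as in C++.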
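A minimal sketch of checked division and modulo, assuming a C++ host. Since C++11, the host's own / and % on int32_t already truncate toward zero with the remainder taking the sign of the dividend, so the sketch only filters out the trapped cases before delegating. ArithmeticError, checkedDiv, and checkedMod are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

struct ArithmeticError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

int32_t checkedDiv(int32_t a, int32_t b) {
    if (b == 0) throw ArithmeticError("division by zero");
    // The true quotient 2147483648 does not fit in int32.
    if (a == INT32_MIN && b == -1) throw ArithmeticError("integer overflow in division");
    return a / b;
}

int32_t checkedMod(int32_t a, int32_t b) {
    if (b == 0) throw ArithmeticError("modulo by zero");
    // C++ makes a % b undefined whenever a / b is not representable.
    if (a == INT32_MIN && b == -1) throw ArithmeticError("integer overflow in modulo");
    return a % b;
}
```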

6.4.3 Logical operators

Operands are contextually converted to bool. Evaluation short-circuits.
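A minimal sketch of short-circuit evaluation, assuming a C++ host and a heavily reduced, hypothetical expression node in which a leaf is a callable (so a test can observe whether it ran) and bool is represented as 0/1. The right operand is evaluated only when the left operand does not already decide the result.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical reduced expression node.
struct Expr {
    enum class Kind { Leaf, And, Or };
    Kind kind = Kind::Leaf;
    std::function<int()> leaf;        // Leaf: produces the operand's value
    std::unique_ptr<Expr> lhs, rhs;   // And / Or: the two operands
};

std::unique_ptr<Expr> makeLeaf(std::function<int()> f) {
    auto e = std::make_unique<Expr>();
    e->leaf = std::move(f);
    return e;
}

std::unique_ptr<Expr> makeLogical(Expr::Kind kind, std::unique_ptr<Expr> l,
                                  std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

int eval(const Expr& e) {
    switch (e.kind) {
    case Expr::Kind::Leaf:
        return e.leaf();
    case Expr::Kind::And:
        // Contextual conversion: any non-zero operand counts as true.
        if (eval(*e.lhs) == 0) return 0;   // rhs is never evaluated
        return eval(*e.rhs) != 0 ? 1 : 0;
    case Expr::Kind::Or:
        if (eval(*e.lhs) != 0) return 1;   // rhs is never evaluated
        return eval(*e.rhs) != 0 ? 1 : 0;
    }
    return 0;  // unreachable for valid nodes
}
```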

6.4.4 Comparison

  • int relational comparison is supported
  • bool relational comparison is unsupported unless explicitly added; recommended baseline: disallow except equality
  • equality on same-type operands is supported

6.4.5 Assignment

Assignment evaluates RHS, converts if allowed, stores value, and yields the assigned value if assignment expressions are expressions in the grammar.

6.5 Control flow semantics

6.5.1 If

Condition evaluated once. Then branch chosen accordingly.

6.5.2 While

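Standard C++ loop semantics: the condition is evaluated before each iteration, and the body executes only while it converts to true.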

6.5.3 For

Equivalent to C++ subset semantics, not a purely internal custom loop type. The interpreter may implement by direct execution or desugaring.

If desugared, it must preserve:

  • init scope
  • condition evaluation timing
  • step evaluation timing
  • block scoping behavior
  • continue transferring control to the step expression rather than skipping it
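A minimal sketch of direct for-loop execution, assuming a C++ host and a hypothetical Flow completion signal returned by statement execution. The callbacks stand in for the lowered init, cond, step, and body of for (init; cond; step) body. The point it illustrates is that both normal completion and continue run the step, which is why naively desugaring into while (cond) { body; step; } is wrong: a continue inside body would skip the step. A real interpreter would also push a scope for the init declaration before the loop and pop it on every exit path.

```cpp
#include <cassert>
#include <functional>

// Hypothetical completion signal returned by statement execution.
enum class Flow { Normal, Break, Continue, Return };

struct ForLoop {
    std::function<void()> init;   // may be empty
    std::function<bool()> cond;   // empty means "always true"
    std::function<void()> step;   // may be empty
    std::function<Flow()> body;
};

Flow execFor(const ForLoop& loop) {
    if (loop.init) loop.init();                // runs exactly once
    while (!loop.cond || loop.cond()) {        // checked before every iteration
        Flow f = loop.body();
        if (f == Flow::Break) break;
        if (f == Flow::Return) return Flow::Return;
        // Normal and Continue both fall through to the step.
        if (loop.step) loop.step();
    }
    return Flow::Normal;
}
```

For example, for (int i = 0; i < 5; i = i + 1) { if (i == 2) continue; sum += i; } terminates with sum equal to 8 only because continue still reaches the step.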

6.5.4 Return

A return transfers control immediately to the caller.

Reaching end of function:

  • for void function: allowed, implicit return
  • for non-void function other than main: semantic error or runtime error

Recommended:

  • if control-flow analysis is implemented, semantic analysis requires a return statement on every control path of a non-void function
  • otherwise, reaching end of a non-void function at runtime is a runtime error

For v1, do both:

  • conservative static check when trivially obvious
  • definitive runtime check at function end

In full C++, flowing off the end of main behaves as if return 0; were executed. To keep the rules simple, version 1 may either:

  • require explicit return 0; in version 1, or
  • allow implicit return 0; for main

Recommended baseline:

  • allow implicit return 0 at end of int main()
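A minimal sketch of the definitive runtime check at function end, combined with the recommended implicit return 0 for main. It assumes a C++ host; FunctionInfo and finishCall are hypothetical, and finishCall receives the value produced by an executed return, or std::nullopt when execution fell off the closing brace.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

struct RuntimeError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical descriptor of the function whose body just finished.
struct FunctionInfo {
    std::string name;
    bool returnsVoid;
};

std::optional<int32_t> finishCall(const FunctionInfo& fn,
                                  std::optional<int32_t> returned) {
    if (returned || fn.returnsVoid) return returned;  // explicit return, or void fallthrough
    if (fn.name == "main") return 0;                  // implicit return 0; for main
    throw RuntimeError("control reached end of non-void function '" + fn.name + "'");
}
```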

6.6 Functions

6.6.1 Declaration and definition

Functions may be declared and later defined, or directly defined.

6.6.2 Call semantics

Arguments are evaluated left-to-right as a deliberate subset policy. In full C++ the order is unspecified: since C++17, function arguments are indeterminately sequenced with respect to each other. Because this project is an interpreter for a subset, it chooses one fixed order and documents it.

Recommended:

  • evaluate function arguments left-to-right

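A conforming C++ implementation may evaluate arguments in this order, so any program whose result does not depend on argument order behaves as it would under a compiler; the fixed order only adds determinism.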

6.6.3 Recursion

Direct and indirect recursion are supported unless explicitly disabled. Recommended: support recursion.

Implementation shall provide:

  • configurable max call depth
  • runtime error on stack depth exhaustion
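A minimal sketch of a configurable call-depth limit, assuming a C++ host. CallDepthGuard and StackOverflowError are hypothetical; the guard is an RAII object constructed once per interpreted call, so the depth counter is restored however the call exits, including during exception unwinding. countDown only stands in for an interpreted recursive function.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

struct StackOverflowError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class CallDepthGuard {
public:
    CallDepthGuard(std::size_t& depth, std::size_t maxDepth) : depth_(depth) {
        if (depth_ >= maxDepth) {
            throw StackOverflowError("maximum call depth exceeded");
        }
        ++depth_;  // only incremented if the call is admitted
    }
    ~CallDepthGuard() { --depth_; }
    CallDepthGuard(const CallDepthGuard&) = delete;
    CallDepthGuard& operator=(const CallDepthGuard&) = delete;

private:
    std::size_t& depth_;
};

// Stand-in for an interpreted recursive function: one guard per call.
int countDown(int n, std::size_t& depth, std::size_t maxDepth) {
    CallDepthGuard guard(depth, maxDepth);
    return n == 0 ? 0 : 1 + countDown(n - 1, depth, maxDepth);
}
```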

6.6.4 Built-in functions

Built-ins are represented as ordinary callable global functions with interpreter-native implementations.

Version 1 required:

  • void print(int)
  • optional void print(bool)

Because overload resolution is unsupported, there are two implementation options:

Option A

Single polymorphic built-in outside the user function model. Simpler runtime, less C++-faithful.

Option B

Allow limited built-in overloads only, while forbidding user-defined overloads.

Recommended:

  • built-ins may have a small internal overload set
  • user-defined overloads remain unsupported

This should be explicitly documented as a runtime privilege, not general language support.
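A minimal sketch of the recommended built-in overload set, assuming a C++ host. BuiltinTable, Builtin, and BuiltinKey are hypothetical. Built-ins are keyed by name and exact parameter types, so print(int) and print(bool) coexist, while user functions would live in a separate name-keyed table that rejects a second signature. Because selection is exact-match only, no overload ranking is needed.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Type { Int, Bool, Void };

struct Builtin {
    Type returnType;
    std::function<void(const std::vector<int>&)> impl;  // args passed as raw ints
};

using BuiltinKey = std::pair<std::string, std::vector<Type>>;

class BuiltinTable {
public:
    void add(std::string name, std::vector<Type> params, Builtin b) {
        table_[{std::move(name), std::move(params)}] = std::move(b);
    }

    // Exact match on name and argument types; nullptr means "no such built-in".
    const Builtin* find(const std::string& name, const std::vector<Type>& args) const {
        auto it = table_.find({name, args});
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::map<BuiltinKey, Builtin> table_;
};
```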


7. Syntax and Grammar Specification

The parser need not implement full ISO grammar. It shall implement a reduced grammar that accepts exactly the subset.

The grammar below is normative at the subset level, though implementation may refactor it.

7.1 Lexical tokens

identifiers

identifier ::= (letter | "_") (letter | digit | "_")*    // keywords excluded

integer literal

int_literal ::= "0" | nonzero_digit digit*    // a leading 0 would make it octal in C++

keywords

"int" "bool" "void" "if" "else" "while" "for" "return" "break" "continue" "true" "false"

punctuators/operators

"(" ")" "{" "}" "[" "]" ";" "," "="
"+" "-" "*" "/" "%" "!" "&&" "||"
"==" "!=" "<" "<=" ">" ">="
"+=" "-=" "*=" "/=" "%="
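A minimal sketch of the punctuator part of the lexer, assuming a C++ host; identifiers, literals, and comments are left out, and lexPunctuators is a hypothetical name. Two-character operators, including the compound-assignment operators listed as supported in 5.9.3, are tried before one-character ones (maximal munch). Real C++ operators that the subset rejects, such as ++ and --, are still munched as single tokens and reported, so that a+++b or 1--1 is rejected as C++ would lex it instead of being silently re-read as + + or - -. Three-character operators are not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> lexPunctuators(const std::string& src) {
    // Two-character operators in the subset.
    static const std::vector<std::string> supported2 = {
        "&&", "||", "==", "!=", "<=", ">=", "+=", "-=", "*=", "/=", "%="};
    // Real C++ operators that must be recognized as one token and rejected.
    static const std::vector<std::string> unsupported2 = {
        "++", "--", "<<", ">>", "->", "::"};
    static const std::string oneChar = "(){}[];,=+-*/%!<>";

    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) { ++i; continue; }
        std::string two = src.substr(i, 2);
        if (std::find(unsupported2.begin(), unsupported2.end(), two) != unsupported2.end())
            throw std::runtime_error("unsupported operator '" + two + "'");
        if (std::find(supported2.begin(), supported2.end(), two) != supported2.end()) {
            tokens.push_back(two);
            i += 2;
            continue;
        }
        if (oneChar.find(src[i]) != std::string::npos) {
            tokens.emplace_back(1, src[i]);
            ++i;
            continue;
        }
        throw std::runtime_error(std::string("unsupported character '") + src[i] + "'");
    }
    return tokens;
}
```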

7.2 Grammar

translation_unit
    ::= top_level_decl*

top_level_decl
    ::= function_decl
     | function_def

function_decl
    ::= type identifier "(" parameter_list_opt ")" ";"

function_def
    ::= type identifier "(" parameter_list_opt ")" compound_stmt

parameter_list_opt
    ::= /* empty */
     | parameter_list

parameter_list
    ::= parameter ("," parameter)*

parameter
    ::= type identifier
     | type identifier "[" int_literal "]"   // only if array params are supported, otherwise omit

type
    ::= "int"
     | "bool"
     | "void"

compound_stmt
    ::= "{" stmt* "}"

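stmt
    ::= compound_stmt
     | decl_stmt
     | expr_stmt
     | if_stmt
     | while_stmt
     | for_stmt
     | return_stmt
     | break_stmt
     | continue_stmt

break_stmt
    ::= "break" ";"       // only inside a loop body

continue_stmt
    ::= "continue" ";"    // only inside a loop body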

decl_stmt
    ::= local_var_decl ";"

local_var_decl
    ::= type identifier
     | type identifier "=" expr
     | type identifier "[" int_literal "]"    // if arrays enabled

expr_stmt
    ::= expr_opt ";"

expr_opt
    ::= /* empty */
     | expr

if_stmt
    ::= "if" "(" expr ")" stmt ("else" stmt)?

while_stmt
    ::= "while" "(" expr ")" stmt

for_stmt
    ::= "for" "(" for_init ";" expr_opt ";" expr_opt ")" stmt

for_init
    ::= /* empty */
     | expr
     | local_var_decl

return_stmt
    ::= "return" expr_opt ";"

expr
    ::= assignment_expr

assignment_expr
    ::= logical_or_expr
     | unary_lvalue assignment_op assignment_expr

assignment_op
    ::= "=" | "+=" | "-=" | "*=" | "/=" | "%="

logical_or_expr
    ::= logical_and_expr ("||" logical_and_expr)*

logical_and_expr
    ::= equality_expr ("&&" equality_expr)*

equality_expr
    ::= relational_expr (("==" | "!=") relational_expr)*

relational_expr
    ::= additive_expr (("<" | "<=" | ">" | ">=") additive_expr)*

additive_expr
    ::= multiplicative_expr (("+" | "-") multiplicative_expr)*

multiplicative_expr
    ::= unary_expr (("*" | "/" | "%") unary_expr)*

unary_expr
    ::= primary_expr
     | "+" unary_expr
     | "-" unary_expr
     | "!" unary_expr

primary_expr
    ::= identifier
     | int_literal
     | "true"
     | "false"
     | "(" expr ")"
     | call_expr
     | array_subscript

call_expr
    ::= identifier "(" argument_list_opt ")"

argument_list_opt
    ::= /* empty */
     | argument_list

argument_list
    ::= expr ("," expr)*

array_subscript
    ::= identifier "[" expr "]"

unary_lvalue
    ::= identifier
     | array_subscript

7.3 Grammar policy notes

  • No expression may start with a type name; this eliminates cast ambiguity in version 1.
  • No declaration/expression ambiguity remains, including in for-init, because a declaration always begins with one of the type keywords int, bool, or void.
  • Multiple declarators are excluded to simplify grammar and semantics.

8. Preprocessing Policy

This section is crucial because “actual C++ source text” intersects with the preprocessor.

8.1 Version 1 preprocessing stance

Recommended baseline:

  • NotPlusPlus does not implement the C preprocessor
  • Source files containing preprocessing directives are rejected, except optionally a tiny whitelist for built-in headers that are semantically ignored

This is the cleanest design.

8.2 Why this is acceptable

The goal is to interpret actual C++ syntax and semantics for a subset. The preprocessor is not part of the core expression/statement/declaration grammar and introduces a separate textual transformation language. Supporting C++ source text does not require full preprocessor support in v1.

8.3 Optional compatibility mode

If desired, support exactly:

  • #include <npp> or #include "npp.hpp"

Semantics:

  • no real file loading
  • interpreter injects declarations for built-ins such as print

But this should only be added if there is a strong UX reason. Otherwise, it is simpler to treat built-ins as always available.

8.4 Rejected directives

  • #define
  • #if, #ifdef, etc.
  • #include of arbitrary headers
  • #pragma
  • #line

Diagnostic category:

  • unsupported_preprocessor_directive

9. Architecture

9.1 Pipeline overview

NotPlusPlus shall be implemented as a staged front-end plus interpreter:

  1. Source Manager
  2. Lexer
  3. Parser
  4. AST Builder
  5. Semantic Analyzer
  6. Lowered Semantic IR or Direct Annotated AST
  7. Interpreter Runtime
  8. Diagnostic Engine

9.2 Architectural choice: AST interpreter vs lowered IR

Two viable approaches:

Option A: Direct AST interpreter

Interpret directly over the AST with semantic annotations.

Pros:

  • simpler initial implementation
  • fewer intermediate representations
  • easier source-location propagation

Cons:

  • semantic analysis and runtime concerns may get mixed
  • harder to optimize later

Option B: Lowered semantic IR

Parse to AST, analyze semantically, then lower to a small control-flow/statement IR for interpretation.

Pros:

  • cleaner separation
  • easier execution engine
  • easier constant folding, debugging, tracing
  • better long-term maintainability

Cons:

  • more engineering upfront

Recommended architecture: hybrid.

  • Parse into a high-level AST
  • Perform semantic analysis on AST and produce a resolved, typed semantic model
  • Lower expressions/statements/functions into a typed executable IR for interpretation after semantic analysis succeeds

This keeps the parser close to source while giving the runtime a cleaner structure.

9.3 Subsystems

9.3.1 Source Manager

Responsibilities:

  • own file contents
  • map byte offsets to line/column
  • produce source spans
  • provide excerpt rendering for diagnostics

9.3.2 Lexer

Responsibilities:

  • tokenize source
  • skip comments and whitespace
  • recognize keywords and operators
  • report invalid tokens
  • attach source spans to tokens

9.3.3 Parser

Responsibilities:

  • consume token stream
  • build AST
  • distinguish declaration forms from expression forms within subset grammar
  • recover from syntax errors where practical

9.3.4 AST

Responsibilities:

  • preserve source structure and spans
  • represent declarations, statements, expressions, and types
  • remain syntax-level, not runtime-level

9.3.5 Semantic Analyzer

Responsibilities:

  • symbol table construction
  • declaration validation
  • name resolution
  • type checking
  • lvalue/rvalue classification
  • function signature registration
  • built-in injection
  • subset rule enforcement
  • unsupported construct detection

9.3.6 Executable IR Lowering

Responsibilities:

  • transform semantically valid AST into execution-friendly nodes
  • eliminate parse-only artifacts
  • make control flow explicit
  • store resolved declaration IDs and type IDs

9.3.7 Runtime / Interpreter

Responsibilities:

  • manage call stack
  • manage variable storage
  • evaluate expressions
  • execute statements
  • invoke built-ins
  • detect runtime errors

9.3.8 Diagnostic Engine

Responsibilities:

  • collect and render compile-time diagnostics
  • report runtime errors with stack trace and source spans
  • provide stable error codes

10. Internal Representations

10.1 Source spans

Every token and AST node shall carry a source span:

  • file id
  • start offset
  • end offset

Derived on demand:

  • line
  • column
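A minimal sketch of deriving line and column on demand from a byte offset, assuming a C++ host. LineTable is hypothetical: it records where each line starts once, then answers each query with a binary search. Columns here are 1-based byte columns, which coincide with character columns under the optional ASCII-only restriction in 5.1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class LineTable {
public:
    explicit LineTable(const std::string& text) {
        lineStarts_.push_back(0);
        for (std::size_t i = 0; i < text.size(); ++i) {
            if (text[i] == '\n') lineStarts_.push_back(i + 1);
        }
    }

    struct Position { std::size_t line, column; };

    Position locate(std::size_t offset) const {
        // upper_bound finds the first line starting after offset; the line
        // containing offset is the one just before it.
        auto it = std::upper_bound(lineStarts_.begin(), lineStarts_.end(), offset);
        std::size_t index = static_cast<std::size_t>(it - lineStarts_.begin()) - 1;
        return {index + 1, offset - lineStarts_[index] + 1};
    }

private:
    std::vector<std::size_t> lineStarts_;
};
```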

10.2 Tokens

Each token:

  • kind
  • lexeme slice or interned content
  • source span

Token kinds include:

  • identifiers
  • literals
  • keywords
  • punctuators
  • eof

10.3 AST node model

A representative AST model:

program

  • list of top-level declarations

declarations

  • FunctionDecl
  • FunctionDef
  • ParamDecl
  • VarDecl

statements

  • CompoundStmt
  • DeclStmt
  • ExprStmt
  • IfStmt
  • WhileStmt
  • ForStmt
  • ReturnStmt

expressions

  • IntLiteralExpr
  • BoolLiteralExpr
  • NameExpr
  • UnaryExpr
  • BinaryExpr
  • AssignExpr
  • CallExpr
  • SubscriptExpr
  • ParenExpr

types

  • BuiltinType(Int | Bool | Void)
  • ArrayType(element_type, size)

Every expression node shall later carry:

  • resolved type
  • value category: lvalue or rvalue
  • optionally, constant-value metadata if constant folding is added

10.4 Symbol model

function symbols

Fields:

  • name
  • return type
  • parameter types
  • declaration span
  • definition pointer if defined
  • builtin flag
  • builtin handler id if builtin

variable symbols

Fields:

  • name
  • type
  • scope id
  • declaration span
  • storage class category: local / parameter
  • runtime slot index

10.5 Type model

Represent types structurally:

  • Int
  • Bool
  • Void
  • Array(TypeId element, uint32 size)

Intern types in a central table for canonical equality.

10.6 Executable IR

Recommended IR granularity:

executable functions

  • name
  • return type
  • parameter slots
  • body block

executable statements

  • block
  • local declaration
  • store
  • if
  • while
  • for or lowered-for
  • return
  • expr statement

executable expressions

  • literal
  • load local
  • unary op
  • binary op
  • short-circuit logical
  • call resolved function id
  • subscript load/store address form if arrays enabled

Important: use separate lvalue-capable nodes or addressable references for assignable expressions.


11. Parsing Strategy

11.1 Parser type

Use recursive descent.

This is the correct choice for the subset because:

  • grammar is controlled
  • precedence handling is straightforward
  • diagnostics are readable
  • implementation is easy to maintain

11.2 Declaration parsing

Top-level parse logic:

  • parse type
  • parse identifier
  • if next token is (, parse function declaration/definition
  • otherwise reject, because top-level non-function declarations are unsupported

Local scope parse logic:

  • if token begins a supported type specifier, parse declaration statement
  • else parse expression statement

Because casts, user-defined types, and elaborate declarators are excluded, this remains unambiguous.

11.3 Expression parsing

Use precedence climbing or hand-written precedence functions. Recommended:

  • dedicated functions per precedence level

This makes associativity clear:

  • assignment right-associative
  • others left-associative

11.4 Error recovery

Parser should recover at:

  • ;
  • }
  • top-level declaration boundaries

Recovery is important for multi-error reporting in source files.


12. Semantic Analysis

12.1 Analysis phases

Semantic analysis should be split into at least three passes.

Pass 1: declaration collection

  • collect all top-level function declarations and definitions
  • register built-ins
  • detect duplicate function names/signatures

Pass 2: function body analysis

For each function:

  • establish parameter scope
  • analyze statements and expressions
  • resolve identifiers
  • check types
  • assign local storage slots

Pass 3: whole-program validation

  • verify main exists with valid signature
  • verify every non-builtin called function exists
  • verify definitions for declared-but-called functions
  • verify no unsupported unresolved forms remain

12.2 Symbol tables

Use nested scope tables:

  • each scope has parent
  • variables inserted locally
  • functions stored globally

Implementation detail:

  • do not store functions in the same namespace structure as variables unless later needed for shadowing/lookup fidelity
  • since local functions are unsupported, a separate global function table is simpler

12.3 Name resolution

When encountering an identifier expression:

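  1. search the local scope chain for a variable
  2. if a variable is found and the identifier is the callee of a call, reject the call: C++ lookup finds the variable, and an int or bool is not callable
  3. if no variable is found and the identifier is the callee of a call, resolve it in the global function table
  4. otherwise, report an undeclared identifier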

A bare function name as value is unsupported because function pointers are unsupported.

12.4 Type checking rules

Representative rules:

unary + / -

Operand must be int, result int

unary !

Operand must be contextually convertible to bool, result bool

arithmetic binary

Both operands int, result int

relational

Both operands int, result bool

equality

Both operands same supported scalar type, result bool

logical && / ||

Operands contextually convertible to bool, result bool

assignment

  • LHS must be an assignable lvalue
  • RHS must have the same type or an explicitly allowed conversion
  • Result type is the LHS type

call

  • Function must exist
  • Arity must match
  • Each argument type must match the corresponding parameter type

subscript

  • Base must be an array lvalue
  • Index must be int
  • Result is an lvalue of the element type
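The binary-operator rules above can be sketched as a single checking function. This is a sketch, not the analyzer's actual API. It assumes that int and bool are the only types contextually convertible to bool (consistent with the contextual-bool exception in §28.4). The `Type` and `BinOp` enums are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Type { Int, Bool, Void }

#[derive(Debug, Clone, Copy)]
enum BinOp { Add, Sub, Mul, Div, Rem, Less, LessEq, Greater, GreaterEq, Equal, NotEqual, And, Or }

/// Assumption: int and bool are the only types contextually convertible to bool.
fn contextually_bool(t: Type) -> bool {
    matches!(t, Type::Int | Type::Bool)
}

/// Result type of `lhs op rhs`, or a type-mismatch message.
fn check_binary(op: BinOp, lhs: Type, rhs: Type) -> Result<Type, String> {
    use BinOp::*;
    let both_int = lhs == Type::Int && rhs == Type::Int;
    match op {
        Add | Sub | Mul | Div | Rem if both_int => Ok(Type::Int),
        Less | LessEq | Greater | GreaterEq if both_int => Ok(Type::Bool),
        // Subset rule: equality needs the same scalar type, so int == bool is rejected.
        Equal | NotEqual if lhs == rhs && lhs != Type::Void => Ok(Type::Bool),
        And | Or if contextually_bool(lhs) && contextually_bool(rhs) => Ok(Type::Bool),
        _ => Err(format!("invalid operands of types {:?} and {:?} to {:?}", lhs, rhs, op)),
    }
}
```

Note that full C++ would accept `true == 1` through integral promotion; the subset's stricter same-type rule is deliberate.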

12.5 Lvalue model

Analysis requires an explicit value-category classification.

Lvalues:

  • variable references
  • array element expressions

Rvalues:

  • literals
  • arithmetic expressions
  • comparison expressions
  • function calls returning non-array scalar values
  • parenthesized expressions: a parenthesized lvalue may keep its lvalue status (as (e) does in real C++); a v1 simplification is acceptable only if the parser and analyzer still track the inner expression's category correctly

Recommended:

  • preserve lvalue-ness through parentheses
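The classification, with parentheses treated as transparent, fits in one recursive function. The trimmed-down `Expr` shape here is hypothetical and stands in for the analyzed expression type.

```rust
// Hypothetical, trimmed-down expression shape for illustration.
enum Expr {
    IntLit(i32),
    BoolLit(bool),
    Var(String),
    Index(Box<Expr>, Box<Expr>),
    Binary(char, Box<Expr>, Box<Expr>),
    Call(String, Vec<Expr>),
    Paren(Box<Expr>),
}

fn is_lvalue(e: &Expr) -> bool {
    match e {
        // Variable references and array element expressions are lvalues.
        Expr::Var(_) | Expr::Index(_, _) => true,
        // Parentheses are transparent: (x) is an lvalue exactly when x is.
        Expr::Paren(inner) => is_lvalue(inner),
        // Literals, operator results, and call results are rvalues.
        Expr::IntLit(_) | Expr::BoolLit(_) | Expr::Binary(..) | Expr::Call(..) => false,
    }
}
```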

12.6 Definite return analysis

Full control-flow analysis is not required for v1. Provide:

  • simple structural check where obvious
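  • a runtime guard against falling off the end of a non-void function

Example runtime guard:

  • if a non-void function other than main completes its body without executing a return statement, emit the runtime error “control reached end of non-void function” (main instead returns 0; see §16.3)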

12.7 Unsupported construct detection

The parser and semantic analyzer must produce specific diagnostics when unsupported but recognizable constructs are used.

Examples:

  • const int x = 1; → unsupported type qualifier
  • int* p; → unsupported pointer declarator
  • namespace std {} → unsupported namespace declaration
  • x++; → unsupported operator

When a construct is lexically recognizable, a targeted diagnostic like these is more useful than a generic parse failure.


13. Runtime Design

13.1 Execution model

NotPlusPlus interprets one executable function at a time using a call stack.

Execution starts at main.

13.2 Runtime value representation

Recommended runtime value enum:

  • Int(i32)
  • Bool(bool)
  • Array(ArrayObjectId) or inline array storage reference

Avoid boxing every scalar if performance matters, but correctness is primary.

13.3 Variable storage model

Each function activation record has local storage slots. Each local variable symbol is assigned a slot index during semantic analysis or IR lowering.

A frame contains:

  • function id
  • slots vector
  • optionally, scope metadata, if block-lifetime destruction ever matters

Each slot contains:

  • type id
  • initialized flag
  • value or array object reference

13.4 Block scope handling

Because variable lifetime is lexical and v1 has no destructors, either of two implementation strategies works:

Strategy A: dynamic scope stack

Push/pop runtime maps for each block.

Strategy B: fixed slot frame with lexical slot allocation

Assign each declaration a unique frame slot, valid for the lifetime of the frame; use scope metadata only to block illegal access at compile time.

Recommended:

  • fixed slot frame

Rationale:

  • simpler runtime
  • faster access
  • no need to allocate/deallocate per block
  • lexical rules already enforced statically

Arrays live in their variable slots.
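The fixed slot frame can be sketched as below. This is a sketch under two assumptions. First, `Option<Value>` folds each slot's initialized flag and value together. Second, the per-slot type id is omitted because the IR is already type-checked. Shadowed names such as the two `x` variables in the block-shadowing sample simply occupy two different slots. All names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value { Int(i32), Bool(bool) }

#[derive(Debug)]
enum RuntimeError { UninitializedRead { slot: usize } }

/// One activation record. Every local declared anywhere in the function body
/// has a unique slot index assigned ahead of time, so blocks never allocate.
struct Frame {
    func_id: usize,
    slots: Vec<Option<Value>>, // None = uninitialized
}

impl Frame {
    fn new(func_id: usize, slot_count: usize) -> Self {
        Frame { func_id, slots: vec![None; slot_count] }
    }

    fn store(&mut self, slot: usize, v: Value) {
        self.slots[slot] = Some(v);
    }

    /// A declaration without an initializer re-marks its slot uninitialized,
    /// which matters when the same declaration runs again in a later loop iteration.
    fn uninit(&mut self, slot: usize) {
        self.slots[slot] = None;
    }

    fn load(&self, slot: usize) -> Result<Value, RuntimeError> {
        self.slots[slot].ok_or(RuntimeError::UninitializedRead { slot })
    }
}
```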

13.5 Function calls

Call procedure:

  1. evaluate arguments left-to-right
  2. create new frame
  3. initialize parameter slots with argument values
  4. mark non-parameter locals uninitialized
  5. execute body
  6. on return, validate return type and yield value
  7. pop frame

13.6 Return propagation

Use an explicit control-flow result type:

enum ExecOutcome {
    Normal,
    Break,
    Continue,
    Return(Value),
}
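The call procedure of §13.5, the outcome type, and the missing-return guard of §12.6 can be sketched together. Several names here are hypothetical: the `Interp` and `Function` types, and the `max_depth` limit. A plain Rust `fn` stands in for each function's lowered IR body. This sketch also uses `Return(Option<Value>)` so that `return;` in a void function carries no value, which is an assumption layered on the enum above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value { Int(i32), Bool(bool) }

/// Return(None) models `return;` in a void function (an assumption of this sketch).
#[derive(Debug, PartialEq)]
enum ExecOutcome { Normal, Break, Continue, Return(Option<Value>) }

#[derive(Debug, PartialEq)]
enum RuntimeError { CallDepthExceeded, MissingReturn(String), UninitializedRead(usize) }

struct Frame { slots: Vec<Option<Value>> }

struct Interp { stack: Vec<Frame>, max_depth: usize }

/// Hypothetical lowered function; the body is a plain Rust fn standing in for IR.
struct Function {
    name: String,
    param_count: usize,
    slot_count: usize, // parameters + all locals
    returns_value: bool,
    is_main: bool,
    body: fn(&mut Interp) -> Result<ExecOutcome, RuntimeError>,
}

impl Interp {
    /// Steps 2-7 of the call procedure; `args` were already evaluated left-to-right.
    fn call(&mut self, f: &Function, args: Vec<Value>) -> Result<Option<Value>, RuntimeError> {
        if self.stack.len() >= self.max_depth {
            return Err(RuntimeError::CallDepthExceeded);
        }
        // New frame: parameter slots initialized, every other local uninitialized.
        let mut slots = vec![None; f.slot_count];
        for (i, v) in args.into_iter().take(f.param_count).enumerate() {
            slots[i] = Some(v);
        }
        self.stack.push(Frame { slots });
        let outcome = (f.body)(self);
        self.stack.pop(); // pop on every path, including runtime errors
        match outcome? {
            ExecOutcome::Return(v) => {
                // Sema already checked return types; this only guards value vs. void.
                debug_assert_eq!(v.is_some(), f.returns_value);
                Ok(v)
            }
            // Falling off the end: main returns 0 (§16.3); other non-void functions fail.
            ExecOutcome::Normal if f.is_main => Ok(Some(Value::Int(0))),
            ExecOutcome::Normal if f.returns_value => Err(RuntimeError::MissingReturn(f.name.clone())),
            ExecOutcome::Normal => Ok(None),
            ExecOutcome::Break | ExecOutcome::Continue => {
                unreachable!("break/continue escaped a function body")
            }
        }
    }

    fn load(&self, slot: usize) -> Result<Value, RuntimeError> {
        let frame = self.stack.last().expect("load outside any frame");
        frame.slots[slot].ok_or(RuntimeError::UninitializedRead(slot))
    }
}
```

Checking the depth limit before pushing a frame turns runaway recursion into a `call depth exceeded` runtime error instead of a host stack overflow (§27.1).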

13.7 Runtime arrays

If arrays are supported:

representation

Each array variable slot contains:

  • element type
  • fixed size
  • element storage array
  • initialized bitset per element

semantics

  • indexing performs bounds check
  • out-of-range access is runtime error
  • array expression does not decay to pointer
  • array value passing is unsupported unless array parameters are explicitly modeled

13.8 Built-in execution

Built-ins are dispatched by function symbol or handler id.

Example:

  • print(int) writes a decimal integer to stdout or the interpreter output sink
  • print(bool) writes true or false

The runtime must abstract output through an interface for testability:

  • real stdout sink
  • capture sink for unit tests
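The output abstraction and built-in dispatch can be sketched with a small trait. The names `OutputSink`, `CaptureSink`, `Builtin`, and `run_builtin` are hypothetical. As §15.4 specifies, no newline is written automatically.

```rust
use std::io::Write;

trait OutputSink {
    fn write_str(&mut self, s: &str);
}

/// Real sink: forwards to the process's stdout.
struct StdoutSink;

impl OutputSink for StdoutSink {
    fn write_str(&mut self, s: &str) {
        let stdout = std::io::stdout();
        let mut out = stdout.lock();
        // A real implementation would surface I/O failures as diagnostics.
        let _ = out.write_all(s.as_bytes());
    }
}

/// Test sink: accumulates output in memory for comparison.
#[derive(Default)]
struct CaptureSink {
    buf: String,
}

impl OutputSink for CaptureSink {
    fn write_str(&mut self, s: &str) {
        self.buf.push_str(s);
    }
}

enum Value { Int(i32), Bool(bool) }

#[derive(Clone, Copy)]
enum Builtin { PrintInt, PrintBool }

fn run_builtin(b: Builtin, arg: Value, out: &mut dyn OutputSink) {
    match (b, arg) {
        (Builtin::PrintInt, Value::Int(n)) => out.write_str(&n.to_string()),
        (Builtin::PrintBool, Value::Bool(v)) => out.write_str(if v { "true" } else { "false" }),
        _ => unreachable!("argument types are checked during semantic analysis"),
    }
}
```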

14. Diagnostics

14.1 Diagnostic classes

lexical errors

  • invalid character
  • malformed token
  • unterminated block comment

syntax errors

  • unexpected token
  • expected token
  • malformed declaration
  • malformed expression

semantic errors

  • unknown identifier
  • redeclaration
  • type mismatch
  • invalid assignment target
  • wrong argument count
  • wrong argument type
  • missing main
  • invalid main signature
  • unsupported construct

runtime errors

  • division by zero
  • modulo by zero
  • integer overflow if trapped
  • uninitialized read
  • array bounds violation
  • call depth exceeded
  • missing return at runtime
  • internal interpreter fault

14.2 Diagnostic format

Recommended structure:

  • severity
  • error code
  • primary source span
  • human-readable message
  • optional notes
  • optional related spans

Example:

error[NPP3004]: use of undeclared identifier 'x'
  --> sample.cpp:4:9
   |
4  |     y = x + 1;
   |         ^
note: no local variable or parameter named 'x' is visible in this scope
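The diagnostic structure and a renderer in the style of this example can be sketched as follows. These are assumptions, not a fixed design: the field names, the fixed three-character line-number gutter (which only aligns for line numbers up to two digits), and the 1-based span columns. Crates such as codespan-reporting or miette could replace the hand-written renderer.

```rust
#[derive(Debug, Clone, Copy)]
enum Severity { Error, Warning }

struct Span { file: String, line: usize, col: usize, len: usize }

struct Diagnostic {
    severity: Severity,
    code: &'static str,
    message: String,
    primary: Span,
    notes: Vec<String>,
}

/// Render one diagnostic; `source_line` is the text of line `primary.line`.
fn render(d: &Diagnostic, source_line: &str) -> String {
    let sev = match d.severity {
        Severity::Error => "error",
        Severity::Warning => "warning",
    };
    let mut out = String::new();
    out.push_str(&format!("{}[{}]: {}\n", sev, d.code, d.message));
    out.push_str(&format!("  --> {}:{}:{}\n", d.primary.file, d.primary.line, d.primary.col));
    out.push_str("   |\n");
    out.push_str(&format!("{:<3}| {}\n", d.primary.line, source_line));
    // Columns are 1-based; the caret run starts under the span's first character.
    let pad = " ".repeat(d.primary.col - 1);
    let carets = "^".repeat(d.primary.len.max(1));
    out.push_str(&format!("   | {}{}\n", pad, carets));
    for note in &d.notes {
        out.push_str(&format!("note: {}\n", note));
    }
    out
}
```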

14.3 Stable error code ranges

Recommended:

  • NPP1xxx lexical
  • NPP2xxx syntax
  • NPP3xxx semantic
  • NPP4xxx runtime
  • NPP9xxx internal

14.4 Runtime stack traces

Runtime errors should emit:

  • message
  • source span of failing expression/statement
  • call stack with function names and call sites where available

15. Built-in Surface

15.1 Philosophy

Built-ins must be valid C++ function calls, not pseudo-syntax.

This preserves the design goal of accepting real C++ source syntax.

15.2 Required built-ins

Minimum:

void print(int);
void print(bool);

If overloading built-ins is undesirable, use distinct names instead:

void print_int(int);
void print_bool(bool);

However, print overloads are a better user experience and still manageable if isolated to built-ins.

15.3 Prelude model

The interpreter internally injects declarations equivalent to:

void print(int);
void print(bool);

These declarations are reserved. User code may not redefine them.

15.4 Output formatting

  • int: decimal
  • bool: true or false
  • no automatic newline unless println built-ins are added

16. Main Function Contract

16.1 Required entry point

Exactly one valid definition of:

int main()

Recommended baseline:

  • this is the only valid entry signature in version 1

16.2 Rejected alternatives

  • void main()
  • parameterized main
  • overloaded main

16.3 Return behavior

  • explicit return int_expr; supported
  • reaching end of main returns 0

17. File and Module Organization

A recommended Rust implementation layout:

notplusplus/
  src/
    main.rs
    source/
      mod.rs          # source_manager + span
    lex/
      mod.rs          # lexer entry point
      token.rs
      lexer.rs
    parse/
      mod.rs          # parser entry point
      ast.rs
      parser.rs
    sema/
      mod.rs
      types.rs
      symbols.rs
      scope.rs
      sema.rs
    ir/
      mod.rs
      ir.rs
      lower.rs
    interp/
      mod.rs
      value.rs
      frame.rs
      runtime.rs
      builtins.rs
    diag/
      mod.rs
      diagnostic.rs
      engine.rs
    support/
      mod.rs
      intern.rs
  tests/
    lexer/
    parser/
    sema/
    runtime/
    integration/
  docs/
    design.md
  Cargo.toml

Each directory is a Rust module rooted at mod.rs. Visibility is controlled via pub and pub(crate) — prefer pub(crate) for cross-module interfaces that are not part of any public API. There is no public library crate surface in v1; the binary is the product.


18. Implementation Language

18.1 Chosen implementation language: Rust

Rust is the primary implementation language for NotPlusPlus.

Reasoning:

  • Rust's enum-based ADTs and exhaustive pattern matching map directly and naturally onto the AST, IR, and value representation. Every node kind, every value variant, and every diagnostic category becomes a type-checked variant. Adding or removing a variant produces compile errors at every unhandled match site, which enforces consistency across the pipeline automatically.
  • The ownership model eliminates a class of bugs common in hand-written interpreters: use-after-free in value frames, dangling references into scope stacks, and double-free in runtime environments. These are precisely the failure modes that matter in an interpreter managing its own call stack and variable storage.
  • Rust has no garbage collector. The interpreter controls its own memory layout for call frames and runtime values, which is preferable for a system that tracks initialization state per variable slot and enforces configurable call-depth limits.
  • The Result and Option types enforce explicit error handling throughout the pipeline. Diagnostic emission cannot be accidentally silenced; every fallible operation must be handled at the call site.
  • The Rust ecosystem provides mature support for the diagnostic infrastructure this project requires. Crates such as miette and codespan-reporting provide span-aware, terminal-formatted error output without bespoke implementation effort.
  • Rust's test infrastructure — #[test], #[cfg(test)], and the integration test convention under tests/ — maps directly onto the layered test strategy defined in §22 without any additional tooling.

18.2 Implications for the codebase

The pipeline stages defined in §3.7 translate to Rust modules as follows. The lexer produces a flat Vec<Token> with span metadata. The parser consumes tokens and produces an owned AST using Box<Expr> and Vec<Stmt> for recursive structure. The semantic analyzer walks the AST and produces a resolved, typed semantic model with symbol tables, scopes, and expression annotations. IR lowering then translates that validated semantic model into executable IR. The interpreter walks the IR using a call stack of Frame values, each holding a slot array for local variables. All inter-stage errors are returned as structured Diagnostic values accumulated in a shared engine rather than panicked or printed inline.

panic! is reserved for genuinely impossible internal states — conditions that represent interpreter bugs, not user program errors. All user-facing failures travel through the diagnostic engine.

18.3 Rejected alternatives

C++ was considered for its symbolic symmetry with the project's subject matter. It is rejected because building a correct, safe interpreter runtime in C++ requires disciplined manual memory management that adds implementation risk without design benefit. The project's value is in its semantic correctness, not its implementation language irony.

Python is suitable for early prototyping but is not appropriate as the final implementation language. The absence of static types across the pipeline makes it harder to enforce the invariants that the design depends on — particularly around type-checking, IR lowering, and frame management.


19. Detailed Execution Semantics

19.1 Evaluation order

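Version 1 shall define a deterministic evaluation order even where ISO C++ leaves the order unspecified (for example, the order in which function arguments are evaluated) or where its sequencing rules changed between standards.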

Recommended:

  • binary operator operands are evaluated left-to-right; for && and ||, the right operand is evaluated only when short-circuit rules require it
  • function call arguments evaluated left-to-right
  • assignment evaluates RHS after LHS addressability check but before store
  • subscript evaluates base then index

This is a conscious simplification. It must be documented as a subset semantic choice.

19.2 Undefined behavior policy

NotPlusPlus is not required to emulate all undefined behavior of ISO C++. For supported constructs:

  • some UB-like conditions are rejected statically where possible
  • some are trapped dynamically with deterministic runtime errors

Examples:

  • uninitialized read → runtime error
  • signed overflow → runtime error if checked arithmetic chosen
  • division by zero → runtime error

This is acceptable because the subset contract is explicit and the interpreter semantics are deterministic.
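If checked arithmetic is chosen, the integer cases above map directly onto Rust's `checked_*` methods. This sketch uses hypothetical `ArithOp` and `ArithError` enums. It tests for a zero divisor first so that division by zero and modulo by zero get their own diagnostics. After that test, `checked_div` and `checked_rem` can only fail for `i32::MIN / -1` and `i32::MIN % -1`, both of which overflow.

```rust
#[derive(Debug, PartialEq)]
enum ArithError { DivisionByZero, ModuloByZero, Overflow }

#[derive(Clone, Copy)]
enum ArithOp { Add, Sub, Mul, Div, Rem }

/// Checked 32-bit signed arithmetic: each UB-like case becomes a deterministic error.
fn eval_int(op: ArithOp, a: i32, b: i32) -> Result<i32, ArithError> {
    match op {
        ArithOp::Add => a.checked_add(b).ok_or(ArithError::Overflow),
        ArithOp::Sub => a.checked_sub(b).ok_or(ArithError::Overflow),
        ArithOp::Mul => a.checked_mul(b).ok_or(ArithError::Overflow),
        ArithOp::Div if b == 0 => Err(ArithError::DivisionByZero),
        ArithOp::Div => a.checked_div(b).ok_or(ArithError::Overflow),
        ArithOp::Rem if b == 0 => Err(ArithError::ModuloByZero),
        ArithOp::Rem => a.checked_rem(b).ok_or(ArithError::Overflow),
    }
}
```

Rust's `/` and `%` on integers truncate toward zero, which matches C++11 and later, so `-7 / 2` is `-3` and `-7 % 2` is `-1` in both languages.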

19.3 Statement execution details

declaration statement

  • allocate or locate variable slot
  • if initializer present: evaluate, type-check, store, mark initialized
  • else mark uninitialized

expression statement

  • evaluate for side effects
  • discard result

if statement

  • evaluate condition
  • contextually convert to bool
  • execute one branch

while statement

  • re-evaluate the condition before every iteration
  • short-circuit semantics inside condition preserved

for statement

Logical execution model:

  1. execute init if present
  2. if cond present, test it; else treat as true
  3. execute body
  4. execute step if present
  5. repeat

If init is a declaration, its scope includes cond, step, and body, and ends after the loop.
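The five-step for model can be sketched with closures standing in for the lowered init, condition, step, and body. This is illustrative only: `exec_for` is a hypothetical name, and the scope of a declaration in init is assumed to be enforced statically, so it does not appear here. The body's outcome follows amendment 33.6. `Break` leaves the loop, `Continue` still runs the step, and `Return` propagates outward.

```rust
#[derive(Debug, PartialEq)]
enum ExecOutcome { Normal, Break, Continue, Return(i32) }

/// Logical model of `for (init; cond; step) body`; a missing cond counts as true.
fn exec_for<I, C, S, B>(init: I, mut cond: Option<C>, mut step: S, mut body: B) -> ExecOutcome
where
    I: FnOnce(),
    C: FnMut() -> bool,
    S: FnMut(),
    B: FnMut() -> ExecOutcome,
{
    init(); // 1. execute init once
    loop {
        // 2. test cond if present
        if let Some(c) = cond.as_mut() {
            if !c() {
                break;
            }
        }
        // 3. execute body and inspect how it completed
        match body() {
            ExecOutcome::Break => break,
            ExecOutcome::Return(v) => return ExecOutcome::Return(v),
            // Normal completion and `continue` both proceed to the step.
            ExecOutcome::Normal | ExecOutcome::Continue => {}
        }
        step(); // 4. execute step, then 5. repeat
    }
    ExecOutcome::Normal
}
```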


20. Arrays Design

This section is normative if arrays are in scope for v1; otherwise it is phase-2 design.

20.1 Supported array forms

  • local fixed-size arrays only
  • element types: int, bool
  • one-dimensional only

20.2 Syntax

int a[5];
bool seen[10];
a[0] = 42;
print(a[0]);

20.3 Semantics

  • storage duration: function activation/frame lifetime
  • indexing requires integer index
  • bounds checked
  • no array-to-pointer decay
  • arrays are not first-class assignable values

20.4 Parameter passing

Strong recommendation for version 1:

  • do not support array parameters

Rationale:

  • real C++ adjusts array parameters to pointers
  • pointers are out of scope
  • modeling this faithfully without pointers is awkward

So:

  • array types allowed only for local variables

21. Unsupported Features and Rejection Policy

This section must remain explicit for product integrity.

21.1 Unsupported syntax categories

  • preprocessing directives
  • namespace qualifications like std::x
  • member access a.b
  • stream insertion <<
  • type qualifiers and storage specifiers
  • advanced declarators
  • pointers/references
  • classes/structs
  • initializer lists
  • string and char literals
  • templates
  • exceptions
  • overloading for user-defined functions
  • implicit declarations
  • aggregate initialization

21.2 Rejection requirements

The system shall reject unsupported constructs with targeted diagnostics wherever practical.

Example:

std::cout << x;

Preferred diagnostic:

error[NPP3018]: stream insertion expressions are unsupported

not merely:

error: expected ';'

22. Testing Strategy

22.1 Test layers

lexer tests

  • tokenization correctness
  • comment handling
  • integer literal scanning
  • operator scanning
  • source span correctness

parser tests

  • function declarations and definitions
  • expression precedence
  • statement forms
  • syntax error recovery

semantic tests

  • scope resolution
  • shadowing
  • redeclaration errors
  • type mismatch errors
  • return validity
  • call resolution
  • unsupported construct rejection

runtime tests

  • arithmetic
  • conditions
  • loops
  • recursion
  • builtin output
  • runtime errors

integration tests

  • end-to-end source file execution
  • output capture comparison
  • diagnostic golden files

22.2 Golden tests

Use golden files for:

  • diagnostics
  • stack traces
  • program output

22.3 Required sample programs

At minimum:

arithmetic

int main() {
    int x = 2 + 3 * 4;
    print(x);
    return 0;
}

block shadowing

int main() {
    int x = 1;
    {
        int x = 2;
        print(x);
    }
    print(x);
    return 0;
}

function call

int add(int a, int b) {
    return a + b;
}
int main() {
    print(add(10, 20));
    return 0;
}

while loop

int main() {
    int i = 0;
    while (i < 3) {
        print(i);
        i = i + 1;
    }
    return 0;
}

for loop

int main() {
    for (int i = 0; i < 3; i = i + 1) {
        print(i);
    }
    return 0;
}

recursion

int fact(int n) {
    if (n == 0) {
        return 1;
    }
    return n * fact(n - 1);
}
int main() {
    print(fact(5));
    return 0;
}

runtime uninitialized read

int main() {
    int x;
    print(x);
    return 0;
}

unsupported pointer

int main() {
    int* p;
    return 0;
}

23. Milestone Plan

23.1 Milestone 0: skeleton

Deliverables:

  • project scaffolding
  • source manager
  • diagnostics base
  • token definitions

Exit criteria:

  • build system works
  • diagnostic rendering works

23.2 Milestone 1: lexer

Deliverables:

  • comments
  • identifiers
  • literals
  • punctuation/operators
  • keyword recognition

Exit criteria:

  • lexer golden tests pass

23.3 Milestone 2: parser core

Deliverables:

  • function parse
  • statement parse
  • expression precedence parse
  • AST generation

Exit criteria:

  • parser accepts basic programs
  • syntax errors reported correctly

23.4 Milestone 3: semantic analysis

Deliverables:

  • function table
  • variable scopes
  • type checking
  • main validation
  • analyzed AST / semantic model for validated programs

Exit criteria:

  • semantic test corpus passes
  • unsupported constructs rejected accurately

23.5 Milestone 4: interpreter core

Deliverables:

  • typed IR lowering from the analyzed AST / semantic model
  • scalar runtime values
  • statements and expressions
  • function call stack
  • returns
  • built-ins

Exit criteria:

  • arithmetic, control flow, functions work end-to-end

23.6 Milestone 5: loops and recursion hardening

Deliverables:

  • for
  • recursion
  • stack traces
  • runtime error reporting

Exit criteria:

  • integration tests stable

23.7 Milestone 6: arrays

Deliverables:

  • array declaration
  • indexing
  • bounds checks
  • initialization tracking

Exit criteria:

  • array tests stable

23.8 Milestone 7: polish

Deliverables:

  • improved diagnostics
  • CLI options
  • trace mode or debug dump mode
  • documentation synchronization

Exit criteria:

  • v1 release candidate

24. CLI Design

24.1 Basic invocation

npp program.cpp

24.2 Suggested options

  • --dump-tokens
  • --dump-ast
  • --dump-sema
  • --dump-ir
  • --trace-exec
  • --no-color
  • --max-call-depth=N

24.3 Exit codes

  • 0: program ran successfully and returned 0
  • non-zero program return code may map to process exit code if desired
  • dedicated interpreter failure codes for diagnostics/runtime failures

Recommended:

  • compilation/semantic failure → exit 2
  • runtime failure → exit 3
  • internal failure → exit 4
  • successful program execution → program return code modulo process constraints
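The recommended mapping can be sketched as a single function whose result would be passed to `std::process::exit`. The `RunResult` type is hypothetical. One practical consequence of "modulo process constraints": on Unix-like systems the parent process observes only the low 8 bits of the exit status. A main that returns 256 therefore appears to exit with 0, and a main that returns 2, 3, or 4 is indistinguishable from an interpreter failure.

```rust
/// How one interpreter run ended (hypothetical type for illustration).
enum RunResult {
    Ok { main_returned: i32 },
    CompileError, // lexical, syntax, or semantic diagnostics
    RuntimeError,
    InternalError,
}

fn exit_code(r: &RunResult) -> i32 {
    match r {
        RunResult::Ok { main_returned } => *main_returned,
        RunResult::CompileError => 2,
        RunResult::RuntimeError => 3,
        RunResult::InternalError => 4,
    }
}
```

A CLI entry point would finish with `std::process::exit(exit_code(&result))`.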

25. Determinism and Reproducibility

NotPlusPlus shall avoid behavior depending on:

  • unordered container iteration
  • host integer overflow semantics
  • locale-dependent formatting
  • platform-specific newline handling beyond normalized I/O

All runtime-visible semantics must be deterministic.


26. Performance Expectations

Performance is secondary to correctness for v1.

Expected scale:

  • single-file programs
  • tens to low hundreds of functions
  • small recursion depths
  • small arrays
  • low-latency interpretation for educational/demo/scripting workloads

No optimization pipeline is required.


27. Security and Robustness

27.1 Untrusted input

Source code is untrusted input. The interpreter must guard against:

  • infinite recursion causing host stack overflow
  • pathological parse recursion where practical
  • excessive memory allocation from huge array sizes
  • integer overflow in internal indexing

27.2 Runtime limits

Configurable limits:

  • maximum source size
  • maximum call depth
  • maximum array size
  • maximum total allocated runtime storage
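The array-size and total-storage limits can be sketched as a budget that every array declaration must reserve from. This is a sketch only. The `Limits` fields mirror the list above. The default values are placeholders and are not numbers taken from this design. Checked multiplication and addition keep a huge declared size from wrapping around in internal size computations.

```rust
/// Hypothetical limit configuration; default values are placeholders.
#[derive(Debug, Clone, Copy)]
struct Limits {
    max_source_bytes: usize,
    max_call_depth: usize,
    max_array_len: usize,
    max_total_storage_bytes: usize,
}

impl Default for Limits {
    fn default() -> Self {
        Limits {
            max_source_bytes: 1 << 20,
            max_call_depth: 1024,
            max_array_len: 1 << 20,
            max_total_storage_bytes: 64 << 20,
        }
    }
}

#[derive(Debug, PartialEq)]
enum LimitError { ArrayTooLarge, StorageExhausted }

struct StorageBudget {
    limits: Limits,
    used_bytes: usize,
}

impl StorageBudget {
    /// Reserve storage for one array of `len` elements of `elem_bytes` each.
    fn reserve_array(&mut self, len: usize, elem_bytes: usize) -> Result<(), LimitError> {
        if len > self.limits.max_array_len {
            return Err(LimitError::ArrayTooLarge);
        }
        let bytes = len.checked_mul(elem_bytes).ok_or(LimitError::ArrayTooLarge)?;
        let total = self.used_bytes.checked_add(bytes).ok_or(LimitError::StorageExhausted)?;
        if total > self.limits.max_total_storage_bytes {
            return Err(LimitError::StorageExhausted);
        }
        self.used_bytes = total;
        Ok(())
    }
}
```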

27.3 Internal assertions

Use assertions for impossible states, but surface recoverable user-facing failures as diagnostics or runtime errors.


28. Risks and Mitigations

28.1 Scope creep into real C++ front-end complexity

Risk:

  • adding just one more feature like pointers or references causes cascading design complexity

Mitigation:

  • freeze v1 subset
  • require explicit design amendment for each feature addition

28.2 Grammar drift into non-C++ behavior

Risk:

  • parser convenience may accidentally accept syntax that is not C++

Mitigation:

  • every grammar addition must map to real C++ syntax
  • no custom statements or operators

28.3 Built-ins becoming a fake language surface

Risk:

  • too many magic functions create non-C++ semantics

Mitigation:

  • keep built-ins minimal
  • model them as ordinary global functions

28.4 Semantic ambiguity around conversions

Risk:

  • partial C++ conversion rules become inconsistent

Mitigation:

  • keep conversion lattice deliberately tiny and documented
  • prefer strict exact-match rules except contextual bool conversion

29. Version 1 Product Contract

A source program is accepted by NotPlusPlus v1 if and only if:

  1. it consists of supported top-level function declarations/definitions
  2. every declaration, statement, and expression belongs to the supported subset
  3. type checking succeeds under the subset rules
  4. exactly one valid entry point int main() exists
  5. all runtime operations stay within defined execution constraints

A program outside that contract is rejected.


30. Recommended Final v1 Feature Set

This is the strongest recommended baseline for a coherent first release.

30.1 Must-have

  • single translation unit
  • comments
  • int, bool, void
  • function declarations and definitions
  • local variables
  • block scope
  • integer/boolean literals
  • arithmetic/comparison/logical expressions
  • assignment
  • if, while, for
  • return
  • int main()
  • built-in print(int) and print(bool)
  • recursion
  • semantic diagnostics
  • runtime error handling
  • deterministic evaluation order

30.2 Should-have

  • array local variables with indexing and bounds checks
  • AST and IR dump modes
  • stack traces for runtime errors
  • fixed slot allocation per frame

30.3 Should-not-have in v1

  • macros
  • headers
  • namespaces
  • pointers/references
  • user overloads
  • strings
  • classes
  • templates
  • exceptions

31. Example Accepted Programs

31.1 Basic arithmetic

int main() {
    int x = 10;
    int y = 20;
    print(x + y);
    return 0;
}

31.2 Boolean control flow

bool gt(int a, int b) {
    return a > b;
}

int main() {
    if (gt(5, 3)) {
        print(true);
    } else {
        print(false);
    }
    return 0;
}

31.3 Loop and scope

int main() {
    int x = 0;
    for (int i = 0; i < 3; i = i + 1) {
        int x = i;
        print(x);
    }
    print(x);
    return 0;
}

31.4 Arrays if enabled

int main() {
    int a[3];
    a[0] = 4;
    a[1] = 5;
    a[2] = a[0] + a[1];
    print(a[2]);
    return 0;
}

32. Example Rejected Programs

32.1 Pointer usage

int main() {
    int* p;
    return 0;
}

Reason: pointers unsupported.

32.2 String literal

int main() {
    print("hello");
    return 0;
}

Reason: string literals unsupported.

32.3 Namespace qualification

int main() {
    std::cout << 1;
    return 0;
}

Reason: namespaces and stream insertion unsupported.

32.4 Multiple declarators

int main() {
    int a = 1, b = 2;
    return 0;
}

Reason: multi-declarator declarations unsupported in v1.


33. Amendments Policy

This document is the design contract for v1. Any feature addition must be recorded as an amendment specifying:

  • syntax accepted
  • semantics
  • diagnostics
  • runtime representation impact
  • interaction with existing features
  • migration impact on tests and docs

No feature should be added informally.

33.1 Amendment: Milestone 3 boundary

Milestone 3 ends after semantic analysis produces a resolved, typed semantic model over the AST. This model includes function symbols, variable scopes, name-resolution results, and expression type/lvalue annotations.

Typed executable IR lowering is deferred to milestone 4. Milestone 3 therefore validates programs semantically but does not yet require executable IR construction.

33.2 Amendment: Scope binding storage model

The semantic analyzer's variable binding table shall be keyed by ScopeId, not by positional index into a parallel vector. The binding structure shall be a map from ScopeId to a map from name to VarId:

bindings: HashMap<ScopeId, HashMap<String, VarId>>

Rationale

Using a positional Vec indexed by ScopeId ordinal couples two independent allocation sequences: scope creation in the ScopeTree and entry creation in the bindings table. Any code path that creates a scope without a corresponding bindings push — or vice versa — produces silent index misalignment or a panic. A HashMap<ScopeId, ...> structure makes the association explicit and eliminates this coupling.

Interaction with existing features

Variable lookup (§12.2) and shadowing (§6.1.3) behavior are unchanged. The scope tree's parent chain remains the authority for lexical lookup order. Only the internal storage representation changes.

Migration impact

All semantic analysis code that indexes into self.bindings[scope.0] must be replaced with keyed access. No test semantics change; only the internal data structure changes.


33.3 Amendment: Preprocessor directive diagnostics

Source files containing preprocessing directives shall be rejected with a targeted diagnostic of category unsupported_preprocessor_directive, not with a generic invalid-character error for #.

Syntax recognized

A line whose first non-whitespace character is # shall be recognized by the lexer as a preprocessing directive line.

Diagnostics

The lexer shall emit a diagnostic with a dedicated code in the NPP1xxx range when encountering a #-prefixed directive. The diagnostic message shall identify the specific directive where recognizable (e.g., #include, #define, #ifdef, #pragma) and fall back to a generic "preprocessing directives are unsupported" message otherwise.

Example:

error[NPP1004]: preprocessing directive '#include' is unsupported
  --> sample.cpp:1:1
   |
1  | #include <iostream>
   | ^^^^^^^^

Semantics

No preprocessing is performed. The directive line is consumed and skipped after diagnostic emission to allow continued lexing of subsequent source.
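The directive-line recognition described here can be sketched as a per-line check that the lexer runs at the start of each physical line, outside comments. The function name `check_directive` is hypothetical, and the diagnostic code is left to the caller. The sketch recognizes the standard directive names, allows whitespace between `#` and the name as C++ does, and falls back to the generic message otherwise.

```rust
/// If `line` is a preprocessing directive line (its first non-whitespace
/// character is '#'), return the 1-based column of '#' and the message to emit.
/// The caller then skips the rest of the line and continues lexing.
fn check_directive(line: &str) -> Option<(usize, String)> {
    let trimmed = line.trim_start();
    let rest = trimmed.strip_prefix('#')?;
    let col = line[..line.len() - trimmed.len()].chars().count() + 1;
    // C++ permits whitespace between '#' and the directive name.
    let name: String = rest
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_alphabetic())
        .collect();
    const KNOWN: [&str; 12] = [
        "include", "define", "undef", "if", "ifdef", "ifndef",
        "elif", "else", "endif", "line", "error", "pragma",
    ];
    let msg = if KNOWN.contains(&name.as_str()) {
        format!("preprocessing directive '#{}' is unsupported", name)
    } else {
        "preprocessing directives are unsupported".to_string()
    };
    Some((col, msg))
}
```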

Interaction with existing features

This amends §8 (Preprocessing Policy) by specifying the diagnostic mechanism. The lexer's existing invalid-character path for # is superseded by this targeted recognition.


33.4 Amendment: Parenthesized expression representation in semantic model

The AnalyzedExprKind::Paren variant shall hold an owned inner expression, not a clone of a separately stored expression. The semantic model shall avoid cloning expression trees for parenthesized expressions.

Representation

Paren(Box<AnalyzedExpr>)

The inner expression is moved into the Paren wrapper. No separate copy exists.

Semantics

Parenthesized expressions preserve the type and lvalue status of the inner expression exactly. This is unchanged from the existing rule (§12.5).

Interaction with existing features

IR lowering already erases parentheses by recursing through Paren nodes. The semantic model representation change does not affect lowered IR or runtime behavior.


33.5 Amendment: Scope lifetime and cleanup in semantic analysis

Scope binding entries created during semantic analysis persist for the duration of analysis. There is no requirement to deallocate or "pop" binding entries when leaving a scope.

Rationale

Because ScopeId values are unique and monotonically assigned, stale binding entries from exited scopes are unreachable through the lookup chain (which walks parent links from the current scope). Retaining them is harmless and simplifies the analyzer.

This amendment makes the existing behavior an explicit design choice rather than an accidental omission.

Constraint

The lookup procedure (§6.1.2, §12.3) shall never consult a scope that is not an ancestor of the current scope. This invariant is enforced by walking the ScopeTree parent chain and is independent of whether binding entries for unrelated scopes exist.


33.6 Amendment: break and continue statement support

break and continue are supported statements in version 1.

Syntax

break_stmt ::= "break" ";"
continue_stmt ::= "continue" ";"

Keywords

break and continue are added to the supported keyword set (§5.3).

Semantics

break immediately exits the innermost enclosing while or for loop. continue skips the remainder of the current iteration: in a while loop, execution proceeds to re-evaluating the condition; in a for loop, it proceeds to the step expression and then re-evaluates the condition.

Both are semantic errors if used outside a loop body.

Diagnostics

  • NPP3010: break or continue used outside of a loop body.

Runtime representation

The executable IR includes Break(Span) and Continue(Span) statement variants. The interpreter's execution flow enum includes Break and Continue variants alongside Normal and Return.

Interaction with existing features

break and continue interact with for and while loops. They do not interact with if or compound statements beyond propagating through them. A break or continue that escapes a function body is an internal error.
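The semantic check behind NPP3010 can be sketched as a walk that tracks loop depth. The trimmed-down `Stmt` enum and `check_loop_ctl` are hypothetical. Each function body is checked starting at depth 0; because local functions are unsupported, the depth never needs to be saved and restored across function boundaries.

```rust
// Hypothetical, trimmed-down statement AST for illustration.
enum Stmt {
    Break,
    Continue,
    While(Box<Stmt>),
    For(Box<Stmt>),
    If(Box<Stmt>, Option<Box<Stmt>>),
    Block(Vec<Stmt>),
    Expr, // placeholder for an expression statement
}

/// Collect every break/continue that appears outside a loop body (each is an NPP3010).
fn check_loop_ctl(s: &Stmt, loop_depth: usize, errors: &mut Vec<&'static str>) {
    match s {
        Stmt::Break | Stmt::Continue if loop_depth == 0 => {
            errors.push(if matches!(s, Stmt::Break) { "break" } else { "continue" })
        }
        Stmt::Break | Stmt::Continue | Stmt::Expr => {}
        Stmt::While(body) | Stmt::For(body) => check_loop_ctl(body, loop_depth + 1, errors),
        Stmt::If(then_branch, else_branch) => {
            check_loop_ctl(then_branch, loop_depth, errors);
            if let Some(e) = else_branch {
                check_loop_ctl(e, loop_depth, errors);
            }
        }
        Stmt::Block(stmts) => {
            for st in stmts {
                check_loop_ctl(st, loop_depth, errors);
            }
        }
    }
}
```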


33.7 Amendment: Declared-but-unused function declarations

A function that is declared but never defined and never called is not an error. The semantic analyzer shall only emit an error for a declared-but-undefined function when it is referenced in a call expression.

Rationale

This matches C++ behavior where forward declarations without definitions are permitted as long as no definition is required by the linker. Since NotPlusPlus has no separate compilation, the analogue is call-site usage.

Diagnostics

No diagnostic is emitted for unused forward declarations. The existing NPP3012 diagnostic ("function '...' is declared but never defined") is emitted only when a call to such a function is encountered.
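The call-site-only rule can be sketched as a whole-program check over the declared set, the defined set, and the callee names in source order. The function `undefined_callees` is hypothetical. Iterating the call list instead of a hash set keeps the diagnostic order deterministic (§25), and each function is reported at most once.

```rust
use std::collections::HashSet;

/// Names that are declared, called, and never defined, in first-call order.
/// Declared-but-uncalled functions are deliberately not reported.
fn undefined_callees<'a>(
    declared: &HashSet<&'a str>,
    defined: &HashSet<&'a str>,
    calls: &[&'a str],
) -> Vec<&'a str> {
    let mut reported = HashSet::new();
    let mut out = Vec::new();
    for &name in calls {
        if declared.contains(name) && !defined.contains(name) && reported.insert(name) {
            out.push(name);
        }
    }
    out
}
```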