Conversation
* do not pass filenames to cppcheck, just use compile db
* add nix run configure just in case
* add token equality operator
* add ctor for optional literal
* rename tests for parity
* test on tokens directly, not string repr
Pull request overview
This PR introduces the initial lexer/token infrastructure (plus a minimal interpreter) for the invariants language, along with a Nix/prek workflow to build and run tests.
Changes:
- Added `TokenType`/`Token` model with printing, equality, and literal support.
- Implemented a `Lexer` that scans source text into a token stream and added GTest coverage for tokens/lexing.
- Added a minimal `Interpreter` that runs the lexer and prints tokens; wired new libs/tests into CMake and added `nix run .#test`.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| prek.toml | Switches to local prek hooks and adds Nix-based configure/format/tidy/cppcheck/test hooks. |
| flake.nix | Adds dev-test app and exposes nix run .#test. |
| README.md | Documents running the test suite via Nix app. |
| lang/src/lexer/token.hpp | Adds token types, literal variant, and token API. |
| lang/src/lexer/token.cpp | Implements token formatting and comparison. |
| lang/src/lexer/lexer.hpp | Declares lexer and keyword table. |
| lang/src/lexer/lexer.cpp | Implements scanning logic for tokens, literals, comments, whitespace. |
| lang/src/interp/interpreter.hpp | Declares minimal interpreter skeleton. |
| lang/src/interp/interpreter.cpp | Implements run() by lexing and printing tokens. |
| lang/src/**/CMakeLists.txt | Builds lexer/interpreter libs and wires them into the main executable. |
| lang/tests/** | Adds GTest suites for token/lexer and a basic interpreter test; updates test CMake structure. |
```cpp
#include "token.hpp"

#include <ostream>
#include <string>
```
```cpp
#pragma once

#include <string>
#include <unordered_map>
#include <vector>
```
```cpp
    addToken(type, false);
    return;
  }

  addToken(type);
```
```cpp
#include <cstddef>
#include <string_view>

namespace invariants::interpreter {

class Interpreter {
 private:
  bool hadErr = false;
  void report(std::size_t line, std::string_view where, std::string_view msg);
```
```cpp
// #include <string_view>

#include "lexer.hpp"

namespace invariants::interpreter {

// void Interpreter::report(std::size_t line, std::string_view where,
//                          std::string_view msg) {
//   std::println("[line %d] Error %s : %s", line, where, msg);
//   hadErr = true;
// }
```
```cpp
  return !(this->type == other.type && this->lexeme == other.lexeme &&
           this->literal == other.literal && this->line == other.line);
```
```cpp
  bool operator!=(const Token& other) const;
  std::string toString() const;

  friend std::ostream& operator<<(std::ostream& os, const Token& token);
};
```
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
This PR introduces the initial C++ implementation of the invariants language lexer/token model (plus a minimal interpreter stub) and wires them into the build/test + developer tooling.
Changes:
- Add `TokenType`/`Token` (with literals, formatting, and equality) and a `Lexer` that scans source text into tokens.
- Add initial `Interpreter::run()` that lexes input and prints the token stream.
- Add GoogleTest coverage for token and lexer behavior; update CMake/Nix/prek tooling to build and run tests.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| prek.toml | Switch hooks to local/system hooks invoking Nix commands (configure/format/tidy/cppcheck/test). |
| lang/src/lexer/token.hpp | Introduces token types, literal representation, and Token API. |
| lang/src/lexer/token.cpp | Implements token formatting and equality helpers. |
| lang/src/lexer/lexer.hpp | Declares lexer and keyword table. |
| lang/src/lexer/lexer.cpp | Implements scanning logic for punctuation/operators/literals/keywords/comments. |
| lang/src/lexer/CMakeLists.txt | Builds invariants_lexer library. |
| lang/src/interp/interpreter.hpp / interpreter.cpp | Adds minimal interpreter that runs the lexer and prints tokens. |
| lang/src/interp/CMakeLists.txt | Builds invariants_interp library. |
| lang/src/CMakeLists.txt / lang/CMakeLists.txt | Adds subdirectories and links hello_world to lexer. |
| lang/tests/lexer/* + lang/tests/interp/* + lang/tests/CMakeLists.txt | Adds/organizes lexer/token/interpreter tests and discovery. |
| flake.nix | Adds nix run .#test app and wires it into flake outputs. |
| README.md | Documents nix run .#test and notes about impure scripts. |
```cpp
std::string literalToString(const Literal& lit) {
  return std::visit(
      [](const auto& value) -> std::string {
        using T = std::decay_t<decltype(value)>;

        if constexpr (std::is_same_v<T, std::monostate>) {
          return "null";
        } else if constexpr (std::is_same_v<T, std::string>) {
          return value;
        } else if constexpr (std::is_same_v<T, double>) {
          return std::to_string(value);
        } else if constexpr (std::is_same_v<T, int>) {
          return std::to_string(value);
        } else if constexpr (std::is_same_v<T, bool>) {
          return value ? "true" : "false";
        }
```
```cpp
  const std::string source;
  std::vector<Token> tokens;
  inline static const std::unordered_map<std::string_view, TokenType> keywords{
      {"spec", TokenType::KW_SPEC},
      {"field", TokenType::KW_FIELD},
      {"check", TokenType::KW_CHECK},
      {"invariant", TokenType::KW_INVARIANT},
      {"Boolean", TokenType::KW_BOOLEAN},
      {"true", TokenType::LIT_BOOLEAN_T},
      {"false", TokenType::LIT_BOOLEAN_F},
      {"Array", TokenType::KW_ARRAY},
      {"Null", TokenType::KW_NULL},
      {"null", TokenType::LIT_NULL},
      {"String", TokenType::KW_STRING},
      {"Number", TokenType::KW_NUMBER},
      {"Integer", TokenType::KW_INTEGER},
      {"IN", TokenType::KW_IN},
      {"NIN", TokenType::KW_NOT_IN},
      {"NI", TokenType::KW_CONTAINS},
  };
  size_t start = 0;
  size_t curr = 0;
  size_t line = 1;

  void scanToken();
  char advance();
  void addToken(TokenType type);
  void addToken(TokenType type, Literal literal);

 public:
  explicit Lexer(std::string_view source);
  std::vector<Token> scanTokens();
```
```cpp
  std::string text = source.substr(start, curr - start);

  auto it = keywords.find(text);
  TokenType type =
      (it != keywords.end()) ? it->second : TokenType::LIT_IDENTIFIER;

  // Check if boolean and if so, add relevant literals
  if (type == TokenType::LIT_BOOLEAN_T) {
    addToken(type, true);
    return;
  }

  if (type == TokenType::LIT_BOOLEAN_F) {
    addToken(type, false);
    return;
  }

  addToken(type);
}
```
```nix
dev-configure = pkgs.writeShellApplication {
  name = "dev-configure";
  meta.description = "Configure clangd environment.";
  runtimeInputs = with pkgs; [
    clang
    cmake
    ninja
  ];
  text = ''
    set -euo pipefail
    cmake -S lang -B .nix-dev/build
  '';
```
```nix
dev-test = pkgs.writeShellApplication {
  name = "dev-test";
  meta.description = "Run test suite.";
  runtimeInputs = with pkgs; [
    cmake
    ninja
  ];
  text = ''
    set -euo pipefail
    cmake -S lang -B .nix-dev/build
    cmake --build .nix-dev/build
    ctest --test-dir .nix-dev/build --output-on-failure
  '';
```
```cpp
  friend std::ostream& operator<<(std::ostream& os, const Token& token);
};
```
```cpp
using Literal = std::variant<std::monostate,  // null
                             std::string,     // identifiers + strings
                             int,             // integers
                             double,          // numbers
                             bool             // booleans
                             >;
```
This PR adds the first steps and initial foundation for the `invariants` lang: the lexer! The lexer is responsible for converting raw `invariants` code into a series of tokens.

The set of tokens consists of all possible symbols, literals, and other keywords of importance, such as `spec`, `==`, `+`, `Boolean`, etc. A literal, in this context, is the "literal" value associated with a token, and is either a `std::monostate` for tokens with no literal values (e.g. operators, keywords, punctuation, nulls), a `std::string` for identifiers and string-type variables, an `int` for integers, a `double` for "numbers" (as per the OpenAPI spec), or a `bool` for booleans. Some helper operators such as `ostream` `<<` and equality checks have also been implemented. For the most part, this is just a fairly straightforward representation of data.

The lexer class is more interesting, as it scans our text and converts it into a stream of tokens. The implementation is fairly straightforward: since our `TokenType` enum uses `uint8_t` as its underlying type, we can use a switch/case table, which devolves into a jump table post-compilation, for what I imagine would be a significant performance boost as opposed to using a map to match characters to tokens. We look ahead at most 2 characters.

A very basic supported (scannable) input might look like this:
Also, a very, very, very basic interpreter has been added. Some more work needs to be done around the actual CLI and "framework" of the language, such as adding a REPL, proper error types, etc., but that is out of scope for this branch and can be tackled much later, as those are more niceties than anything.
Also, we make extensive use of GTest and have (ideally) covered most common paths with tests.