
test: add lexer unit tests for parser/lexer.cpp#31

Merged
poyrazK merged 3 commits into main from test/raft-group-coverage
Apr 14, 2026
Conversation

Owner

poyrazK commented Apr 14, 2026

Summary

Add 54 unit tests covering Lexer class in parser/lexer.cpp:

  • Basic tokenization (SELECT, identifiers, numbers, strings)
  • Keywords case insensitivity
  • Operators (=, <>, <, <=, >, >=, +, -, *, /)
  • Delimiters (parentheses, comma, semicolon, dot)
  • Whitespace and newline handling
  • SQL comments (-- style)
  • Position tracking (line/column)
  • peek_token() behavior
  • Error handling (unterminated strings, invalid chars)
  • Complex SQL statements (CREATE TABLE, INSERT, UPDATE, DELETE, JOIN)

Test plan

  • All 28 tests pass locally
  • 54 lexer-specific tests added

Summary by CodeRabbit

  • Tests
    • Added comprehensive unit test coverage for the SQL lexer, validating keyword recognition, operator handling, literal parsing, whitespace processing, error detection, and position tracking.


coderabbitai Bot commented Apr 14, 2026

Warning

Rate limit exceeded

@poyrazK has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 49 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 19 minutes and 49 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66311e17-ff8f-4526-8e19-94f7970bf674

📥 Commits

Reviewing files that changed from the base of the PR and between 4093341 and 36b2833.

📒 Files selected for processing (1)
  • tests/lexer_tests.cpp
📝 Walkthrough

Walkthrough

This PR adds comprehensive unit test coverage for the SQL lexer component. A new test executable lexer_tests is registered in the CMake build configuration, with 538 lines of GoogleTest-based test implementations validating lexer tokenization across keywords, operators, literals, delimiters, and error conditions.

Changes

Cohort / File(s) Summary
Build Configuration
CMakeLists.txt
Registered new test target lexer_tests from tests/lexer_tests.cpp using existing add_cloudsql_test macro within the BUILD_TESTS conditional.
Lexer Unit Tests
tests/lexer_tests.cpp
New comprehensive test suite exercising SQL lexer functionality: token consumption, end-of-input detection, keyword matching (case-insensitive), operators, literals (integers, floats, strings), delimiters, type keywords, error handling for invalid characters and unterminated strings, and position tracking via line/column methods.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • Stability & Testing Refinement #11: Modifies the same Lexer class with changes to init_keywords and TokenType definitions that these new unit tests directly exercise and validate.

Poem

🐰 Hops of joy for lexer tests so bright,
Tokenizing SQL with all our might,
Keywords parsed from left to right,
No more bugs hiding out of sight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 35.71%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title clearly and accurately describes the main change: adding unit tests for the SQL lexer component, which aligns with the substantial test file addition (538 lines) in tests/lexer_tests.cpp.



coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/lexer_tests.cpp (1)

176-182: StringWithEscapedQuote currently has no behavioral assertion.

This test is too permissive and can pass even with unintended lexer behavior. Please assert a concrete expected contract (current or target behavior) so it actually protects against regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/lexer_tests.cpp` around lines 176 - 182, The test
StringWithEscapedQuote is too vague—update it to assert a concrete expected
contract using tokenize and the tokens vector: decide whether the lexer should
treat "'it''s cool'" as a single STRING token or as two separate STRING tokens
(current behavior seems to produce two), then replace ASSERT_GE(tokens.size(),
2) with precise checks (e.g., ASSERT_EQ(tokens.size(), N) and assert the token
types/values for tokens[0] and tokens[1] or tokens[0]’s text) using the
project’s Token accessors so the test fails on unintended lexer changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/lexer_tests.cpp`:
- Around line 87-103: The size precondition is too small and can allow
out-of-bounds indexed reads on tokens (the test reads tokens[0]..tokens[14]);
update the assertion in tests/lexer_tests.cpp from ASSERT_GE(tokens.size(), 12)
to assert at least 15 elements (e.g., ASSERT_GE(tokens.size(), 15)) so all
subsequent EXPECT_EQ checks against TokenType (Select, Distinct, Identifier,
..., Asc) are safe.
- Around line 26-33: The tokenize helper can append the End token twice because
it uses while(!lexer.is_at_end()) to push tokens and then calls
lexer.next_token() again; change tokenize so it repeatedly calls
Lexer::next_token() and pushes the returned Token until a Token with
TokenType::End is produced (then stop), or alternatively rely solely on
is_at_end() without the extra lexer.next_token() call—ensure you reference
tokenize, Lexer::next_token, Lexer::is_at_end, and TokenType::End when making
the change.

---

Nitpick comments:
In `@tests/lexer_tests.cpp`:
- Around line 176-182: The test StringWithEscapedQuote is too vague—update it to
assert a concrete expected contract using tokenize and the tokens vector: decide
whether the lexer should treat "'it''s cool'" as a single STRING token or as two
separate STRING tokens (current behavior seems to produce two), then replace
ASSERT_GE(tokens.size(), 2) with precise checks (e.g., ASSERT_EQ(tokens.size(),
N) and assert the token types/values for tokens[0] and tokens[1] or tokens[0]’s
text) using the project’s Token accessors so the test fails on unintended lexer
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e37b4dd-450a-4b07-9193-4c6e150071e2

📥 Commits

Reviewing files that changed from the base of the PR and between 078c09b and 4093341.

📒 Files selected for processing (2)
  • CMakeLists.txt
  • tests/lexer_tests.cpp

- Fix tokenize helper to correctly handle End token
- Fix KeywordsVariety ASSERT_GE from 12 to 15 for safe indexing
- Fix StringWithEscapedQuote with precise token assertions
poyrazK merged commit eb8bf5d into main on Apr 14, 2026
9 checks passed