test: add lexer unit tests for parser/lexer.cpp#31
Conversation
- Add 54 unit tests covering the Lexer class:
  - Basic tokenization (SELECT, identifiers, numbers, strings)
  - Keyword case insensitivity
  - Operators (=, <>, <, <=, >, >=, +, -, *, /)
  - Delimiters (parentheses, comma, semicolon, dot)
  - Whitespace and newline handling
  - SQL comments (-- style)
  - Position tracking (line/column)
  - peek_token() behavior
  - Error handling (unterminated strings, invalid characters)
  - Complex SQL statements (CREATE TABLE, INSERT, UPDATE, DELETE, JOIN)
📝 Walkthrough
This PR adds comprehensive unit test coverage for the SQL lexer component. A new test executable is added (see CMakeLists.txt and tests/lexer_tests.cpp).
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/lexer_tests.cpp (1)
Lines 176-182: `StringWithEscapedQuote` currently has no behavioral assertion. This test is too permissive and can pass even with unintended lexer behavior. Please assert a concrete expected contract (current or target behavior) so it actually protects against regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/lexer_tests.cpp` around lines 176 - 182, The test StringWithEscapedQuote is too vague—update it to assert a concrete expected contract using tokenize and the tokens vector: decide whether the lexer should treat "'it''s cool'" as a single STRING token or as two separate STRING tokens (current behavior seems to produce two), then replace ASSERT_GE(tokens.size(), 2) with precise checks (e.g., ASSERT_EQ(tokens.size(), N) and assert the token types/values for tokens[0] and tokens[1] or tokens[0]’s text) using the project’s Token accessors so the test fails on unintended lexer changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/lexer_tests.cpp`:
- Around line 87-103: The size precondition is too small and can allow
out-of-bounds indexed reads on tokens (the test reads tokens[0]..tokens[14]);
update the assertion in tests/lexer_tests.cpp from ASSERT_GE(tokens.size(), 12)
to assert at least 15 elements (e.g., ASSERT_GE(tokens.size(), 15)) so all
subsequent EXPECT_EQ checks against TokenType (Select, Distinct, Identifier,
..., Asc) are safe.
- Around line 26-33: The tokenize helper can append the End token twice because
it uses while(!lexer.is_at_end()) to push tokens and then calls
lexer.next_token() again; change tokenize so it repeatedly calls
Lexer::next_token() and pushes the returned Token until a Token with
TokenType::End is produced (then stop), or alternatively rely solely on
is_at_end() without the extra lexer.next_token() call—ensure you reference
tokenize, Lexer::next_token, Lexer::is_at_end, and TokenType::End when making
the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8e37b4dd-450a-4b07-9193-4c6e150071e2
📒 Files selected for processing (2)
- CMakeLists.txt
- tests/lexer_tests.cpp
- Fix tokenize helper to correctly handle End token
- Fix KeywordsVariety ASSERT_GE from 12 to 15 for safe indexing
- Fix StringWithEscapedQuote with precise token assertions
Summary
Add 54 unit tests covering Lexer class in parser/lexer.cpp:
Test plan