test: add lexer unit tests for parser/lexer.cpp#31
Conversation
- Add 54 unit tests covering the Lexer class:
  - Basic tokenization (SELECT, identifiers, numbers, strings)
  - Keyword case insensitivity
  - Operators (=, <>, <, <=, >, >=, +, -, *, /)
  - Delimiters (parentheses, comma, semicolon, dot)
  - Whitespace and newline handling
  - SQL comments (-- style)
  - Position tracking (line/column)
  - peek_token() behavior
  - Error handling (unterminated strings, invalid characters)
  - Complex SQL statements (CREATE TABLE, INSERT, UPDATE, DELETE, JOIN)
📝 Walkthrough
This PR adds comprehensive unit test coverage for the SQL lexer component. A new test executable is added (see CMakeLists.txt and tests/lexer_tests.cpp).
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/lexer_tests.cpp (1)
Lines 176-182: `StringWithEscapedQuote` currently has no behavioral assertion. This test is too permissive and can pass even with unintended lexer behavior. Please assert a concrete expected contract (current or target behavior) so it actually protects against regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/lexer_tests.cpp` around lines 176 - 182, The test StringWithEscapedQuote is too vague—update it to assert a concrete expected contract using tokenize and the tokens vector: decide whether the lexer should treat "'it''s cool'" as a single STRING token or as two separate STRING tokens (current behavior seems to produce two), then replace ASSERT_GE(tokens.size(), 2) with precise checks (e.g., ASSERT_EQ(tokens.size(), N) and assert the token types/values for tokens[0] and tokens[1] or tokens[0]’s text) using the project’s Token accessors so the test fails on unintended lexer changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/lexer_tests.cpp`:
- Around line 87-103: The size precondition is too small and can allow
out-of-bounds indexed reads on tokens (the test reads tokens[0]..tokens[14]);
update the assertion in tests/lexer_tests.cpp from ASSERT_GE(tokens.size(), 12)
to assert at least 15 elements (e.g., ASSERT_GE(tokens.size(), 15)) so all
subsequent EXPECT_EQ checks against TokenType (Select, Distinct, Identifier,
..., Asc) are safe.
- Around line 26-33: The tokenize helper can append the End token twice because
it uses while(!lexer.is_at_end()) to push tokens and then calls
lexer.next_token() again; change tokenize so it repeatedly calls
Lexer::next_token() and pushes the returned Token until a Token with
TokenType::End is produced (then stop), or alternatively rely solely on
is_at_end() without the extra lexer.next_token() call—ensure you reference
tokenize, Lexer::next_token, Lexer::is_at_end, and TokenType::End when making
the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8e37b4dd-450a-4b07-9193-4c6e150071e2
📒 Files selected for processing (2)
- CMakeLists.txt
- tests/lexer_tests.cpp
- Fix tokenize helper to correctly handle End token
- Fix KeywordsVariety ASSERT_GE from 12 to 15 for safe indexing
- Fix StringWithEscapedQuote with precise token assertions
Summary
Add 54 unit tests covering Lexer class in parser/lexer.cpp:
Test plan