fix(heap_table): TupleId sentinel bug and 28 unit tests #29
Conversation
📝 Walkthrough
This PR extends heap table storage with MVCC support and improves error handling in the buffer pool manager.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/storage/heap_table.hpp`:
- Around line 40-44: The header changed the null sentinel for TupleId to use
page_num == UINT32_MAX, so replace all hard-coded invalid RIDs written as {0, 0}
with TupleId{} (default ctor) to preserve the null semantics; specifically
update the initialization of Iterator::last_id_ (seed value) and the code path
that returns an invalid RID on insert failure to return TupleId{} instead of
{0,0}, and grep for any other occurrences of literal {0,0} that represent “no
tuple” and fix them similarly.
In `@src/storage/buffer_pool_manager.cpp`:
- Around line 108-109: The frame is being made evictable via
replacer_.unpin(frame_id) and then immediately pushed into free_list_ causing
duplication; modify the logic in the function containing
replacer_.unpin(frame_id); free_list_.push_back(frame_id) so a frame is only
returned from one source: either remove the replacer_.unpin call when you intend
to move the frame to free_list_ (so only free_list_ holds it), or instead
check/avoid pushing frames already managed by the replacer (use the same
ownership path as unpin_page_by_id()). Ensure you update the code paths that
manipulate frame_id to use replacer_.unpin exclusively or free_list_.push_back
exclusively to prevent the frame being allocatable from two places.
In `@src/storage/heap_table.cpp`:
- Around line 546-550: get_meta() now hides logically-deleted tuples by
checking out_meta.xmax and returning false, which
breaks callers that expect raw metadata (e.g., executor::operator uses
get_meta() and applies snapshot rules). Revert this: remove the deletion-check
early return in the get_meta() path (and keep the page unpin semantics correct)
so get_meta() continues to return raw meta regardless of xmax; then move the
deleted-tuple filtering into the visibility-aware read path (either
heap_table::get() or a new helper like is_visible_under_snapshot()) and update
callers (e.g., operator.cpp usage) to call that visibility helper or let get()
enforce plain-read hiding if desired.
- Around line 546-549: The fast-path early return unpins the page via
bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false) even when the page is
stored in cached_page_, leaving cached_page_ as a dangling/stale owned pointer;
change the logic in the block that checks out_meta.xmax != 0 to avoid unpinning
a cached_page_ (either skip bpm_.unpin_page_by_id when cached_page_ is set or
first clear/reset cached_page_ before unpinning) so cached_page_ no longer holds
a stale reference — update the code around cached_page_, bpm_.unpin_page_by_id,
file_id_, and tuple_id.page_num accordingly.
In `@tests/heap_table_tests.cpp`:
- Around line 345-366: The test reads HeapTable::TupleMeta fields after calling
table_->get_meta(rid, meta) without asserting the call succeeded; update the
tests (e.g., in TEST_F(HeapTableTests, MVCCUpdateTransactionId)) to explicitly
check the return of table_->get_meta(rid, meta) with
ASSERT_TRUE(table_->get_meta(rid, meta)) (or EXPECT_FALSE where appropriate)
before accessing meta.xmin/meta.xmax so a failing get_meta() causes the test to
fail immediately; locate usages of table_->get_meta and add the assertion
immediately after each call that is followed by direct reads of meta.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 77dae9df-6913-4817-a122-79051e3bdd1f
📒 Files selected for processing (6)
- .gitignore
- CMakeLists.txt
- include/storage/heap_table.hpp
- src/storage/buffer_pool_manager.cpp
- src/storage/heap_table.cpp
- tests/heap_table_tests.cpp
```cpp
replacer_.unpin(frame_id);
free_list_.push_back(frame_id);
```
Don't put the same frame in both free_list_ and the replacer.
unpin_page_by_id() already uses replacer_.unpin() to make a zero-pin frame evictable again at Lines 146-149. Doing that here and then pushing the same frame_id into free_list_ makes the failed frame allocatable from two places, which can hand out the same frame twice after a read miss.
Suggested fix:

```diff
- replacer_.unpin(frame_id);
  free_list_.push_back(frame_id);
  page_table_.erase(key);
  page->file_name_.clear();
  page->page_id_ = 0;
  page->pin_count_ = 0;
+ page->is_dirty_ = false;
  return nullptr;
```
```cpp
/* Skip tuples that have been logically deleted */
if (out_meta.xmax != 0) {
    bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false);
    return false;
}
```
Keep get_meta() as raw metadata access.
src/executor/operator.cpp Lines 200-228 still calls get_meta() and then applies snapshot visibility rules using meta.xmax. Returning false here makes deleted-but-still-visible versions disappear for older snapshots, and because the fallback path at Lines 598-677 does not do the same check, the result currently depends on whether the page happens to be cached. If plain reads should hide deleted tuples, move that policy into get() or a separate visibility-aware helper.
```cpp
/* Skip tuples that have been logically deleted */
if (out_meta.xmax != 0) {
    bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false);
    return false;
```
Do not unpin cached_page_ from this fast path.
On the cached-page branch, cached_page_ stays stored on the HeapTable after this return. Dropping the pin here without clearing that cache leaves a stale owned pointer behind, so later inserts or destruction can unpin/reuse a frame the table no longer owns.
Suggested fix:

```diff
  /* Skip tuples that have been logically deleted */
  if (out_meta.xmax != 0) {
-     bpm_.unpin_page_by_id(file_id_, tuple_id.page_num, false);
      return false;
  }
```
```cpp
HeapTable::TupleMeta meta;
table_->get_meta(rid, meta);
EXPECT_EQ(meta.xmin, 100U);
EXPECT_EQ(meta.xmax, 0U);

table_->remove(rid, 200);  // xmax = 200
table_->get_meta(rid, meta);
EXPECT_EQ(meta.xmax, 200U);
}

TEST_F(HeapTableTests, MVCCUpdateTransactionId) {
  ASSERT_TRUE(table_->create());
  auto tuple = make_test_tuple(1, "Original");
  auto rid = table_->insert(tuple, 1);

  auto new_tuple = make_test_tuple(1, "Updated");
  table_->update(rid, new_tuple, 99);

  // xmax should be set for old version, or new tuple has xmin=99
  HeapTable::TupleMeta meta;
  table_->get_meta(rid, meta);
  EXPECT_TRUE(meta.xmin == 99 || meta.xmax == 99);
```
Assert the get_meta() result before reading meta.
These MVCC checks inspect meta after a second get_meta() call without first asserting whether that call succeeded. If get_meta() regresses and returns false, the test will read stale values from the previous successful call instead of failing at the real source. Please make the intended contract explicit with ASSERT_TRUE(...) or EXPECT_FALSE(...) before the field assertions.
- Fix TupleId is_null() treating valid (0,0) as null (sentinel was 0, fixed to UINT32_MAX)
- Fix get_meta() not checking xmax, returning logically deleted tuples
- Fix fetch_page_by_id() returning zeroed page beyond EOF (infinite loop bug)
- Add 28 unit tests for HeapTable covering insert, update, delete, scan, MVCC
- Ignore _deps/ build artifact directory
Force-pushed from 2fd70a4 to f579051
- Fix TupleId is_null() treating valid (0,0) as null (sentinel was 0, fixed to UINT32_MAX)
- Fix get_meta() not checking xmax, returning logically deleted tuples
- Fix default struct initialization for thread sanitizer compatibility
- Add 28 unit tests for HeapTable covering insert, update, delete, scan, MVCC
- Ignore _deps/ build artifact directory
Force-pushed from 6cf1ce8 to 0f006ed
Summary
Test plan
Summary by CodeRabbit
Bug Fixes
Tests