Code Review Bench PR #14587 - Refactor some of ASM and slot-stats functions#4
Code Review Bench PR #14587 - Refactor some of ASM and slot-stats functions#4ketkarameya wants to merge 3 commits intobase_pr_14587_20260125_4187from
Conversation
|
Note Gitar has paused automatic reviews. Comment "Gitar review" to trigger a review manually. Code Review 🚫 Blocked 0 resolved / 2 findingsCritical data loss bug: the two-pass split of slot verification deletes keys in unassigned slots before claiming them, silently destroying data that the old single-pass code preserved. Also, 🚨 Bug: Two-pass refactor causes data loss for keys in unassigned slotsThe original Suggested fix
|
| Auto-apply | Compact |
|
|
Was this helpful? React with 👍 / 👎 | Gitar
| /* Delete keys in unowned slots */ | ||
| clusterDeleteKeysInUnownedSlots(); | ||
| /* Take over slots that we have keys for, but are assigned to no one. */ | ||
| clusterClaimUnassignedSlots(); |
There was a problem hiding this comment.
🚨 Bug: Two-pass refactor causes data loss for keys in unassigned slots
The original verifyClusterConfigWithData() used a single pass through all slots. For each slot with keys that wasn't owned by this node and wasn't being imported, it checked:\n- If unassigned (NULL owner) → claim the slot (no data loss)\n- If assigned to another node → delete the keys\n\nThe refactored code splits this into two sequential passes:\n1. clusterDeleteKeysInUnownedSlots() — deletes keys in slots we don't own\n2. clusterClaimUnassignedSlots() — claims unassigned slots that have keys\n\nThe problem: clusterIsMySlot(i) returns false for unassigned slots (since getNodeBySlot() returns NULL ≠ myself). So in step 1, keys in unassigned slots are deleted. Then in step 2, countKeysInSlot(i) returns 0 for those same slots (keys were just deleted), so they are skipped and never claimed.\n\nResult: keys in unassigned slots are silently destroyed instead of being preserved by claiming the slot. This is a data loss regression.\n\nFix: Reverse the call order — call clusterClaimUnassignedSlots() before clusterDeleteKeysInUnownedSlots(), so unassigned slots are claimed first (preserving keys), and only then are keys in slots owned by other nodes deleted.
Was this helpful? React with 👍 / 👎
| /* Delete keys in unowned slots */ | |
| clusterDeleteKeysInUnownedSlots(); | |
| /* Take over slots that we have keys for, but are assigned to no one. */ | |
| clusterClaimUnassignedSlots(); | |
| /* Take over slots that we have keys for, but are assigned to no one. */ | |
| clusterClaimUnassignedSlots(); | |
| /* Delete keys in unowned slots */ | |
| clusterDeleteKeysInUnownedSlots(); |
- Apply suggested fix
| /* This function is called at server startup in order to initialize cluster data | ||
| * structures that are shared between the different cluster implementations. */ | ||
| void clusterCommonInit(void) { | ||
| server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); |
There was a problem hiding this comment.
⚠️ Bug: Use of raw malloc instead of zmalloc breaks memory tracking
The original code in server.c:main() used zmalloc() to allocate cluster_slot_stats. The new clusterCommonInit() uses raw malloc() instead. This means:\n\n1. The allocation is invisible to Redis's INFO memory reporting and used_memory tracking\n2. Memory fragmentation metrics will be inaccurate\n3. If Redis is configured with maxmemory, this allocation won't count toward the limit\n4. It's inconsistent with every other allocation in cluster.c (all use zmalloc)\n\nThe PR context notes mention "Use malloc instead of zmalloc for allocation (consistent with pattern)" but this appears incorrect — no other allocation in cluster.c uses raw malloc.\n\nFix: Change malloc to zmalloc to match the original code and Redis conventions.
Was this helpful? React with 👍 / 👎
| server.cluster_slot_stats = malloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); | |
| server.cluster_slot_stats = zmalloc(CLUSTER_SLOTS*sizeof(clusterSlotStat)); |
- Apply suggested fix
There was a problem hiding this comment.
Configure merge blocking · Maintainers can dismiss this review. Gitar never approves changes.
Benchmark PR for Gitar evaluation
Original PR: agentic-review-benchmarks#4
Refactor some of ASM and slot-stats functions
Summary by Gitar
clusterCommonInit()function; movedverifyClusterConfigWithData()fromcluster_legacy.cto sharedcluster.cclusterClaimUnassignedSlots()andclusterDeleteKeysInUnownedSlots()for independent reuse across cluster implementationsserver.cto integrate with generic cluster lifecycle rather than legacy-specific codeThis will update automatically on new commits.