fix: put_index_list de-duplicates when appending to list #691
ok300 wants to merge 1 commit into pubky:main

Conversation
Modified lists::put to use an LREM + RPUSH pattern in a pipeline to prevent duplicate entries when re-indexing operations are performed.

For each value being added:
1. LREM removes all existing occurrences from the list (count=0 removes all)
2. RPUSH appends the value to the end of the list

This approach:
- Prevents duplicates during re-indexing operations
- Makes the operation idempotent (calling multiple times = calling once)
- Uses atomic pipeline execution (no Lua scripts required)
- Preserves append semantics (new values added to end)

Note: If a value already exists, it will be removed from its original position and re-added at the end of the list.

Files changed:
- nexus-common/src/db/kv/index/lists.rs: Updated put() implementation
- nexus-common/src/db/kv/traits.rs: Removed TODO warning about unsafe re-indexing
// Use a pipeline to remove duplicates then append each value
// LREM removes all existing occurrences (0 = all), then RPUSH appends to end
// This is atomic from the perspective of other Redis clients
That comment is not strictly correct in Redis terms:
- A pipeline sends multiple commands in one batch over the same connection, reducing round trips.
- But it does not make them atomic. Other clients’ commands can still be interleaved between these commands on the server.
To get true atomicity in Redis, you’d need MULTI / EXEC (transaction) or a Lua script (EVAL).
Two comments from me:
- A much better alternative for list uses that should not have duplicates and where order does not matter might be to use a Redis Set instead. So we might simply identify these use cases (if any) and migrate them to a set?
- I am not 100% sure that we never want duplicates in ANY of our uses of Redis Lists 🤔 Certainly this makes our implementation of "put_to_list" stop behaving like a regular list, so we need to account for that when we implement new features that should behave as lists in the future. In fact, I see we are using the List type for /events, which is a use case where we do expect and want duplicates. I believe this is our only List usage.
IMO the TODO was on the wrong side and we probably should not merge this. How about we just remove the TODO? 😄
Codex review:
⚠️ Ordering considerations (behavior change)
The new behavior changes ordering for any value that already exists in the list; it gets moved to the tail. That is a meaningful semantic change and should be confirmed as acceptable. If list order is significant (e.g., chronological or priority ordering), this change could be surprising.
⚠️ Partial update visibility / atomicity
LREM + RPUSH in a pipeline is not atomic. Other clients can observe transient states where the item has been removed but not yet re-added. If atomicity matters, this should be done inside a Lua script or a MULTI/EXEC transaction. A pipeline improves throughput, but does not provide isolation.
⚠️ Duplicate handling within a single batch
If the same value appears multiple times in the input (e.g., due to a caller bug), the current described approach will still append duplicates. If true de-duplication is required, the input list should be de-duplicated before issuing Redis commands.
Performance Review
- LREM is O(N) per value. If these lists are large or a large batch is appended, total cost can become O(N*M). Depending on dataset size, this might be acceptable but should be acknowledged.
- Pipelining mitigates round-trip cost but doesn't change algorithmic cost on Redis.
Tests / Validation
No tests were described. This change affects persistence semantics and ordering, so targeted tests would be valuable:
- Re-indexing does not create duplicates.
- Existing values move to the end (if that’s intended).
- No duplicate values remain after a batch insert.
- Concurrency scenario test (if possible), or at least documentation of atomicity limits.
Merge Readiness
Not yet ready. The deduplication behavior is good, but I recommend clarifying and/or addressing the following before merge:
- Confirm that “move-to-end” semantics are desired.
- Address atomicity, or explicitly document that transient removal is acceptable.
- Add tests to lock in the new behavior and prevent regressions.
This PR addresses a TODO in put_index_list which mentioned the method only inserts and doesn't check for duplicates.

The fix modifies lists::put to use an LREM + RPUSH pattern in a pipeline to prevent duplicate entries. For each value being added:
- LREM removes all existing occurrences from the list (count=0 removes all)
- RPUSH appends the value to the end of the list

Note: If a value already exists, it will be removed from its original position and re-added at the end of the list.