
fix: put_index_list de-duplicates when appending to list#691

Draft
ok300 wants to merge 1 commit into pubky:main from ok300:claude/find-fix-bug-mk9qylaxodkk8gey-U1M8Z

Conversation

Contributor

@ok300 ok300 commented Jan 16, 2026

This PR addresses a TODO in put_index_list which noted that the method only inserts and does not check for duplicates.

The fix modifies lists::put to use an LREM + RPUSH pattern in a pipeline to prevent duplicate entries.

For each value being added:

  1. LREM removes all existing occurrences from the list (count=0 removes all)
  2. RPUSH appends the value to the end of the list

Note: If a value already exists, it will be removed from its original position and re-added at the end of the list.
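The two steps above can be sketched with a pure-Python simulation of the Redis commands. This is illustrative only, not the actual Rust implementation in lists.rs; the function names mirror the Redis commands they stand in for.

```python
# Pure-Python simulation of the LREM(count=0) + RPUSH pattern described above.
# Illustrative sketch, not the repo's Rust code.

def lrem_all(lst, value):
    """Simulate LREM key 0 value: remove all occurrences of value."""
    lst[:] = [v for v in lst if v != value]

def rpush(lst, value):
    """Simulate RPUSH key value: append value at the tail."""
    lst.append(value)

def put(lst, values):
    """For each value: remove any existing occurrences, then append at the end."""
    for v in values:
        lrem_all(lst, v)
        rpush(lst, v)
    return lst

# An existing value ("b") is moved to the tail rather than duplicated:
index = ["a", "b", "c"]
put(index, ["b", "d"])
print(index)  # ['a', 'c', 'b', 'd']
```

Running put a second time with the same values yields the same list, which is the idempotence property the PR description claims.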

Modified lists::put to use LREM + RPUSH pattern in a pipeline to prevent
duplicate entries when re-indexing operations are performed.

For each value being added:
1. LREM removes all existing occurrences from the list (count=0 removes all)
2. RPUSH appends the value to the end of the list

This approach:
- Prevents duplicates during re-indexing operations
- Makes the operation idempotent (calling multiple times = calling once)
- Uses atomic pipeline execution (no Lua scripts required)
- Preserves append semantics (new values added to end)

Note: If a value already exists, it will be removed from its original position
and re-added at the end of the list.

Files changed:
- nexus-common/src/db/kv/index/lists.rs: Updated put() implementation
- nexus-common/src/db/kv/traits.rs: Removed TODO warning about unsafe re-indexing
@ok300 ok300 requested review from SHAcollision and tipogi January 16, 2026 10:37

// Use a pipeline to remove duplicates then append each value
// LREM removes all existing occurrences (0 = all), then RPUSH appends to end
// This is atomic from the perspective of other Redis clients
Collaborator


That comment is not strictly correct in Redis terms:

  • A pipeline sends multiple commands in one batch over the same connection, reducing round trips.
  • But it does not make them atomic. Other clients’ commands can still be interleaved between these commands on the server.

To get true atomicity in Redis, you’d need MULTI / EXEC (transaction) or a Lua script (EVAL).
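The interleaving the reviewer describes can be shown with a small pure-Python simulation: between the pipelined LREM and RPUSH, another client's read can observe the value as absent. The names here are illustrative stand-ins, not a Redis client API.

```python
# Sketch of the non-atomicity of LREM + RPUSH in a plain pipeline:
# another client's command can land between the two, observing a
# transient state where the value has been removed but not re-added.

shared_list = ["a", "b", "c"]
observations = []

def lrem_all(lst, value):
    """Simulate LREM key 0 value."""
    lst[:] = [v for v in lst if v != value]

def other_client_reads(lst, value):
    """Simulate a second client checking membership."""
    observations.append(value in lst)

# Client 1's pipelined pair, with client 2's read interleaved by the server:
lrem_all(shared_list, "b")            # client 1: LREM list 0 "b"
other_client_reads(shared_list, "b")  # client 2: observes the transient gap
shared_list.append("b")               # client 1: RPUSH list "b"
other_client_reads(shared_list, "b")  # client 2: value is back

print(observations)  # [False, True] -- "b" briefly disappeared
```

Wrapping the pair in MULTI/EXEC or a Lua script, as suggested above, would make the [False] observation impossible.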

Collaborator

@SHAcollision SHAcollision left a comment


Two comments from me:

  1. A much better alternative for list uses that should not have duplicates and where order does not matter might be to use a Redis Set instead. So we might simply identify these use cases (if any) and migrate them to a set?
  2. I am not 100% sure that we never want duplicates in ANY of our uses of Redis Lists 🤔 This certainly makes our implementation of "put_to_list" stop behaving like a regular list, so we need to account for that when we implement new features that should behave as lists in the future. In fact, we use the List type for /events, and that is a use case where we do expect and want duplicates. This is the only List usage we have, I believe.
[screenshot: the /events List usage]

IMO the TODO was on the wrong side and we probably should not merge this. How about we just remove the TODO? 😄
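The Set alternative raised in point 1 can be sketched with a Python set standing in for a Redis Set: SADD de-duplicates for free, at the cost of losing insertion order. This is a simulation of the command's semantics, not the repo's code.

```python
# Sketch of the Redis Set alternative: SADD ignores members that already
# exist, so de-duplication is built in. Simulated with a Python set.

def sadd(s, *values):
    """Simulate SADD key value [value ...]: returns the count of newly added members."""
    added = 0
    for v in values:
        if v not in s:
            s.add(v)
            added += 1
    return added

members = set()
print(sadd(members, "a", "b"))  # 2 -- both members are new
print(sadd(members, "b", "c"))  # 1 -- "b" already present, only "c" added
print(sorted(members))          # ['a', 'b', 'c']
```

The trade-off matches the reviewer's caveat: a Set fits index-like use cases, but not /events, where duplicates and ordering matter.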

Codex review:

⚠️ Ordering considerations (behavior change)

The new behavior changes ordering for any value that already exists in the list; it gets moved to the tail. That is a meaningful semantic change and should be confirmed as acceptable. If list order is significant (e.g., chronological or priority ordering), this change could be surprising.

⚠️ Partial update visibility / atomicity

LREM + RPUSH in a pipeline is not atomic. Other clients can observe transient states where the item has been removed but not yet re-added. If atomicity matters, this should be done inside a Lua script or a MULTI/EXEC transaction. A pipeline improves throughput, but does not provide isolation.

⚠️ Duplicate handling within a single batch

If the same value appears multiple times in the input (e.g., due to a caller bug), the outcome depends on command ordering: interleaving LREM + RPUSH per value collapses in-batch duplicates (each later LREM removes the earlier push), but issuing all LREMs before all RPUSHes would leave the duplicates in the list. If true de-duplication is required, the input list should be de-duplicated before issuing Redis commands.
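Whether in-batch duplicates survive depends on how the pipeline orders commands; a pure-Python simulation of the two layouts (a sketch, not the actual Rust code) makes the difference concrete:

```python
# Two possible pipeline layouts for the same batch of values.

def interleaved_put(lst, values):
    """LREM then RPUSH per value, as the PR describes."""
    for v in values:
        lst[:] = [x for x in lst if x != v]  # LREM key 0 v
        lst.append(v)                        # RPUSH key v
    return lst

def lrem_first_put(lst, values):
    """All LREMs issued before all RPUSHes."""
    for v in values:
        lst[:] = [x for x in lst if x != v]
    for v in values:
        lst.append(v)
    return lst

print(interleaved_put([], ["x", "x"]))  # ['x']      -- in-batch duplicate collapses
print(lrem_first_put([], ["x", "x"]))   # ['x', 'x'] -- in-batch duplicate survives
```

De-duplicating the input before building the pipeline sidesteps the question entirely for both layouts.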

Performance Review

  • LREM is O(N) per value. If these lists are large or if a large batch is appended, total cost can become O(N*M). Depending on dataset size, this might be acceptable but should be acknowledged.
  • Pipelining mitigates round-trip cost but doesn’t change algorithmic cost on Redis.
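The O(N*M) point can be made concrete with a back-of-the-envelope count: each LREM scans the whole list, so the comparisons grow with list length times batch size. The numbers below are hypothetical, purely to illustrate the scaling.

```python
# Rough cost model for the O(N*M) concern above: each of the M LREMs
# compares its value against roughly N list elements. Illustrative only;
# actual Redis costs also depend on encoding and element sizes.

def lrem_comparisons(list_len, batch_size):
    """Approximate total element comparisons for one batch of LREMs."""
    return list_len * batch_size

print(lrem_comparisons(10_000, 100))  # 1000000 comparisons for a single batch
```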

Tests / Validation

No tests were described. This change affects persistence semantics and ordering, so targeted tests would be valuable:

  • Re-indexing does not create duplicates.
  • Existing values move to the end (if that’s intended).
  • No duplicate values remain after a batch insert.
  • Concurrency scenario test (if possible), or at least documentation of atomicity limits.
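The first three suggested tests can be sketched against a pure-Python stand-in for put(); a real test would target lists::put in nexus-common against a Redis test instance, which this sketch does not attempt.

```python
# Sketch of the suggested tests, run against a Python stand-in for put().
# Hypothetical harness, not the repo's test suite.

def put(lst, values):
    for v in values:
        lst[:] = [x for x in lst if x != v]  # LREM key 0 v
        lst.append(v)                        # RPUSH key v
    return lst

def test_reindex_creates_no_duplicates():
    index = put([], ["a", "b", "c"])
    put(index, ["b", "c"])  # re-index a subset
    assert len(index) == len(set(index))

def test_reindex_is_idempotent():
    index = put([], ["a", "b", "c"])
    once = list(put(index, ["b", "c"]))
    twice = list(put(index, ["b", "c"]))
    assert once == twice

def test_existing_value_moves_to_end():
    assert put(["a", "b", "c"], ["a"]) == ["b", "c", "a"]

test_reindex_creates_no_duplicates()
test_reindex_is_idempotent()
test_existing_value_moves_to_end()
print("ok")
```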

Merge Readiness

Not yet ready. The deduplication behavior is good, but I recommend clarifying and/or addressing the following before merge:

  1. Confirm that “move-to-end” semantics are desired.
  2. Address atomicity, or explicitly document that transient removal is acceptable.
  3. Add tests to lock in the new behavior and prevent regressions.

@ok300 ok300 marked this pull request as draft January 21, 2026 07:46
@tipogi tipogi added the 📈 enhancement New feature or request label Feb 12, 2026
