Background
The DynamoDB session metadata schema encodes lastMessageAt in the sort key (S#ACTIVE#{lastMessageAt}#{session_id}). This made recency-ordered listing a single Query, but every turn that updates the row requires put-new + delete-old (an SK move).
Combined with multiple writers touching the same row near the end of a stream — _update_session_metadata (full-row merge), update_session_title (targeted), add_pending_interrupt (read-modify-write) — this produces structural write amplification and forces every per-row update to first do a GSI lookup to find the current SK.
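To make the SK move concrete, here is a minimal sketch of the two writes one turn needs under the current schema. The `PK`/`SK` attribute names, the `USER#{user_id}` partition key format on the base table, and the helper names are assumptions for illustration, not the existing code.

```python
from typing import Any


def active_sk(last_message_at: str, session_id: str) -> str:
    """Sort key under the current schema: recency is baked into the key."""
    return f"S#ACTIVE#{last_message_at}#{session_id}"


def sk_move_requests(
    user_id: str,
    session_id: str,
    old_ts: str,
    new_ts: str,
    attrs: dict[str, Any],
) -> list[tuple[str, dict[str, Any]]]:
    """Return the (operation, kwargs) pairs needed to bump lastMessageAt.

    Because the timestamp is part of the key, the row cannot be updated
    in place: a new item is written and the old one is deleted.
    """
    pk = f"USER#{user_id}"  # assumed partition key format
    new_item = {
        **attrs,
        "PK": pk,
        "SK": active_sk(new_ts, session_id),
        "lastMessageAt": new_ts,
    }
    old_key = {"PK": pk, "SK": active_sk(old_ts, session_id)}
    return [("put_item", {"Item": new_item}), ("delete_item", {"Key": old_key})]
```

Note that the caller must already know `old_ts` to build the delete key, which is why each writer currently needs a lookup first.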
Proposed change
Stable SK with recency on a new GSI:
- Base table SK: S#{session_id} (no timestamp). Owner queries become direct GetItem.
- New GSI: (USER#{user_id}, lastMessageAt) for recency listing, optionally filtered by a status attribute.
- All per-turn writers convert to targeted UpdateExpressions with no read-before-write.
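The proposal above could look roughly like the following sketch, which builds `update_item` keyword arguments against the stable key. It assumes `PK`/`SK` attribute names and a `USER#{user_id}` base-table partition key; `stable_key` and `targeted_update_kwargs` are hypothetical helpers. Attribute names go through `ExpressionAttributeNames` placeholders so that reserved words such as `status` cannot break the expression.

```python
from typing import Any


def stable_key(user_id: str, session_id: str) -> dict[str, str]:
    """Primary key under the proposed schema: no timestamp in the SK."""
    return {"PK": f"USER#{user_id}", "SK": f"S#{session_id}"}


def targeted_update_kwargs(
    user_id: str, session_id: str, changes: dict[str, Any]
) -> dict[str, Any]:
    """Build kwargs for a single update_item that SETs only `changes`.

    No read is needed first: the key is derivable from the IDs alone,
    and attributes not named in `changes` are left untouched.
    """
    if not changes:
        raise ValueError("changes must not be empty")
    names: dict[str, str] = {}
    values: dict[str, Any] = {}
    clauses: list[str] = []
    for i, (attr, value) in enumerate(changes.items()):
        names[f"#n{i}"] = attr
        values[f":v{i}"] = value
        clauses.append(f"#n{i} = :v{i}")
    return {
        "Key": stable_key(user_id, session_id),
        "UpdateExpression": "SET " + ", ".join(clauses),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
```

With a boto3 `Table` resource, the result would be passed as `table.update_item(**kwargs)`. If the GSI sorts on the `lastMessageAt` attribute, setting that attribute is enough for DynamoDB to reposition the row in the index.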
What this fixes
- Eliminates the put + delete dance on every turn → one UpdateExpression per writer.
- Removes the GSI-then-write pattern in update_session_title and add_pending_interrupt.
- pendingInterrupts mutation becomes list_append/REMOVE instead of read-modify-write.
- Soft-delete becomes a flag flip, not an SK rewrite.
- Removes the existing SessionLookupIndex dependency for owner queries (it can stay for cross-user/admin lookup, or be dropped).
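As a hedged illustration of the pendingInterrupts and soft-delete items above, the sketch below builds the corresponding `update_item` arguments. The helper names, the `status` attribute value `"DELETED"`, and the choice to remove interrupts by list position are assumptions; the real writers may identify interrupts differently.

```python
from typing import Any


def add_pending_interrupt_kwargs(key: dict[str, str], interrupt: dict[str, Any]) -> dict[str, Any]:
    """Append one interrupt atomically; creates the list if it is missing."""
    return {
        "Key": key,
        "UpdateExpression": "SET #pi = list_append(if_not_exists(#pi, :empty), :new)",
        "ExpressionAttributeNames": {"#pi": "pendingInterrupts"},
        "ExpressionAttributeValues": {":empty": [], ":new": [interrupt]},
    }


def remove_pending_interrupt_kwargs(key: dict[str, str], index: int) -> dict[str, Any]:
    """Remove the list element at `index` (position-based, see note below)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return {
        "Key": key,
        "UpdateExpression": f"REMOVE #pi[{index}]",
        "ExpressionAttributeNames": {"#pi": "pendingInterrupts"},
    }


def soft_delete_kwargs(key: dict[str, str]) -> dict[str, Any]:
    """Flip a status flag instead of rewriting the sort key."""
    return {
        "Key": key,
        "UpdateExpression": "SET #st = :deleted",
        # "status" is a DynamoDB reserved word, hence the placeholder.
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {":deleted": "DELETED"},  # assumed value
    }
```

One caveat with this sketch: `REMOVE #pi[n]` addresses an element by position, so two concurrent removals could target the wrong entry. Storing interrupts in a map keyed by interrupt ID would avoid that, at the cost of a different attribute shape.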
Scope
- CDK: add new recency GSI to the session metadata table; optionally drop SessionLookupIndex if no remaining callers need pure-session-id lookup.
- _store_session_metadata_cloud: collapse SK-move branch; single update_item.
- _list_user_sessions_cloud: query the new GSI instead of base table SK begins-with.
- get_session_metadata: GetItem on stable SK when user_id is known.
- _update_session_metadata (in stream_coordinator.py): fully targeted update.
Cost (C#) and display-text (D#) records are keyed independently and unaffected.
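For the _list_user_sessions_cloud change in the scope list above, a recency listing against the new GSI might be built as in this sketch. The index name `SessionRecencyIndex`, the `PK` attribute name, and the `"ACTIVE"` status value are placeholders chosen for illustration; the issue does not name them.

```python
from typing import Any, Optional


def list_sessions_kwargs(
    user_id: str, limit: int = 20, status: Optional[str] = "ACTIVE"
) -> dict[str, Any]:
    """Build query kwargs that return a user's sessions, newest first."""
    kwargs: dict[str, Any] = {
        "IndexName": "SessionRecencyIndex",  # hypothetical index name
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "PK"},
        "ExpressionAttributeValues": {":pk": f"USER#{user_id}"},
        "ScanIndexForward": False,  # descending by the GSI sort key
        "Limit": limit,
    }
    if status is not None:
        # "status" is a DynamoDB reserved word, so it needs a placeholder.
        kwargs["FilterExpression"] = "#st = :st"
        kwargs["ExpressionAttributeNames"]["#st"] = "status"
        kwargs["ExpressionAttributeValues"][":st"] = status
    return kwargs
```

Because DynamoDB applies `Limit` before `FilterExpression`, a page can come back with fewer than `limit` items when soft-deleted rows are filtered out, so callers would need to follow `LastEvaluatedKey` to fill a page.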
Migration
We're in beta — accepting breaking changes for existing conversations. Cleanest path: drop and recreate the table on redeploy, or run a one-time scan/wipe of S#ACTIVE#* and S#DELETED#* rows (leave C#* and D#* intact for cost/audit history).
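If the scan/wipe route is chosen, the row selection could be sketched as follows. The `PK`/`SK` attribute names and both helper names are assumptions; the prefixes come from the key formats named in this issue.

```python
from typing import Any

# Old-schema session rows to delete; C#* and D#* rows are deliberately excluded.
WIPE_PREFIXES = ("S#ACTIVE#", "S#DELETED#")


def should_wipe(sort_key: str) -> bool:
    """True only for old-schema session metadata rows."""
    return sort_key.startswith(WIPE_PREFIXES)


def wipe_scan_kwargs() -> dict[str, Any]:
    """Build scan kwargs that return only the keys of rows to delete."""
    return {
        "FilterExpression": "begins_with(#sk, :active) OR begins_with(#sk, :deleted)",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":active": "S#ACTIVE#",
            ":deleted": "S#DELETED#",
        },
        "ProjectionExpression": "#pk, #sk",  # keys are all a delete needs
    }
```

Each returned key would then be passed to a delete call, for example through a boto3 `Table.batch_writer()`, paging the scan with `LastEvaluatedKey` until it is exhausted.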
Out of scope
- Splitting the row into hot/cold records — considered, not needed once writers are attribute-disjoint.
- POST /sessions endpoint to create the row outside the streaming path — separate concern, can be revisited later.
Related
The immediate race conditions in add_pending_interrupt and _update_session_metadata are being fixed in feature/connectors via targeted UpdateExpressions on the current schema. That fix is independent of this refactor and unblocks the connectors OAuth work; this issue tracks the broader architectural cleanup.