Context
Temporal reasoning is the hardest unsolved problem in AI memory systems. Benchmarks show a 73% gap between systems and humans on temporal queries (LoCoMo). Systems with explicit temporal modeling score 20-60 points higher on temporal questions.
Current state
db0 tracks:
createdAt — when the memory was stored
- Recency decay — exponential scoring based on age
- Superseding —
validTo implicit via status change
This handles "what's recent" but not "when did this happen" vs "when did I learn about this."
The gap
Zep's bi-temporal model tracks four timestamps per fact:
valid_at — when the fact became true in the world
invalid_at — when it was superseded
created_at — when the system ingested it
expired_at — when the record was logically replaced
This enables:
- Point-in-time queries: "What did we know about X on date Y?"
- Historical reasoning: "When did the user's preference change?"
- Temporal ordering: "Did this happen before or after that?"
Hindsight stores dual timestamps (occurrence time + mention time), enabling both "what happened when" and "what did I learn when" queries. This contributed to a 60-point gain on temporal questions.
Proposed approach
- Add optional
validAt and invalidAt fields to MemoryEntry
- Extract temporal references during
context().ingest() (date patterns in content)
- Support time-range queries in
memory().search() using these fields
- Profile-configurable: knowledge-base and agent-context profiles would enable temporal extraction; conversational and minimal would skip it
Impact
Temporal reasoning is the #1 performance gap across all benchmarks. Even basic bi-temporal modeling would improve scores on temporal queries significantly.
References
Context
Temporal reasoning is the hardest unsolved problem in AI memory systems. Benchmarks show a 73% gap between systems and humans on temporal queries (LoCoMo). Systems with explicit temporal modeling score 20-60 points higher on temporal questions.
Current state
db0 tracks:
createdAt— when the memory was storedvalidToimplicit via status changeThis handles "what's recent" but not "when did this happen" vs "when did I learn about this."
The gap
Zep's bi-temporal model tracks four timestamps per fact:
valid_at— when the fact became true in the worldinvalid_at— when it was supersededcreated_at— when the system ingested itexpired_at— when the record was logically replacedThis enables:
Hindsight stores dual timestamps (occurrence time + mention time), enabling both "what happened when" and "what did I learn when" queries. This contributed to a 60-point gain on temporal questions.
Proposed approach
validAtandinvalidAtfields toMemoryEntrycontext().ingest()(date patterns in content)memory().search()using these fieldsImpact
Temporal reasoning is the #1 performance gap across all benchmarks. Even basic bi-temporal modeling would improve scores on temporal queries significantly.
References