-
Notifications
You must be signed in to change notification settings - Fork 1
Open
Labels
enhancementNew feature or requestNew feature or request
Milestone
Description
context
the roadmap calls for stress testing with large transcripts. correctness under load is the goal — not performance benchmarks.
scope
integration test (gated on env var, uses free/agentic or free/text-generation):
- build a synthetic transcript with 80–100 messages (mix of user, assistant, tool_call+tool_result pairs)
- run rolling compaction repeatedly until message count is stable (≤4 non-system messages)
- after each compaction round, assert invariants:
- no orphaned tool results (every tool message has a corresponding assistant tool_call)
- system messages preserved throughout
- summary grows or stays non-empty
- message count strictly decreases each round (or hits the ≤4 floor)
- transcript anchor written after each round
- after full compaction: verify final context is coherent (loadable, parseable, no corrupt JSON)
notes
- free models may be slow; set generous timeouts
- if LLM returns empty IDs repeatedly (triggering fallback path), that's fine — test the fallback path too
- seed the transcript deterministically so failures are reproducible
- this test will be slow (~minutes); mark clearly in output
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request