Add comprehensive test suite, CI workflow, and .gitignore #2
SuperInstance wants to merge 2 commits into main
- 61 pytest tests covering all module components:
  - score_handoff(): all 7 scoring categories, thresholds, caps, edge cases
  - generate_autobiography(): single/multiple handoffs, section extraction, missing data
  - Baton.__init__(): defaults, keeper URL, credentials, repo resolution
  - Baton.restore(): fresh/invalid/full baton, all file types, JSON error handling
  - Baton.snapshot(): quality gate pass/fail, force bypass, generation tracking, file writes
  - Baton.write_handoff(): template generation, open threads, task counts
  - Baton.print_restore_summary(): fresh and restored agent display
  - Baton.acquire_lease(): success/failure
  - Baton._keeper(): error handling
- GitHub Actions CI with Python 3.10, 3.11, 3.12 matrix
- Standard Python .gitignore
7378a85 to 35e6c40
Closing: superseded by merged work on main. The changes from this PR have been incorporated through other merged PRs. Thank you for the contribution! 🙏
        "energy_remaining": 500,
    })
    # GENERATION should be the last file written
    assert b.write_log[-1]["path"] == ".baton/GENERATION"
🔴 Test test_snapshot_writes_generation_last will always fail because HANDOFF_METRICS.json is written after GENERATION
The test asserts b.write_log[-1]["path"] == ".baton/GENERATION", but the production code in flux_baton.py:1303 writes .baton/HANDOFF_METRICS.json after .baton/GENERATION (line 1291). This means the last entry in write_log will be ".baton/HANDOFF_METRICS.json", not ".baton/GENERATION", and the assertion will always fail.
This also exposes a pre-existing production bug: the docstring at flux_baton.py:1131 states "Writes atomically — GENERATION last" and the comment at flux_baton.py:1290 says "COMMIT MARKER — written LAST", but the v3 addition of HANDOFF_METRICS.json at flux_baton.py:1294-1305 violates this invariant. If the process crashes between writing GENERATION and HANDOFF_METRICS, the next-generation agent will see a new generation number but incomplete metrics data.
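The commit-marker invariant described in this comment can be illustrated with a minimal sketch. It is not the code in flux_baton.py: `RecordingStore` and this standalone `snapshot()` are hypothetical stand-ins, and only the two `.baton/` paths come from the review. The idea is that every payload file is written before GENERATION, so a crash part-way through leaves the previous generation number in place.

```python
# Hypothetical sketch; RecordingStore and this snapshot() are assumptions,
# not the real Baton implementation.
class RecordingStore:
    """Records every write in order, similar to the write_log used in the test."""

    def __init__(self):
        self.write_log = []

    def write(self, path, content):
        self.write_log.append({"path": path, "content": content})


def snapshot(store, generation, metrics_json):
    # Payload first: if the process dies after this write, GENERATION still
    # holds the old number, so a reader sees the previous complete snapshot.
    store.write(".baton/HANDOFF_METRICS.json", metrics_json)
    # Commit marker last: once this lands, every other file is already written.
    store.write(".baton/GENERATION", str(generation))


store = RecordingStore()
snapshot(store, 4, '{"energy_remaining": 500}')
written_paths = [entry["path"] for entry in store.write_log]
```

With this ordering, the assertion quoted from the test (last entry is `.baton/GENERATION`) would hold.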
Prompt for agents
There are two issues to fix:
1. In flux_baton.py snapshot() method: The HANDOFF_METRICS.json write (lines 1294-1305) happens AFTER the GENERATION commit marker write (lines 1290-1292). This violates the documented atomic commit invariant that GENERATION must be written last. Move the HANDOFF_METRICS.json write to before the GENERATION write.
2. Once the production code is fixed as in step 1, the test assertion b.write_log[-1]["path"] == ".baton/GENERATION" will pass unchanged. Alternatively, if you don't want to fix the production code in this PR, update the test to match the actual write order (e.g. check that GENERATION is second-to-last, or find its index and verify nothing critical comes after it).
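The "find its index and verify nothing critical comes after it" option could look roughly like the sketch below. The helper name `paths_after_marker` is hypothetical, and the sample logs are hand-built lists shaped like the `write_log` entries in the quoted test, not output from the real `Baton.snapshot()`.

```python
def paths_after_marker(write_log, marker=".baton/GENERATION"):
    """Return the paths written after the last occurrence of the marker.

    Raises ValueError if the marker was never written.
    """
    paths = [entry["path"] for entry in write_log]
    last_idx = len(paths) - 1 - paths[::-1].index(marker)
    return paths[last_idx + 1:]


# Write order as described in the review: metrics land after the marker.
current_order = [
    {"path": ".baton/GENERATION"},
    {"path": ".baton/HANDOFF_METRICS.json"},
]
# Write order after moving the metrics write before the marker.
fixed_order = [
    {"path": ".baton/HANDOFF_METRICS.json"},
    {"path": ".baton/GENERATION"},
]

late_writes_current = paths_after_marker(current_order)
late_writes_fixed = paths_after_marker(fixed_order)
```

A test built on this helper can assert that the returned list is empty, or that it contains only paths the project explicitly accepts as non-critical.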
What
- score_handoff(): all 7 scoring categories (surplus_insight, causal_chain, honesty, actionable_signal, compression, human_compat, precedent_value), pass/fail thresholds, score caps, empty text, compression word count ranges
- generate_autobiography(): empty/single/multiple handoffs, section extraction (Where Things Stand, What I Was Thinking), missing generation/score defaults
- Baton class: init defaults, keeper URL handling, repo resolution, restore (fresh/invalid/full baton, all file types, JSON errors), snapshot (quality gate pass/fail, force bypass, generation increment, expected file writes), write_handoff (template, open threads, tasks), print_restore_summary, acquire_lease
Why
The repo had no tests, no CI, and no .gitignore. This brings it to production-ready standards with full mock-based testing of the keeper integration.
Test results