fix: route checkpoint operational logs to stderr#47
fix: route checkpoint operational logs to stderr#47Simon-Free wants to merge 6 commits intoSafeRL-Lab:mainfrom
Conversation
|
Thanks for the contribution! I'd like to ask for this PR to be split into two, because the two changes have very different risk profiles. A few issues first:
Proposed split:
Regarding the failing CI on 3.10/3.11 — that's a pre-existing infra issue (sqlalchemy isn't in requirements.txt but tests/test_web_api.py imports it). It's unrelated to your changes and is being fixed separately. Once that fix is on main, please click Update branch on this PR (or rebase) so the new ci.yml takes effect. Thanks again! |
|
I am very sorry for description not matching the diffs, I mistakenly made the PR on cheetahclaws instead of my fork, and thus marked as draft, in order to take the time to manually review, can I undraft them back and ping you whenever I finished to check it ? For both this one and the other |
|
Sure, Thanks a lot for your great contributions. |
The three TestTokenSnapshotExtendedFields cases asserted cache_read / cache_creation fields that were removed in 620bbb2 ("fix: remove dead cache_read/cache_creation fields per review"). They have been failing ever since. Delete test_checkpoint_extras.py -- its remaining cases were either trivial (test_store_imports_sys checks 'import sys' exists) or file-source text scans (TestCheckpointPrintsToStderr) which don't test user behavior. Add tests/test_checkpoint_e2e.py with two real e2e scenarios: - Drive agent.run with a mocked LLM that emits a Write tool_call; assert the checkpoint hook created a pre-edit backup of the original content. - Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip message lands on stderr only, not stdout. This is the actual user-visible contract of PR SafeRL-Lab#47 and covers the full wiring agent.run -> Write hook -> checkpoint.store.track_file_edit. The three behavior tests in test_checkpoint_store.py stay -- they cover the store function directly via capsys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three TestTokenSnapshotExtendedFields cases asserted cache_read / cache_creation fields that were removed in 620bbb2 ("fix: remove dead cache_read/cache_creation fields per review"). They have been failing ever since. Delete test_checkpoint_extras.py -- its remaining cases were either trivial (test_store_imports_sys checks 'import sys' exists) or file-source text scans (TestCheckpointPrintsToStderr) which don't test user behavior. Add tests/test_checkpoint_e2e.py with two real e2e scenarios: - Drive agent.run with a mocked LLM that emits a Write tool_call; assert the checkpoint hook created a pre-edit backup of the original content. - Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip message lands on stderr only, not stdout. This is the actual user-visible contract of PR SafeRL-Lab#47 and covers the full wiring agent.run -> Write hook -> checkpoint.store.track_file_edit. The three behavior tests in test_checkpoint_store.py stay -- they cover the store function directly via capsys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b9bf441 to
4bc4a09
Compare
|
Hey @chauncygu this one looks better to me, but if you have any improvement ideas feel free to share ! Also, i tried to introduce some "e2e" tests (quotes, because we are still mocking LLM + some other things) that simulate the workflow and showcase the fix/feature in action ! |
Summary
Route the two
[checkpoint]operational prints (large-file skip, backup-failure) tosys.stderrinstead ofstdout. On stdout they pollute the conversation transcript captured by session tooling; on stderr they stay in the operator's view where they belong.Changes
checkpoint/store.pyimport sys+file=sys.stderron the twoprint()calls intrack_file_edittests/test_checkpoint_store.pycapsys: large file skipped (stderr, not stdout), normal file backed up,shutil.copy2failure routed to stderrtests/test_checkpoint_e2e.pyagent.runwith a mocked LLM that issues aWritetool_call; the Write hook, checkpoint store, and stderr routing all run for realWhat was dropped
tests/test_checkpoint_extras.pyis deleted. Three staleTestTokenSnapshotExtendedFieldscases assertedcache_read/cache_creationfields removed in 620bbb2 per earlier reviewer feedback. The remaining two cases were trivial or file-source text scans; neither tests user behavior.Scope limit
Only the stderr-routing change is in this PR. The earlier
stderr_lines/cache_read/cache_creationsnapshot fields mentioned in the previous description are not included: they need producer-side wiring (collecting per-tool stderr into the snapshot, tracking cache tokens from provider responses) which belongs in a follow-up once the plumbing exists.Ref #43