feat: tool_call_alias and depends_on for intra-turn tool scheduling #50

Open
Simon-Free wants to merge 14 commits into SafeRL-Lab:main from Simon-Free:pr4-tool-scheduling-depends-on

Simon-Free commented Apr 17, 2026

Summary

Expose two scheduling hints to the LLM on every tool schema, and strip them before they reach tool handlers:

  • tool_call_alias: string - optional alias the model can use to refer to a tool call by a short name later in the same turn.
  • depends_on: string[] - list of prior tool_call_ids or aliases. The model uses this to express sequential dependencies between tools that it wants executed one after the other rather than in parallel.
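
The inject-then-strip round trip described above can be sketched as follows. This is a minimal illustration, not the PR's code: the PR names a `_SCHEDULING_PROPS` constant but does not show its contents, so the property specs and the two helper functions here (`inject_scheduling_props`, `strip_scheduling_props`) are assumptions.

```python
import copy

# Hypothetical contents; the PR only names the _SCHEDULING_PROPS constant.
SCHEDULING_PROPS = {
    "tool_call_alias": {
        "type": "string",
        "description": "Optional short name for this call, usable later in the same turn.",
    },
    "depends_on": {
        "type": "array",
        "items": {"type": "string"},
        "description": "tool_call_ids or aliases of calls that must finish first.",
    },
}


def inject_scheduling_props(properties: dict) -> dict:
    """Return a copy of a JSON-schema `properties` dict with the hints added."""
    merged = copy.deepcopy(properties)
    for name, spec in SCHEDULING_PROPS.items():
        # setdefault keeps a tool's own property if it happens to share a name.
        merged.setdefault(name, copy.deepcopy(spec))
    return merged


def strip_scheduling_props(params: dict) -> dict:
    """Drop the hints so the tool handler only sees its own parameters."""
    return {k: v for k, v in params.items() if k not in SCHEDULING_PROPS}
```

With these helpers, a call emitted as `{"path": "a.txt", "tool_call_alias": "w1", "depends_on": ["r1"]}` would reach the handler as `{"path": "a.txt"}`.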

Also coerces string-typed params (sent by LLMs that flatten JSON) into their schema-declared types: "42" -> 42 for an integer property, "true" -> True for boolean, '[{...}]' -> list/dict for array/object.

What's in scope here

This PR only injects the schema fields and performs param coercion + stripping. Runtime enforcement of depends_on ordering in the agent loop is not yet implemented - the model gets a hint and can call tools in order manually, but the registry does not re-order parallel executions based on depends_on. Deferred to a follow-up PR so this one stays small and review-friendly.
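
For context on what the deferred enforcement might involve, here is a rough sketch of ordering one turn's tool calls by `depends_on` using the standard library's `graphlib`. Nothing here is part of this PR: the function name `order_tool_calls`, the call shape (`id` plus an `input` dict), and the choice to ignore unknown references are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def order_tool_calls(calls: list[dict]) -> list[str]:
    """Return tool_call ids in an order that respects depends_on.

    Assumed shape: each call has an "id" and an "input" dict that may
    carry tool_call_alias and depends_on.
    """
    # Resolve both raw ids and aliases to the canonical tool_call id.
    canonical = {}
    for call in calls:
        canonical[call["id"]] = call["id"]
        alias = call["input"].get("tool_call_alias")
        if alias:
            canonical[alias] = call["id"]

    sorter = TopologicalSorter()
    for call in calls:
        # References to ids or aliases not present in this turn are ignored here.
        deps = [canonical[d] for d in call["input"].get("depends_on", []) if d in canonical]
        sorter.add(call["id"], *deps)

    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(sorter.static_order())
```

A real scheduler would likely run independent calls in parallel and only serialize along dependency edges; this sketch flattens everything into one sequence to keep the idea visible.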

Changes

| File | +/- | What |
|---|---|---|
| tool_registry.py | +86 | _SCHEDULING_PROPS constant, _coerce_params split into per-type coercers dispatched through _COERCERS, wrapper over get_tool_schemas that injects scheduling props, wrapper over execute_tool that strips scheduling props and coerces types |
| tests/test_tool_scheduling.py | +104 | Unit coverage of schema injection, coercion per type, pass-through on invalid input, and stripping at dispatch |
| tests/test_tool_scheduling_e2e.py | +96 | Two e2e scenarios via agent.run + mocked stream: (1) every schema the LLM sees carries the scheduling props, (2) when the LLM emits a tool_call that includes tool_call_alias + depends_on, those are gone by the time the tool handler runs |

Cleanups folded in

  • _coerce_params' silent except (ValueError, json.JSONDecodeError): pass replaced by a dispatch table where each coercer explicitly returns the original value on failure, with a comment explaining the intent ("tool handler reports the real type mismatch").
  • Fix test_scheduling_params_stripped which called execute_tool(name, params) without the required config arg - it was failing since the branch landed.

Ref #43

Simon-Free marked this pull request as draft April 17, 2026 19:45
chauncygu marked this pull request as ready for review April 18, 2026 07:51
bot and others added 4 commits April 21, 2026 11:03
The three TestTokenSnapshotExtendedFields cases asserted cache_read /
cache_creation fields that were removed in 620bbb2 ("fix: remove dead
cache_read/cache_creation fields per review"). They have been failing
ever since. Delete test_checkpoint_extras.py -- its remaining cases were
either trivial (test_store_imports_sys checks 'import sys' exists) or
file-source text scans (TestCheckpointPrintsToStderr) which don't test
user behavior.

Add tests/test_checkpoint_e2e.py with two real e2e scenarios:
- Drive agent.run with a mocked LLM that emits a Write tool_call; assert
  the checkpoint hook created a pre-edit backup of the original content.
- Same path but the file exceeds _MAX_FILE_SIZE -- assert the skip
  message lands on stderr only, not stdout. This is the actual
  user-visible contract of PR SafeRL-Lab#47 and covers the full wiring
  agent.run -> Write hook -> checkpoint.store.track_file_edit.

The three behavior tests in test_checkpoint_store.py stay -- they cover
the store function directly via capsys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Simon-Free
Author

Changes in this update

Removed: type coercion (split to separate PR)

Coercion is an independent concern -- moved to its own PR with bug fixes.

Added: ID uniqueness enforcement

Ported id_uniquify.py from bouzecode. Without this, when the LLM reuses IDs like r1 across turns, you get duplicate tool_call_id errors from the API.

Fixed: input_schema-style schema injection

Scheduling props (tool_call_alias, depends_on) were injected at the wrong level for Anthropic-style schemas. Now correctly targets input_schema.properties.

Tests

  • 7 new ID uniquify unit tests
  • 1 new e2e test (ID reuse across turns)
  • Removed coercion tests (moved to coercion PR)

@Simon-Free
Author

Changes in this update

Removed: type coercion (split to separate PR #62)

Coercion is an independent concern - moved to its own PR with bug fixes.

Added: ID uniqueness enforcement

Ported id_uniquify.py from bouzecode. Without this, when the LLM reuses IDs like r1 across turns, you get duplicate tool_call_id errors from the API.
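
As an illustration of the idea only (the contents of id_uniquify.py are not shown in this thread), a uniquifier could track every id seen in the conversation and suffix repeats. The class name `IdUniquifier` and the `_2`, `_3` suffix scheme are assumptions.

```python
import itertools


class IdUniquifier:
    """Rewrite tool_call ids so no id is reused within one conversation."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def uniquify(self, tool_call_id: str) -> str:
        # First use of an id: keep it as-is.
        if tool_call_id not in self._seen:
            self._seen.add(tool_call_id)
            return tool_call_id
        # Reused id: append the first numeric suffix that is still free.
        for n in itertools.count(2):
            candidate = f"{tool_call_id}_{n}"
            if candidate not in self._seen:
                self._seen.add(candidate)
                return candidate
```

Whatever scheme is used, the matching tool result has to carry the rewritten id, and any `depends_on` entry that referred to the original id within the same turn would need the same remapping.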

Fixed: input_schema-style schema injection

Scheduling props were injected at the wrong level for Anthropic-style schemas. Now correctly targets input_schema.properties.
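
To make "the wrong level" concrete: an Anthropic-style tool definition nests its JSON schema under `input_schema`, so new properties belong in `input_schema.properties`, not at the top level of the tool dict. The helper below is a hypothetical sketch of locating that dict; `schema_properties` is not a name from the PR, and the OpenAI-style and bare `parameters` branches are included only for contrast, without any claim that the registry supports them.

```python
def schema_properties(schema: dict) -> dict:
    """Return the mutable `properties` dict of a tool schema, creating it if absent."""
    if "input_schema" in schema:
        # Anthropic-style: {"name": ..., "input_schema": {"type": "object", "properties": {...}}}
        params = schema["input_schema"]
    elif "function" in schema:
        # OpenAI-style: {"type": "function", "function": {"name": ..., "parameters": {...}}}
        params = schema["function"].setdefault("parameters", {"type": "object"})
    else:
        # Fallback (assumed): a bare {"name": ..., "parameters": {...}} layout.
        params = schema.setdefault("parameters", {"type": "object"})
    return params.setdefault("properties", {})
```

Writing `schema_properties(tool)["depends_on"] = {...}` then mutates the nested dict in place, leaving the top-level keys of the tool definition untouched.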

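Tests

  • 7 new ID uniquify unit tests
  • 1 new e2e test (ID reuse across turns)
  • Removed coercion tests (moved to coercion PR)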

Simon-Free force-pushed the pr4-tool-scheduling-depends-on branch 3 times, most recently from e6db1fc to e9a96f7 April 21, 2026 12:45
Simon FREYBURGER and others added 9 commits April 21, 2026 14:46
Split _coerce_params (20 lines, nested try/except chain) into:
- a small orchestrator that walks params and delegates,
- four single-purpose coercers (_coerce_int / _coerce_float /
  _coerce_bool / _coerce_json) dispatched through a _COERCERS map.

Each catching coercer still returns the original string on failure -- but
the intent is now explicit via a comment ("tool handler reports the real
type mismatch"), and the bare `except: pass` silent-pass pattern is gone.

Also fix test_scheduling_params_stripped which called execute_tool without
the required config arg; it has been failing since the pr4 branch landed.

Add tests/test_tool_scheduling_e2e.py that drives agent.run with a
mocked LLM:
- assert every schema sent to the stream carries tool_call_alias +
  depends_on (proof the schema injection path is wired through the full
  agent loop, not just a unit helper);
- register a "receiver" tool, let the LLM emit a tool_call with
  scheduling params + one real param, assert the scheduling params are
  gone and the real param reaches the handler.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simon-Free force-pushed the pr4-tool-scheduling-depends-on branch from e9a96f7 to 805b024 April 21, 2026 12:54
@Simon-Free
Author

Actually depends on #47 (needs conftest.py)

@Simon-Free
Author

Dependency: This PR depends on #47 (pr3-checkpoint-stderr-tokens). Please merge #47 first.

The tests/conftest.py shared fixtures (_no_quota) are introduced in #47. This PR extends conftest with scripted_stream + receiver_tool for scheduling-specific e2e tests.
