add tool-calling example and replay support for tool-iteration loops (#116) #144

Closed

cchinchilla-dev wants to merge 3 commits into main
Conversation
Pull request overview
Adds first-class, provider-agnostic tool/function calling to llm_call steps (with bounded iteration loops) and extends mock record/replay so offline runs can replay multi-turn tool loops end-to-end. This fits AgentLoom’s goal of enabling deterministic “agentic” workflows (ReAct-style) while keeping provider adapters thin and unified.
Changes:

- Introduces unified tool-call models/surface area (`ToolDefinition`, `ToolCall`, `ProviderResponse.tool_calls`, plus `StepDefinition.tools` / `tool_choice` / `max_tool_iterations`).
- Implements an iterative tool-dispatch loop in `LLMCallStep` that re-prompts with tool results until the model stops requesting tools (or the iteration cap is hit).
- Updates providers (OpenAI/Anthropic/Google/Ollama) and MockProvider replay to translate tools, parse tool calls, and replay multi-turn recordings; adds an example, a fixture, and tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/providers/test_tool_calling.py | Adds unit/integration-ish tests for tool translation/parsing, dispatch, and an OpenAI loop scenario. |
| src/agentloom/steps/llm_call.py | Adds _run_tool_loop and wires llm_call execution through it when tools are declared. |
| src/agentloom/steps/_tools.py | New module: tool schema translation, tool-call parsing, parallel dispatch, and message synthesis helpers. |
| src/agentloom/providers/base.py | Adds ToolCall model and ProviderResponse.tool_calls. |
| src/agentloom/providers/openai.py | Translates tools/tool_choice into OpenAI payload and parses tool calls from responses. |
| src/agentloom/providers/anthropic.py | Translates tools/tool_choice, parses tool_use blocks, and allows dict blocks through message formatting. |
| src/agentloom/providers/google.py | Translates tools + tool_config and parses functionCall parts from responses. |
| src/agentloom/providers/ollama.py | Translates tools (OpenAI shape), logs unsupported tool_choice, and parses tool calls via OpenAI parser. |
| src/agentloom/providers/mock.py | Adds per-step turn replay and hydrates recorded tool_calls to drive tool-iteration loops. |
| src/agentloom/core/models.py | Adds ToolDefinition and new tool-calling fields on StepDefinition. |
| recordings/tool_calling.json | Adds a committed two-turn recording exercising tool calling via MockProvider. |
| examples/35_tool_calling.yaml | Adds an offline-replay example workflow demonstrating tool calling with the fixture. |
Comment on lines +236 to +240:

```python
if provider == "google":
    return {
        "role": "model",
        "parts": [{"functionCall": {"name": c.name, "args": c.arguments}} for c in calls],
    }
```
Comment on lines +276 to +290:

```python
if provider == "google":
    return [
        {
            "role": "function",
            "parts": [
                {
                    "functionResponse": {
                        "name": call.name,
                        "response": {"result": text} if success else {"error": text},
                    }
                }
                for call, text, success in results
            ],
        }
    ]
```
Comment on lines +241 to +256:

```python
# OpenAI / Ollama. ``content`` must be a string (the AgentLoom-internal
# message formatter doesn't handle None even though OpenAI's API does);
# an empty string serializes as `""` which OpenAI accepts as "no text"
# alongside tool_calls.
return {
    "role": "assistant",
    "content": content or "",
    "tool_calls": [
        {
            "id": c.id,
            "type": "function",
            "function": {"name": c.name, "arguments": json.dumps(c.arguments)},
        }
        for c in calls
    ],
}
```
Comment on lines +291 to +292:

```python
# OpenAI / Ollama: one tool message per call, keyed by tool_call_id.
return [{"role": "tool", "tool_call_id": call.id, "content": text} for call, text, _ in results]
```
Comment on lines +132 to +140:

```python
raw_args = fn.get("arguments")
if isinstance(raw_args, dict):
    args: dict[str, Any] = raw_args
elif isinstance(raw_args, str):
    try:
        args = json.loads(raw_args or "{}")
    except json.JSONDecodeError:
        args = {}
else:
```
Comment on lines +187 to +196:

```python
if agentloom_tools:
    from agentloom.steps._tools import translate_tools_for_google

    extras["tools"] = translate_tools_for_google(agentloom_tools)
    mode = {
        "auto": "AUTO",
        "required": "ANY",
        "none": "NONE",
    }.get(agentloom_tool_choice or "auto", "AUTO")
    extras["tool_config"] = {"functionCallingConfig": {"mode": mode}}
```
Comment on lines +175 to +182:

```python
# Tool calling (#116) — LLM picks tools at runtime
tools: list[ToolDefinition] = Field(default_factory=list)
# ``tool_choice``: "auto" | "required" | "none" | {"name": "..."}.
# Auto lets the model decide; required forces a tool call; none
# disables tools for this turn (useful for tool-augmented chats that
# want a final summary without further calls).
tool_choice: Any = "auto"
max_tool_iterations: int = 5
```
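To make the semantics of these fields concrete, here is a plain-dataclass sketch of how a step might declare them. The `ToolDefinition` field names below (`name`, `description`, `parameters`) are this sketch's assumption, not necessarily AgentLoom's actual model, which the diff suggests is pydantic-based:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolDefinition:
    # Assumed shape: a name, a description, and a JSON-Schema parameters dict.
    name: str
    description: str = ""
    parameters: dict[str, Any] = field(default_factory=dict)


@dataclass
class StepDefinition:
    name: str
    tools: list[ToolDefinition] = field(default_factory=list)
    # "auto" | "required" | "none" | {"name": "..."}
    tool_choice: Any = "auto"
    max_tool_iterations: int = 5


step = StepDefinition(
    name="solve",
    tools=[
        ToolDefinition(
            name="add",
            description="Add two integers",
            parameters={
                "type": "object",
                "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            },
        )
    ],
    tool_choice="required",  # force a tool call on the first turn
)
print(step.max_tool_iterations)  # 5 (default cap)
```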
Comment on lines +59 to +67:

```python
@staticmethod
async def _run_tool_loop(
    *,
    context: StepContext,
    step: StepDefinition,
    messages: list[dict[str, Any]],
    model: str,
    provider_kwargs: dict[str, Any],
) -> Any:
```
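A minimal synchronous sketch of the control flow such a loop implements: call the provider, dispatch any requested tools, append the results, and re-prompt until no tools are requested or the cap is hit. The `complete` / `dispatch` callables and message dict shapes are this sketch's assumptions; the real method also threads token accounting, cost, budget, and retries through `StepContext`:

```python
from typing import Any, Callable


def run_tool_loop(
    complete: Callable[[list[dict[str, Any]]], dict[str, Any]],  # provider call (assumed shape)
    dispatch: Callable[[str, dict[str, Any]], str],              # tool executor (assumed shape)
    messages: list[dict[str, Any]],
    max_tool_iterations: int = 5,
) -> dict[str, Any]:
    for _ in range(max_tool_iterations):
        response = complete(messages)
        calls = response.get("tool_calls") or []
        if not calls:
            return response  # model produced a final answer
        # Echo the assistant turn, then append one tool-result message per call.
        messages.append({"role": "assistant", "content": "", "tool_calls": calls})
        for call in calls:
            result = dispatch(call["name"], call["arguments"])
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    # Cap hit: surface a distinguishable finish_reason so callers can detect runaway loops.
    return {"content": None, "finish_reason": "max_tool_iterations"}


# Fake provider: requests one tool call, then answers once a tool result is present.
def fake_complete(msgs: list[dict[str, Any]]) -> dict[str, Any]:
    if not any(m["role"] == "tool" for m in msgs):
        return {"tool_calls": [{"id": "c1", "name": "add", "arguments": {"a": 17, "b": 25}}]}
    return {"content": "42", "finish_reason": "stop", "tool_calls": []}


out = run_tool_loop(
    fake_complete,
    lambda name, args: str(args["a"] + args["b"]),
    [{"role": "user", "content": "add 17 and 25"}],
)
print(out["content"])  # 42
```

Returning a synthetic `finish_reason="max_tool_iterations"` (rather than raising) mirrors the behavior the PR description claims for the real loop.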
What

Adds native tool / function calling across all four LLM providers. The model decides at runtime which tool to invoke; the engine dispatches the call via the existing `ToolRegistry`, feeds the result back as a follow-up message, and re-prompts until the model stops asking for tools (capped by `max_tool_iterations`, default 5).

- New step fields: `StepDefinition.tools` (list of `ToolDefinition`), `tool_choice` (`"auto"` / `"required"` / `"none"` / `{"name": "..."}`), and `max_tool_iterations`. New `ProviderResponse.tool_calls` field, new `ToolCall` model.
- `LLMCallStep._run_tool_loop` accumulates tokens + cost across iterations and surfaces `finish_reason="max_tool_iterations"` when the cap is hit so callers can detect runaway loops.
- Each provider translates the unified `tools` list into its native shape (OpenAI / Ollama: `tools=[{type: function, function: ...}]`; Anthropic: `tools=[{name, input_schema}]`; Google: `function_declarations` grouped under one `tools` entry) and parses tool calls back from the response. Assistant and tool-result message synthesizers live in `agentloom.steps._tools` so conversation building is consistent across providers.
- Tool dispatch goes through `tool_registry.get(name).execute(args)`, which already enforces the sandbox (harden tool sandbox against command, path, and URL-scheme bypasses #105). The iteration loop respects budget (fix DAG skip propagation, parallel cancellation, and budget overshoot #108) and the existing per-step retry policy (fix gateway resilience: CB/RL ordering, stream cancellation, retry jitter, rate-limiter edge cases #106).
- Mock record/replay stores one recorded turn per `complete()` call per loop iteration. Each turn's `tool_calls` block is hydrated as `ToolCall` objects so offline replay drives the loop end-to-end. New `examples/35_tool_calling.yaml` exercises a ReAct-style flow (`http_request` against httpbin.org) with the committed recording fixture.
- `parse_tool_calls_from_openai` now handles two wire variants: OpenAI canonical (`type: "function"` + `arguments` as a JSON string) and Ollama-compat (no `type` field, `arguments` as a decoded dict). Without this, Ollama tool calling silently dropped every call. Regression test captured from a live `llama3.1:8b` response.

Why
Closes the gap that the engine couldn't surface tool decisions to the LLM: `ToolStep` was a static DAG node, never a model-driven dispatch. Without this, ReAct loops, deep-research agents, function-calling assistants, and any benchmark that compares tool selection across models were inexpressible. Foundation for #119 (conversation history) and #120 (Agent primitive).

Closes #116
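To illustrate the per-provider translation described above, here is a sketch of how one unified tool definition might map to the three native wire shapes. The exact payload keys are this sketch's reading of those formats, not code copied from `agentloom.steps._tools`:

```python
from typing import Any


def translate_tool(tool: dict[str, Any], provider: str) -> Any:
    """Translate one unified tool definition into a provider's native shape."""
    if provider in ("openai", "ollama"):
        # OpenAI canonical; Ollama accepts the same OpenAI-compat shape.
        return {
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool["description"],
                "parameters": tool["parameters"],
            },
        }
    if provider == "anthropic":
        # Anthropic names the JSON Schema field ``input_schema``.
        return {
            "name": tool["name"],
            "description": tool["description"],
            "input_schema": tool["parameters"],
        }
    if provider == "google":
        # Google groups declarations under a single ``tools`` entry.
        return {
            "function_declarations": [
                {
                    "name": tool["name"],
                    "description": tool["description"],
                    "parameters": tool["parameters"],
                }
            ]
        }
    raise ValueError(f"unknown provider: {provider}")


tool = {
    "name": "add",
    "description": "Add two integers",
    "parameters": {
        "type": "object",
        "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
    },
}
print(translate_tool(tool, "anthropic")["input_schema"]["type"])  # object
```

Keeping one unified definition and translating at the provider boundary is what keeps the adapters thin: the step layer never sees provider-specific shapes.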
Testing

- `uv run pytest`: 1194 passed
- `uv run ruff check src/ tests/`: clean
- `uv run mypy src/`: clean
- Live run (`llama3.1:8b`): full roundtrip, tool dispatched with correct args (17+25=42), 296 tokens across 2 iterations
- Trace spans `workflow → step:solve → chat` × 2 iterations, all sharing `workflow.run_id`, canonical OTel `gen_ai.*` attrs per turn
- Run via `host.docker.internal`: tool dispatched (7+11=18), 346 tokens
Notes

- `ToolStep` (the static DAG node) keeps working unchanged for workflows that want explicit author-driven tool execution. The new `tools=` field on `llm_call` is the dynamic, model-driven path; nothing existing breaks.
- Anthropic rolls thinking tokens into `output_tokens` (no separate field on the wire), so workflows combining `thinking` + `tools` on Claude show `reasoning_tokens=0` even when the model used extended thinking; documented limitation, cost is still correct.