
add tool-calling example and replay support for tool-iteration loops (#116) #144

Closed
cchinchilla-dev wants to merge 3 commits into main from feat/tool-calling-116

Conversation

@cchinchilla-dev
Owner

What

Adds native tool / function calling across all four LLM providers. The model decides at runtime which tool to invoke; the engine dispatches the call via the existing ToolRegistry, feeds the result back as a follow-up message, and re-prompts until the model stops asking for tools (capped by max_tool_iterations, default 5).

  • Unified surface — StepDefinition.tools (list of ToolDefinition), tool_choice ("auto" / "required" / "none" / {"name": "..."}), and max_tool_iterations. New ProviderResponse.tool_calls field and new ToolCall model. LLMCallStep._run_tool_loop accumulates tokens and cost across iterations and surfaces finish_reason="max_tool_iterations" when the cap is hit so callers can detect runaway loops.
  • Per-provider wire translation — each adapter maps the unified tools list to its native shape (OpenAI / Ollama: tools=[{type:function, function:...}]; Anthropic: tools=[{name, input_schema}]; Google: function_declarations grouped under one tools entry) and parses tool calls back from the response. Assistant + tool-result message synthesizers live in agentloom.steps._tools so the conversation building is consistent across providers.
  • Parallel dispatch — multiple tool calls in one response are executed concurrently via an anyio task group; results preserve order. Tool failures (unknown name, exception) are reported back to the model as text so it can recover on the next turn rather than aborting the whole loop.
  • Sandbox / budget / retry honored — dispatched calls go through tool_registry.get(name).execute(args), which already enforces the sandbox hardening from #105. The iteration loop respects the budget enforcement from #108 and the existing per-step retry policy from #106.
  • MockProvider replay — recordings can now carry a list of turns per step (one complete() call per loop iteration). Each turn's tool_calls block is hydrated as ToolCall objects so offline replay drives the loop end-to-end. New examples/35_tool_calling.yaml exercises a ReAct-style flow (http_request against httpbin.org) with the committed recording fixture.
  • Bug fix found via real-Ollama smoke — parse_tool_calls_from_openai now handles two wire variants: OpenAI canonical (type:"function", arguments as a JSON string) and Ollama-compat (no type field, arguments as an already-decoded dict). Without this, Ollama tool calling silently dropped every call. Regression test captured from a live llama3.1:8b response.
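The parallel-dispatch bullet can be sketched roughly as follows. The PR uses an anyio task group; this illustration substitutes stdlib asyncio.gather (which also preserves input order) and a minimal dict-based registry, so the names and signatures here are assumptions for illustration, not the real AgentLoom API.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]


async def dispatch_tool_calls(
    calls: list[ToolCall], registry: dict[str, Any]
) -> list[tuple[ToolCall, str, bool]]:
    """Execute all tool calls concurrently; results come back in call order.

    Failures (unknown tool, raised exception) are rendered as text so the
    model can recover on the next turn instead of aborting the whole loop.
    """

    async def run_one(call: ToolCall) -> tuple[ToolCall, str, bool]:
        tool = registry.get(call.name)
        if tool is None:
            return call, f"unknown tool: {call.name}", False
        try:
            result = await tool(**call.arguments)
            return call, str(result), True
        except Exception as exc:  # report the failure, don't abort the loop
            return call, f"tool error: {exc}", False

    # gather() returns results in the order of its arguments,
    # regardless of completion order.
    return list(await asyncio.gather(*(run_one(c) for c in calls)))
```

The key property is that a raised exception becomes a text payload for the model rather than a cancelled task group.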
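The MockProvider turn replay described above can be pictured like this. The turn schema and class shape shown here are assumptions for illustration, not the committed fixture's exact format: one recorded turn is consumed per complete() call, and a turn's tool_calls block drives the loop exactly as a live model would.

```python
from typing import Any


class MockTurnProvider:
    """Replays one recorded turn per complete() call, in recorded order."""

    def __init__(self, turns: list[dict[str, Any]]) -> None:
        self._turns = iter(turns)

    def complete(self) -> dict[str, Any]:
        try:
            return next(self._turns)
        except StopIteration:
            # more loop iterations than recorded turns is a fixture bug
            raise RuntimeError("recording exhausted: no turn left to replay")
```

A two-turn recording (tool call, then final answer) then drives a tool-iteration loop end-to-end offline.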

Why

Closes the gap that the engine couldn't surface tool decisions to the LLM — ToolStep was a static DAG node, never a model-driven dispatch. Without this, ReAct loops, deep-research agents, function-calling assistants, and any benchmark that compares tool selection across models were inexpressible. Foundation for #119 (conversation history) and #120 (Agent primitive).

Closes #116

Testing

  • uv run pytest — 1194 passed
  • uv run ruff check src/ tests/ — clean
  • uv run mypy src/ — clean
  • CLI smoke (mock replay): tool loop drives 2 iterations against the committed recording, final answer reaches state
  • Real-provider smoke (Ollama llama3.1:8b): full roundtrip, tool dispatched with correct args (17+25=42), 296 tokens across 2 iterations
  • Docker → Jaeger smoke (Ollama, real OTel collector): 4 spans visible — workflow → step:solve → chat × 2 iterations, all sharing workflow.run_id, canonical OTel gen_ai.* attrs per turn
  • Kubernetes smoke (kind cluster, host Ollama via host.docker.internal): tool dispatched (7+11=18), 346 tokens

Notes

ToolStep (the static DAG node) keeps working unchanged for workflows that want explicit author-driven tool execution. The new tools= field on llm_call is the dynamic, model-driven path; nothing existing breaks.

Anthropic rolls thinking tokens into output_tokens (no separate field on the wire), so workflows combining thinking + tools on Claude show reasoning_tokens=0 even when the model used extended thinking — documented limitation, cost is still correct.

Copilot AI review requested due to automatic review settings May 9, 2026 11:12
@cchinchilla-dev cchinchilla-dev added enhancement New feature or request providers Provider gateway and adapters core Core engine, DAG, state labels May 9, 2026
@codecov

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 85.44601% with 31 lines in your changes missing coverage. Please review.

Files with missing lines (patch coverage / lines missing):

  • src/agentloom/steps/_tools.py — 88.34%, 12 missing
  • src/agentloom/providers/mock.py — 47.05%, 9 missing
  • src/agentloom/providers/google.py — 66.66%, 4 missing
  • src/agentloom/providers/ollama.py — 55.55%, 4 missing
  • src/agentloom/providers/anthropic.py — 84.61%, 2 missing



Copilot AI left a comment


Pull request overview

Adds first-class, provider-agnostic tool/function calling to llm_call steps (with bounded iteration loops) and extends mock record/replay so offline runs can replay multi-turn tool loops end-to-end. This fits AgentLoom’s goal of enabling deterministic “agentic” workflows (ReAct-style) while keeping provider adapters thin and unified.

Changes:

  • Introduces unified tool-call models/surface area (ToolDefinition, ToolCall, ProviderResponse.tool_calls, plus StepDefinition.tools/tool_choice/max_tool_iterations).
  • Implements an iterative tool-dispatch loop in LLMCallStep that re-prompts with tool results until the model stops requesting tools (or iteration cap is hit).
  • Updates providers (OpenAI/Anthropic/Google/Ollama) and MockProvider replay to translate tools, parse tool calls, and replay multi-turn recordings; adds an example + fixture + tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Summary per file:

  • tests/providers/test_tool_calling.py — Adds unit/integration-ish tests for tool translation/parsing, dispatch, and an OpenAI loop scenario.
  • src/agentloom/steps/llm_call.py — Adds _run_tool_loop and wires llm_call execution through it when tools are declared.
  • src/agentloom/steps/_tools.py — New module: tool schema translation, tool-call parsing, parallel dispatch, and message synthesis helpers.
  • src/agentloom/providers/base.py — Adds the ToolCall model and ProviderResponse.tool_calls.
  • src/agentloom/providers/openai.py — Translates tools/tool_choice into the OpenAI payload and parses tool calls from responses.
  • src/agentloom/providers/anthropic.py — Translates tools/tool_choice, parses tool_use blocks, and allows dict blocks through message formatting.
  • src/agentloom/providers/google.py — Translates tools + tool_config and parses functionCall parts from responses.
  • src/agentloom/providers/ollama.py — Translates tools (OpenAI shape), logs unsupported tool_choice, and parses tool calls via the OpenAI parser.
  • src/agentloom/providers/mock.py — Adds per-step turn replay and hydrates recorded tool_calls to drive tool-iteration loops.
  • src/agentloom/core/models.py — Adds ToolDefinition and new tool-calling fields on StepDefinition.
  • recordings/tool_calling.json — Adds a committed two-turn recording exercising tool calling via MockProvider.
  • examples/35_tool_calling.yaml — Adds an offline-replay example workflow demonstrating tool calling with the fixture.

Comment on lines +236 to +240
if provider == "google":
    return {
        "role": "model",
        "parts": [{"functionCall": {"name": c.name, "args": c.arguments}} for c in calls],
    }
Comment on lines +276 to +290
if provider == "google":
    return [
        {
            "role": "function",
            "parts": [
                {
                    "functionResponse": {
                        "name": call.name,
                        "response": {"result": text} if success else {"error": text},
                    }
                }
                for call, text, success in results
            ],
        }
    ]
Comment on lines +241 to +256
# OpenAI / Ollama. ``content`` must be a string (the AgentLoom-internal
# message formatter doesn't handle None even though OpenAI's API does);
# an empty string serializes as `""` which OpenAI accepts as "no text"
# alongside tool_calls.
return {
    "role": "assistant",
    "content": content or "",
    "tool_calls": [
        {
            "id": c.id,
            "type": "function",
            "function": {"name": c.name, "arguments": json.dumps(c.arguments)},
        }
        for c in calls
    ],
}
Comment on lines +291 to +292
# OpenAI / Ollama: one tool message per call, keyed by tool_call_id.
return [{"role": "tool", "tool_call_id": call.id, "content": text} for call, text, _ in results]
Comment on lines +132 to +140
raw_args = fn.get("arguments")
if isinstance(raw_args, dict):
    args: dict[str, Any] = raw_args
elif isinstance(raw_args, str):
    try:
        args = json.loads(raw_args or "{}")
    except json.JSONDecodeError:
        args = {}
else:
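The excerpt above cuts off at the else branch. A self-contained sketch of the normalization it implies (an assumption, not copied from the PR): OpenAI canonical sends arguments as a JSON-encoded string, Ollama-compat sends an already-decoded dict, and anything malformed degrades to an empty dict.

```python
import json
from typing import Any


def normalize_tool_arguments(raw_args: Any) -> dict[str, Any]:
    """Accept both wire variants for a tool call's arguments.

    OpenAI canonical: a JSON-encoded string. Ollama-compat: an
    already-decoded dict. Anything malformed degrades to {} so one
    bad call can't crash the tool loop.
    """
    if isinstance(raw_args, dict):
        return raw_args
    if isinstance(raw_args, str):
        try:
            decoded = json.loads(raw_args or "{}")
        except json.JSONDecodeError:
            return {}
        # a JSON string that decodes to a non-dict is also malformed
        return decoded if isinstance(decoded, dict) else {}
    return {}
```

This is the behavior that the real-Ollama smoke test exposed: without the dict branch, every Ollama tool call fell through and was dropped.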
Comment on lines +187 to +196
if agentloom_tools:
    from agentloom.steps._tools import translate_tools_for_google

    extras["tools"] = translate_tools_for_google(agentloom_tools)
    mode = {
        "auto": "AUTO",
        "required": "ANY",
        "none": "NONE",
    }.get(agentloom_tool_choice or "auto", "AUTO")
    extras["tool_config"] = {"functionCallingConfig": {"mode": mode}}
Comment on lines +175 to +182
# Tool calling (#116) — LLM picks tools at runtime
tools: list[ToolDefinition] = Field(default_factory=list)
# ``tool_choice``: "auto" | "required" | "none" | {"name": "..."}.
# Auto lets the model decide; required forces a tool call; none
# disables tools for this turn (useful for tool-augmented chats that
# want a final summary without further calls).
tool_choice: Any = "auto"
max_tool_iterations: int = 5
Comment on lines +59 to +67
@staticmethod
async def _run_tool_loop(
    *,
    context: StepContext,
    step: StepDefinition,
    messages: list[dict[str, Any]],
    model: str,
    provider_kwargs: dict[str, Any],
) -> Any:
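The excerpt ends at the signature. As a rough illustration of the loop's contract (cap at max_tool_iterations, surface finish_reason="max_tool_iterations" when the cap is hit), here is a simplified stand-in with injected complete/dispatch callables; the names and message shapes are hypothetical, not the actual LLMCallStep internals.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Response:
    text: str = ""
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    finish_reason: str = "stop"


async def run_tool_loop(
    complete: Callable[[list[dict[str, Any]]], Awaitable[Response]],
    dispatch: Callable[[list[dict[str, Any]]], Awaitable[list[tuple[dict[str, Any], str]]]],
    messages: list[dict[str, Any]],
    max_tool_iterations: int = 5,
) -> Response:
    """Re-prompt with tool results until the model stops requesting tools."""
    response = Response()
    for _ in range(max_tool_iterations):
        response = await complete(messages)
        if not response.tool_calls:
            return response  # model produced a final answer
        # echo the assistant's tool_calls, then feed each result back
        messages.append(
            {"role": "assistant", "content": response.text,
             "tool_calls": response.tool_calls}
        )
        for call, result_text in await dispatch(response.tool_calls):
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result_text}
            )
    response.finish_reason = "max_tool_iterations"  # cap hit: signal runaway loop
    return response
```

The overwrite of finish_reason after the loop is what lets callers distinguish "model finished" from "cap stopped it".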
@cchinchilla-dev cchinchilla-dev deleted the feat/tool-calling-116 branch May 9, 2026 16:01


Development

Successfully merging this pull request may close these issues.

add native tool/function calling with streaming and parallel-call support
