
add tool-calling example and replay support for tool-iteration loops (#116) #144

Closed
cchinchilla-dev wants to merge 3 commits into main from feat/tool-calling-116

Conversation

@cchinchilla-dev
Owner

What

Adds native tool / function calling across all four LLM providers. The model decides at runtime which tool to invoke; the engine dispatches the call via the existing ToolRegistry, feeds the result back as a follow-up message, and re-prompts until the model stops asking for tools (capped by max_tool_iterations, default 5).

  • Unified surface — StepDefinition.tools (list of ToolDefinition), tool_choice ("auto" / "required" / "none" / {"name": "..."}), and max_tool_iterations. New ProviderResponse.tool_calls field and new ToolCall model. LLMCallStep._run_tool_loop accumulates tokens and cost across iterations and surfaces finish_reason="max_tool_iterations" when the cap is hit so callers can detect runaway loops.
  • Per-provider wire translation — each adapter maps the unified tools list to its native shape (OpenAI / Ollama: tools=[{type:function, function:...}]; Anthropic: tools=[{name, input_schema}]; Google: function_declarations grouped under one tools entry) and parses tool calls back from the response. Assistant + tool-result message synthesizers live in agentloom.steps._tools so the conversation building is consistent across providers.
  • Parallel dispatch — multiple tool calls in one response are executed concurrently via an anyio task group; results preserve order. Tool failures (unknown name, exception) are reported back to the model as text so it can recover on the next turn rather than aborting the whole loop.
  • Sandbox / budget / retry honored — dispatched calls go through tool_registry.get(name).execute(args), which already enforces the sandbox hardening from #105. The iteration loop respects the budget enforcement from #108 and the existing per-step retry policy from #106.
  • MockProvider replay — recordings can now carry a list of turns per step (one complete() call per loop iteration). Each turn's tool_calls block is hydrated as ToolCall objects so offline replay drives the loop end-to-end. New examples/35_tool_calling.yaml exercises a ReAct-style flow (http_request against httpbin.org) with the committed recording fixture.
  • Bug fix found via real-Ollama smoke — parse_tool_calls_from_openai now handles two wire variants: OpenAI canonical (type:"function", arguments as a JSON string) and Ollama-compat (no type field, arguments as an already-decoded dict). Without this, Ollama tool calling silently dropped every call. Regression test captured from a live llama3.1:8b response.
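The parallel-dispatch bullet can be sketched roughly as follows. The PR uses an anyio task group; this illustration substitutes stdlib asyncio.gather (which also preserves input order) and a minimal dict-based registry, so the names and signatures here are assumptions for illustration, not the real AgentLoom API.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]


async def dispatch_tool_calls(
    calls: list[ToolCall], registry: dict[str, Any]
) -> list[tuple[ToolCall, str, bool]]:
    """Execute all tool calls concurrently; results come back in call order.

    Failures (unknown tool, raised exception) are rendered as text so the
    model can recover on the next turn instead of aborting the whole loop.
    """

    async def run_one(call: ToolCall) -> tuple[ToolCall, str, bool]:
        tool = registry.get(call.name)
        if tool is None:
            return call, f"unknown tool: {call.name}", False
        try:
            result = await tool(**call.arguments)
            return call, str(result), True
        except Exception as exc:  # report the failure, don't abort the loop
            return call, f"tool error: {exc}", False

    # gather() returns results in the order of its arguments,
    # regardless of completion order.
    return list(await asyncio.gather(*(run_one(c) for c in calls)))
```

The key property is that a raised exception becomes a text payload for the model rather than a cancelled task group.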
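The MockProvider turn replay described above can be pictured like this. The turn schema and class shape shown here are assumptions for illustration, not the committed fixture's exact format: one recorded turn is consumed per complete() call, and a turn's tool_calls block drives the loop exactly as a live model would.

```python
from typing import Any


class MockTurnProvider:
    """Replays one recorded turn per complete() call, in recorded order."""

    def __init__(self, turns: list[dict[str, Any]]) -> None:
        self._turns = iter(turns)

    def complete(self) -> dict[str, Any]:
        try:
            return next(self._turns)
        except StopIteration:
            # more loop iterations than recorded turns is a fixture bug
            raise RuntimeError("recording exhausted: no turn left to replay")
```

A two-turn recording (tool call, then final answer) then drives a tool-iteration loop end-to-end offline.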

Why

Closes the gap that the engine couldn't surface tool decisions to the LLM — ToolStep was a static DAG node, never a model-driven dispatch. Without this, ReAct loops, deep-research agents, function-calling assistants, and any benchmark that compares tool selection across models were inexpressible. Foundation for #119 (conversation history) and #120 (Agent primitive).

Closes #116

Testing

  • uv run pytest — 1194 passed
  • uv run ruff check src/ tests/ — clean
  • uv run mypy src/ — clean
  • CLI smoke (mock replay): tool loop drives 2 iterations against the committed recording, final answer reaches state
  • Real-provider smoke (Ollama llama3.1:8b): full roundtrip, tool dispatched with correct args (17+25=42), 296 tokens across 2 iterations
  • Docker → Jaeger smoke (Ollama, real OTel collector): 4 spans visible — workflow → step:solve → chat × 2 iterations, all sharing workflow.run_id, canonical OTel gen_ai.* attrs per turn
  • Kubernetes smoke (kind cluster, host Ollama via host.docker.internal): tool dispatched (7+11=18), 346 tokens

Notes

ToolStep (the static DAG node) keeps working unchanged for workflows that want explicit author-driven tool execution. The new tools= field on llm_call is the dynamic, model-driven path; nothing existing breaks.

Anthropic rolls thinking tokens into output_tokens (no separate field on the wire), so workflows combining thinking + tools on Claude show reasoning_tokens=0 even when the model used extended thinking — documented limitation, cost is still correct.

Copilot AI review requested due to automatic review settings May 9, 2026 11:12
@cchinchilla-dev cchinchilla-dev added enhancement New feature or request providers Provider gateway and adapters core Core engine, DAG, state labels May 9, 2026
@codecov

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 85.44601% with 31 lines in your changes missing coverage. Please review.

Files with missing lines (patch coverage / lines missing):

  • src/agentloom/steps/_tools.py — 88.34%, 12 missing
  • src/agentloom/providers/mock.py — 47.05%, 9 missing
  • src/agentloom/providers/google.py — 66.66%, 4 missing
  • src/agentloom/providers/ollama.py — 55.55%, 4 missing
  • src/agentloom/providers/anthropic.py — 84.61%, 2 missing



Copilot AI left a comment


Pull request overview

Adds first-class, provider-agnostic tool/function calling to llm_call steps (with bounded iteration loops) and extends mock record/replay so offline runs can replay multi-turn tool loops end-to-end. This fits AgentLoom’s goal of enabling deterministic “agentic” workflows (ReAct-style) while keeping provider adapters thin and unified.

Changes:

  • Introduces unified tool-call models/surface area (ToolDefinition, ToolCall, ProviderResponse.tool_calls, plus StepDefinition.tools/tool_choice/max_tool_iterations).
  • Implements an iterative tool-dispatch loop in LLMCallStep that re-prompts with tool results until the model stops requesting tools (or iteration cap is hit).
  • Updates providers (OpenAI/Anthropic/Google/Ollama) and MockProvider replay to translate tools, parse tool calls, and replay multi-turn recordings; adds an example + fixture + tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Summary per file:

  • tests/providers/test_tool_calling.py — Adds unit/integration-ish tests for tool translation/parsing, dispatch, and an OpenAI loop scenario.
  • src/agentloom/steps/llm_call.py — Adds _run_tool_loop and wires llm_call execution through it when tools are declared.
  • src/agentloom/steps/_tools.py — New module: tool schema translation, tool-call parsing, parallel dispatch, and message synthesis helpers.
  • src/agentloom/providers/base.py — Adds the ToolCall model and ProviderResponse.tool_calls.
  • src/agentloom/providers/openai.py — Translates tools/tool_choice into the OpenAI payload and parses tool calls from responses.
  • src/agentloom/providers/anthropic.py — Translates tools/tool_choice, parses tool_use blocks, and allows dict blocks through message formatting.
  • src/agentloom/providers/google.py — Translates tools + tool_config and parses functionCall parts from responses.
  • src/agentloom/providers/ollama.py — Translates tools (OpenAI shape), logs unsupported tool_choice, and parses tool calls via the OpenAI parser.
  • src/agentloom/providers/mock.py — Adds per-step turn replay and hydrates recorded tool_calls to drive tool-iteration loops.
  • src/agentloom/core/models.py — Adds ToolDefinition and new tool-calling fields on StepDefinition.
  • recordings/tool_calling.json — Adds a committed two-turn recording exercising tool calling via MockProvider.
  • examples/35_tool_calling.yaml — Adds an offline-replay example workflow demonstrating tool calling with the fixture.

Comment on lines +236 to +240
if provider == "google":
    return {
        "role": "model",
        "parts": [{"functionCall": {"name": c.name, "args": c.arguments}} for c in calls],
    }
Comment on lines +276 to +290
if provider == "google":
    return [
        {
            "role": "function",
            "parts": [
                {
                    "functionResponse": {
                        "name": call.name,
                        "response": {"result": text} if success else {"error": text},
                    }
                }
                for call, text, success in results
            ],
        }
    ]
Comment on lines +241 to +256
# OpenAI / Ollama. ``content`` must be a string (the AgentLoom-internal
# message formatter doesn't handle None even though OpenAI's API does);
# an empty string serializes as `""` which OpenAI accepts as "no text"
# alongside tool_calls.
return {
    "role": "assistant",
    "content": content or "",
    "tool_calls": [
        {
            "id": c.id,
            "type": "function",
            "function": {"name": c.name, "arguments": json.dumps(c.arguments)},
        }
        for c in calls
    ],
}
Comment on lines +291 to +292
# OpenAI / Ollama: one tool message per call, keyed by tool_call_id.
return [{"role": "tool", "tool_call_id": call.id, "content": text} for call, text, _ in results]
Comment on lines +132 to +140
raw_args = fn.get("arguments")
if isinstance(raw_args, dict):
    args: dict[str, Any] = raw_args
elif isinstance(raw_args, str):
    try:
        args = json.loads(raw_args or "{}")
    except json.JSONDecodeError:
        args = {}
else:
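The excerpt above cuts off at the else branch. A self-contained sketch of the normalization it implies (an assumption, not copied from the PR): OpenAI canonical sends arguments as a JSON-encoded string, Ollama-compat sends an already-decoded dict, and anything malformed degrades to an empty dict.

```python
import json
from typing import Any


def normalize_tool_arguments(raw_args: Any) -> dict[str, Any]:
    """Accept both wire variants for a tool call's arguments.

    OpenAI canonical: a JSON-encoded string. Ollama-compat: an
    already-decoded dict. Anything malformed degrades to {} so one
    bad call can't crash the tool loop.
    """
    if isinstance(raw_args, dict):
        return raw_args
    if isinstance(raw_args, str):
        try:
            decoded = json.loads(raw_args or "{}")
        except json.JSONDecodeError:
            return {}
        # a JSON string that decodes to a non-dict is also malformed
        return decoded if isinstance(decoded, dict) else {}
    return {}
```

This is the behavior that the real-Ollama smoke test exposed: without the dict branch, every Ollama tool call fell through and was dropped.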
Comment on lines +187 to +196
if agentloom_tools:
    from agentloom.steps._tools import translate_tools_for_google

    extras["tools"] = translate_tools_for_google(agentloom_tools)
    mode = {
        "auto": "AUTO",
        "required": "ANY",
        "none": "NONE",
    }.get(agentloom_tool_choice or "auto", "AUTO")
    extras["tool_config"] = {"functionCallingConfig": {"mode": mode}}
Comment on lines +175 to +182
# Tool calling (#116) — LLM picks tools at runtime
tools: list[ToolDefinition] = Field(default_factory=list)
# ``tool_choice``: "auto" | "required" | "none" | {"name": "..."}.
# Auto lets the model decide; required forces a tool call; none
# disables tools for this turn (useful for tool-augmented chats that
# want a final summary without further calls).
tool_choice: Any = "auto"
max_tool_iterations: int = 5
Comment on lines +59 to +67
@staticmethod
async def _run_tool_loop(
    *,
    context: StepContext,
    step: StepDefinition,
    messages: list[dict[str, Any]],
    model: str,
    provider_kwargs: dict[str, Any],
) -> Any:
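The excerpt ends at the signature. As a rough illustration of the loop's contract (cap at max_tool_iterations, surface finish_reason="max_tool_iterations" when the cap is hit), here is a simplified stand-in with injected complete/dispatch callables; the names and message shapes are hypothetical, not the actual LLMCallStep internals.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Response:
    text: str = ""
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    finish_reason: str = "stop"


async def run_tool_loop(
    complete: Callable[[list[dict[str, Any]]], Awaitable[Response]],
    dispatch: Callable[[list[dict[str, Any]]], Awaitable[list[tuple[dict[str, Any], str]]]],
    messages: list[dict[str, Any]],
    max_tool_iterations: int = 5,
) -> Response:
    """Re-prompt with tool results until the model stops requesting tools."""
    response = Response()
    for _ in range(max_tool_iterations):
        response = await complete(messages)
        if not response.tool_calls:
            return response  # model produced a final answer
        # echo the assistant's tool_calls, then feed each result back
        messages.append(
            {"role": "assistant", "content": response.text,
             "tool_calls": response.tool_calls}
        )
        for call, result_text in await dispatch(response.tool_calls):
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result_text}
            )
    response.finish_reason = "max_tool_iterations"  # cap hit: signal runaway loop
    return response
```

The overwrite of finish_reason after the loop is what lets callers distinguish "model finished" from "cap stopped it".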
@cchinchilla-dev cchinchilla-dev deleted the feat/tool-calling-116 branch May 9, 2026 16:01


Development

Successfully merging this pull request may close these issues.

add native tool/function calling with streaming and parallel-call support
