
feat(providers): add native tool/function calling#145

Open
cchinchilla-dev wants to merge 4 commits into main from feat/tool-calling-116

Conversation

@cchinchilla-dev
Owner

What

Adds native tool / function calling across all four LLM providers. The model decides at runtime which tool to invoke; the engine dispatches via the existing ToolRegistry, feeds results back, and re-prompts until the model stops asking for tools (capped by max_tool_iterations, default 5).

  • Unified surface — StepDefinition.tools, tool_choice, max_tool_iterations on llm_call. New ToolDefinition model; new ProviderResponse.tool_calls: list[ToolCall]. LLMCallStep._run_tool_loop accumulates tokens + cost and surfaces finish_reason="max_tool_iterations" when the cap fires.
  • Per-provider wire translation — each adapter maps the unified spec to its native shape: OpenAI / Ollama use tools=[{type:"function", function:...}], Anthropic tools=[{name, input_schema}], Google groups under function_declarations. Tool-loop messages ({"role":"assistant","tool_calls":[...]}, {"role":"tool","tool_call_id":...}, Gemini parts shapes) round-trip through _format_messages verbatim — without this passthrough, three of four providers silently dropped iteration 2+ messages.
  • Parallel dispatch — multiple tool calls in one response execute concurrently via an anyio task group; failures are reported back as text so the model can recover on the next turn.
  • Streaming events — typed StreamEvent hierarchy (TextDelta, ToolCallDelta, ToolCallComplete, StreamDone) plus sr.events() on StreamResponse. Backwards-compat: async for chunk in sr keeps yielding plain text strings.
  • Observability — per-call execute_tool {name} child span carrying tool.{call_id,name,args_hash,result_hash,duration_ms,success} and workflow.run_id; new agentloom_tool_calls_total{tool_name,status} counter + agentloom_tool_call_duration_seconds{tool_name} histogram. Args / result are SHA-256 hashed (truncated to 16 hex) so PII never lands on the trace.
  • Sandbox / budget / retry — dispatched calls go through tool_registry.get(name).execute(args), honoring #105 sandbox enforcement; loop respects budget (#108) and per-step retry policy (#106).
  • MockProvider replay — recordings carry a list of turns per step, each with its own tool_calls block hydrated as ToolCall objects so offline replay drives the loop end-to-end.
  • Bug fix found via real-Ollama smoke — parse_tool_calls_from_openai now handles two wire variants: OpenAI canonical (type:"function" + arguments as JSON string) and Ollama-compat (no type field, arguments as decoded dict). Without this, Ollama tool calling silently dropped every call.
  • Google tool_choice={"name": "..."} — translates to ANY mode + allowedFunctionNames rather than silently falling through to AUTO.
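The loop and parallel dispatch described in the bullets above can be sketched as follows. This is a minimal illustration, not the PR's code: run_tool_loop and dispatch are hypothetical stand-ins for LLMCallStep._run_tool_loop and the ToolRegistry, provider responses are simplified to dicts, and concurrency uses asyncio.gather where the PR uses an anyio task group. As in the PR, tool failures are reported back as text so the model can recover, and the cap surfaces finish_reason="max_tool_iterations".

```python
import asyncio
import json

MAX_TOOL_ITERATIONS = 5  # the PR's default cap


async def dispatch(registry, call):
    """Run one tool call. Arguments may arrive as a JSON string (OpenAI
    canonical) or an already-decoded dict (Ollama-compat)."""
    args = call["arguments"]
    if isinstance(args, str):
        args = json.loads(args)
    return registry[call["name"]](**args)


async def run_tool_loop(provider, registry, messages, max_iters=MAX_TOOL_ITERATIONS):
    """Re-prompt until the model stops requesting tools, or the cap fires."""
    for _ in range(max_iters):
        resp = await provider.chat(messages)
        if not resp.get("tool_calls"):
            return resp  # final answer
        # Echo the assistant turn, then dispatch all requested calls concurrently.
        messages.append({"role": "assistant", "tool_calls": resp["tool_calls"]})
        results = await asyncio.gather(
            *(dispatch(registry, c) for c in resp["tool_calls"]),
            return_exceptions=True,
        )
        for call, result in zip(resp["tool_calls"], results):
            # Failures come back as text so the model can recover next turn.
            text = f"error: {result}" if isinstance(result, Exception) else str(result)
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": text}
            )
    return {"content": None, "finish_reason": "max_tool_iterations"}
```

In the real step, each iteration's tokens and cost are accumulated, and the messages synthesized here must pass through each provider's _format_messages untouched (the passthrough fix called out above).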
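The per-provider wire translation and the Google tool_choice mapping might look like the following sketch. Helper names are hypothetical; the output shapes follow the bullets above (OpenAI/Ollama function wrapper, Anthropic input_schema, Gemini function_declarations, and ANY + allowedFunctionNames for a named tool_choice).

```python
def to_openai(tool: dict) -> dict:
    """OpenAI / Ollama wire shape: a function wrapper around the spec."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        },
    }


def to_anthropic(tool: dict) -> dict:
    """Anthropic wire shape: the JSON schema rides under input_schema."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


def to_google(tools: list[dict]) -> list[dict]:
    """Gemini groups all declarations under one function_declarations entry."""
    return [{
        "function_declarations": [
            {"name": t["name"], "description": t["description"], "parameters": t["parameters"]}
            for t in tools
        ]
    }]


def google_tool_config(tool_choice) -> dict:
    """Map tool_choice={'name': ...} to ANY + allowed names instead of
    silently falling through to AUTO."""
    if isinstance(tool_choice, dict) and "name" in tool_choice:
        return {"function_calling_config": {
            "mode": "ANY",
            "allowed_function_names": [tool_choice["name"]],
        }}
    return {"function_calling_config": {"mode": "AUTO"}}
```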
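The PII-safe hashing in the observability bullet (SHA-256 over args/result, truncated to 16 hex characters) can be sketched like this; the function name and the canonical-JSON serialization are illustrative assumptions, not the PR's exact code.

```python
import hashlib
import json


def privacy_hash(obj) -> str:
    """Hash args/results before attaching them to a span, so raw values
    (possible PII) never land on the trace. Keys are sorted so equal
    payloads hash equally regardless of dict ordering."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]
```

A span would then carry tool.args_hash=privacy_hash(args) rather than the arguments themselves.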

Why

Closes the gap that the engine couldn't surface tool decisions to the LLM — ToolStep was a static DAG node, never a model-driven dispatch. Without this, ReAct loops, deep-research agents, function-calling assistants, and any benchmark that compares tool selection across models were inexpressible. Foundation for #119 (conversation history) and #120 (Agent primitive).

Closes #116

Testing

  • uv run pytest — 1205 passed
  • uv run ruff check src/ tests/ — clean
  • uv run mypy src/ — clean
  • CLI smoke (mock replay): tool loop drives 2 iterations against the committed recording, final answer reaches state
  • Real-provider smoke (Ollama llama3.1:8b): full roundtrip, tool dispatched with correct args (17+25=42), 296 tokens across 2 iterations
  • Docker → Jaeger smoke (Ollama, real OTel collector): 4 spans visible — workflow → step:solve → chat × 2 iterations, all sharing workflow.run_id, canonical OTel gen_ai.* attrs per turn
  • Kubernetes smoke (kind cluster, host Ollama via host.docker.internal): tool dispatched (7+11=18), 346 tokens
  • Regression tests for the 3 critical provider-format passthrough bugs (OpenAI/Ollama/Google iteration 2+)
  • External auditor pass: spec coverage CRITICAL → addressed; observability + streaming events + Google tool_choice dict → addressed
  • Coverage on touched files: observability/{observer,noop,schema}.py 100%, metrics.py 99%, providers 90-95%

Notes

ToolStep (the static DAG node) keeps working unchanged for workflows that want explicit author-driven tool execution. The new tools= field on llm_call is the dynamic, model-driven path; nothing existing breaks.

Streaming tool-call events: the StreamEvent API surface is implemented with a default events() wrapper that emits TextDelta per chunk + StreamDone at the end. Per-provider native streaming of ToolCallDelta / ToolCallComplete deltas is a follow-up — adapters can register a typed event iterator via _set_event_iterator once SSE parsers are extended.
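The default events() wrapper described above might be sketched as follows. This is a minimal illustration: the dataclasses stand in for the PR's typed StreamEvent hierarchy, and default_events is a hypothetical name for the fallback that wraps each text chunk in a TextDelta and closes with StreamDone.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TextDelta:
    """One incremental chunk of model text."""
    text: str


@dataclass
class StreamDone:
    """Terminal event: the stream has finished."""


async def default_events(chunks):
    """Fallback typed-event stream: wrap each plain-text chunk, then signal
    completion. Providers with native SSE tool-call parsing would instead
    register their own iterator that also yields tool-call events."""
    async for chunk in chunks:
        yield TextDelta(text=chunk)
    yield StreamDone()
```

This keeps `async for chunk in sr` (plain strings) and `sr.events()` (typed events) consistent with each other until per-provider SSE parsers land.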

Anthropic rolls thinking tokens into output_tokens (no separate field on the wire), so workflows combining thinking + tools on Claude show reasoning_tokens=0 even when the model used extended thinking — documented limitation, cost is still correct.

@cchinchilla-dev cchinchilla-dev added enhancement New feature or request providers Provider gateway and adapters core Core engine, DAG, state labels May 10, 2026
Copilot AI review requested due to automatic review settings May 10, 2026 12:30
@github-actions github-actions Bot added documentation Documentation improvements observability Tracing, metrics, logging labels May 10, 2026
codecov Bot commented May 10, 2026
Copilot AI left a comment


Pull request overview

Adds first-class, model-driven tool/function calling to AgentLoom’s llm_call step, with a unified tool schema and per-provider translations/parsing (OpenAI, Ollama, Anthropic, Google). This enables ReAct-style loops where the model selects tools at runtime, the engine dispatches via ToolRegistry, and the step re-prompts until completion or an iteration cap.

Changes:

  • Introduces unified tool-calling models (ToolDefinition, ToolCall) and plumbs tools, tool_choice, and max_tool_iterations through LLMCallStep and provider adapters.
  • Adds tool-loop message synthesis + parallel tool dispatch helpers, plus typed streaming event primitives (StreamEvent + subclasses).
  • Extends observability with per-tool-call span/metrics support and adds a mock replay fixture + example workflow + docs/changelog updates.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.

File Description
tests/providers/test_tool_calling.py End-to-end and per-provider regression tests for translation/parsing, tool-loop iteration, passthrough formatting, streaming events wrapper, and dispatch observability hook.
tests/observability/test_observer.py Verifies WorkflowObserver.on_tool_call emits the expected span and records metrics.
tests/observability/test_noop.py Ensures the noop observer implements the new on_tool_call hook.
tests/observability/test_metrics.py Validates MetricsManager.record_tool_call counter/histogram labeling and recording.
src/agentloom/steps/llm_call.py Implements the tool-call loop inside llm_call, accumulating tokens/cost across iterations and surfacing iteration-cap finish reason.
src/agentloom/steps/_tools.py Adds tool translation/parsing helpers, parallel dispatch (anyio task group), and provider-specific message builders for tool loop turns.
src/agentloom/providers/openai.py Adds tool definitions/tool_choice wiring, tool_call parsing, and passthrough formatting for tool-loop messages.
src/agentloom/providers/ollama.py Adds tool definitions wiring, tool_call parsing via OpenAI-shaped parser, and passthrough formatting for tool-loop messages.
src/agentloom/providers/mock.py Supports multi-turn step recordings (for tool loops) and hydrates recorded tool_calls for replay.
src/agentloom/providers/google.py Adds Gemini tool declaration + tool_choice mapping, tool_call parsing, and passthrough formatting for parts-based tool-loop messages.
src/agentloom/providers/base.py Introduces ToolCall, typed streaming event models, and exposes tool_calls on ProviderResponse / StreamResponse.
src/agentloom/providers/anthropic.py Adds tool declaration + tool_choice mapping, tool_use parsing, and passthrough behavior for tool blocks.
src/agentloom/observability/schema.py Adds span-attr keys for tool-call id/duration and metric names for tool-call counters/histograms.
src/agentloom/observability/observer.py Implements on_tool_call to emit execute_tool {name} spans and record per-tool metrics.
src/agentloom/observability/noop.py Adds noop on_tool_call hook.
src/agentloom/observability/metrics.py Adds tool-call counter + histogram creation and record_tool_call() API.
src/agentloom/core/models.py Adds ToolDefinition and new StepDefinition fields for tool calling (tools, tool_choice, max_tool_iterations).
recordings/tool_calling.json Adds a multi-turn mock recording demonstrating a tool-call + follow-up completion.
examples/35_tool_calling.yaml New example workflow showing model-driven tool calling (mock replay by default, optional live provider run).
docs/workflow-yaml.md Documents the new llm_call tool-calling fields and clarifies static tool step vs model-driven tools.
docs/examples.md Adds documentation entry for example 35 (native tool calling).
CHANGELOG.md Changelog entry describing the new tool calling capability and related behavior.

