feat(providers): add native tool/function calling #145

Open · cchinchilla-dev wants to merge 4 commits into main from
Conversation
Pull request overview
Adds first-class, model-driven tool/function calling to AgentLoom’s llm_call step, with a unified tool schema and per-provider translations/parsing (OpenAI, Ollama, Anthropic, Google). This enables ReAct-style loops where the model selects tools at runtime, the engine dispatches via ToolRegistry, and the step re-prompts until completion or an iteration cap.
Changes:
- Introduces unified tool-calling models (
ToolDefinition,ToolCall) and plumbstools,tool_choice, andmax_tool_iterationsthroughLLMCallStepand provider adapters. - Adds tool-loop message synthesis + parallel tool dispatch helpers, plus typed streaming event primitives (
StreamEvent+ subclasses). - Extends observability with per-tool-call span/metrics support and adds a mock replay fixture + example workflow + docs/changelog updates.
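As a rough sketch of what a unified tool schema with per-provider translation might look like (field names and helpers here are illustrative, not the PR's exact definitions):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolDefinition:
    """Provider-agnostic tool schema (illustrative field names)."""
    name: str
    description: str
    parameters: dict[str, Any]  # JSON Schema for the tool's arguments

@dataclass
class ToolCall:
    """A single tool invocation the model requested."""
    call_id: str
    name: str
    arguments: dict[str, Any]

def to_openai(t: ToolDefinition) -> dict[str, Any]:
    # OpenAI wraps the schema in a {"type": "function", "function": ...} envelope.
    return {"type": "function",
            "function": {"name": t.name, "description": t.description,
                         "parameters": t.parameters}}

def to_anthropic(t: ToolDefinition) -> dict[str, Any]:
    # Anthropic uses a flat shape with "input_schema" instead of "parameters".
    return {"name": t.name, "description": t.description,
            "input_schema": t.parameters}

add = ToolDefinition(
    name="add",
    description="Add two integers.",
    parameters={
        "type": "object",
        "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
        "required": ["a", "b"],
    },
)
```

One definition, two wire shapes: `to_openai(add)` nests the schema under `function`, while `to_anthropic(add)` exposes it as `input_schema`.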
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/providers/test_tool_calling.py | End-to-end and per-provider regression tests for translation/parsing, tool-loop iteration, passthrough formatting, streaming events wrapper, and dispatch observability hook. |
| tests/observability/test_observer.py | Verifies WorkflowObserver.on_tool_call emits the expected span and records metrics. |
| tests/observability/test_noop.py | Ensures the noop observer implements the new on_tool_call hook. |
| tests/observability/test_metrics.py | Validates MetricsManager.record_tool_call counter/histogram labeling and recording. |
| src/agentloom/steps/llm_call.py | Implements the tool-call loop inside llm_call, accumulating tokens/cost across iterations and surfacing iteration-cap finish reason. |
| src/agentloom/steps/_tools.py | Adds tool translation/parsing helpers, parallel dispatch (anyio task group), and provider-specific message builders for tool loop turns. |
| src/agentloom/providers/openai.py | Adds tool definitions/tool_choice wiring, tool_call parsing, and passthrough formatting for tool-loop messages. |
| src/agentloom/providers/ollama.py | Adds tool definitions wiring, tool_call parsing via OpenAI-shaped parser, and passthrough formatting for tool-loop messages. |
| src/agentloom/providers/mock.py | Supports multi-turn step recordings (for tool loops) and hydrates recorded tool_calls for replay. |
| src/agentloom/providers/google.py | Adds Gemini tool declaration + tool_choice mapping, tool_call parsing, and passthrough formatting for parts-based tool-loop messages. |
| src/agentloom/providers/base.py | Introduces ToolCall, typed streaming event models, and exposes tool_calls on ProviderResponse / StreamResponse. |
| src/agentloom/providers/anthropic.py | Adds tool declaration + tool_choice mapping, tool_use parsing, and passthrough behavior for tool blocks. |
| src/agentloom/observability/schema.py | Adds span-attr keys for tool-call id/duration and metric names for tool-call counters/histograms. |
| src/agentloom/observability/observer.py | Implements on_tool_call to emit execute_tool {name} spans and record per-tool metrics. |
| src/agentloom/observability/noop.py | Adds noop on_tool_call hook. |
| src/agentloom/observability/metrics.py | Adds tool-call counter + histogram creation and record_tool_call() API. |
| src/agentloom/core/models.py | Adds ToolDefinition and new StepDefinition fields for tool calling (tools, tool_choice, max_tool_iterations). |
| recordings/tool_calling.json | Adds a multi-turn mock recording demonstrating a tool-call + follow-up completion. |
| examples/35_tool_calling.yaml | New example workflow showing model-driven tool calling (mock replay by default, optional live provider run). |
| docs/workflow-yaml.md | Documents the new llm_call tool-calling fields and clarifies static tool step vs model-driven tools. |
| docs/examples.md | Adds documentation entry for example 35 (native tool calling). |
| CHANGELOG.md | Changelog entry describing the new tool calling capability and related behavior. |
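The parallel dispatch noted for `src/agentloom/steps/_tools.py` runs all tool calls from one model turn concurrently. A minimal sketch of the same pattern, shown with stdlib `asyncio` so it stays self-contained (the PR itself uses an anyio task group, and all names here are hypothetical):

```python
import asyncio
from typing import Any, Awaitable, Callable

async def dispatch_tool_calls(
    calls: list[dict[str, Any]],
    execute: Callable[[str, dict[str, Any]], Awaitable[Any]],
) -> list[Any]:
    """Run every requested tool call concurrently; results keep call order."""
    async def run_one(call: dict[str, Any]) -> Any:
        return await execute(call["name"], call["arguments"])

    # asyncio.gather preserves input ordering, much like collecting task-group
    # results into a pre-sized list indexed by call position.
    return await asyncio.gather(*(run_one(c) for c in calls))

async def _demo() -> list[Any]:
    async def execute(name: str, args: dict[str, Any]) -> Any:
        await asyncio.sleep(0)  # stand-in for real tool work
        return args["a"] + args["b"]

    calls = [
        {"name": "add", "arguments": {"a": 17, "b": 25}},
        {"name": "add", "arguments": {"a": 7, "b": 11}},
    ]
    return await dispatch_tool_calls(calls, execute)

results = asyncio.run(_demo())  # [42, 18]
```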
What
Adds native tool / function calling across all four LLM providers. The model decides at runtime which tool to invoke; the engine dispatches via the existing `ToolRegistry`, feeds results back, and re-prompts until the model stops asking for tools (capped by `max_tool_iterations`, default 5).

- New `StepDefinition.tools`, `tool_choice`, and `max_tool_iterations` on `llm_call`. New `ToolDefinition` model; new `ProviderResponse.tool_calls: list[ToolCall]`.
- `LLMCallStep._run_tool_loop` accumulates tokens + cost across iterations and surfaces `finish_reason="max_tool_iterations"` when the cap fires.
- Provider translation: OpenAI `tools=[{type:"function", function:...}]`, Anthropic `tools=[{name, input_schema}]`, Google groups under `function_declarations`.
- Tool-loop messages (`{"role":"assistant","tool_calls":[...]}`, `{"role":"tool","tool_call_id":...}`, Gemini `parts` shapes) round-trip through `_format_messages` verbatim; without this passthrough, three of four providers silently dropped iteration 2+ messages.
- `StreamEvent` hierarchy (`TextDelta`, `ToolCallDelta`, `ToolCallComplete`, `StreamDone`) plus `sr.events()` on `StreamResponse`. Backwards-compat: `async for chunk in sr` keeps yielding plain text strings.
- Observability: `execute_tool {name}` child span carrying `tool.{call_id,name,args_hash,result_hash,duration_ms,success}` and `workflow.run_id`; new `agentloom_tool_calls_total{tool_name,status}` counter + `agentloom_tool_call_duration_seconds{tool_name}` histogram. Args / result are SHA-256 hashed (truncated to 16 hex) so PII never lands on the trace.
- Dispatch goes through `tool_registry.get(name).execute(args)`, honoring #105 sandbox enforcement; the loop respects budget (#108) and per-step retry policy (#106).
- Mock replay: recorded `tool_calls` blocks are hydrated as `ToolCall` objects so offline replay drives the loop end-to-end.
- `parse_tool_calls_from_openai` now handles two wire variants: OpenAI canonical (`type:"function"` + `arguments` as JSON string) and Ollama-compat (no `type` field, `arguments` as decoded dict). Without this, Ollama tool calling silently dropped every call.
- `tool_choice={"name": "..."}` on Google translates to `ANY` mode + `allowedFunctionNames` rather than silently falling through to `AUTO`.

Why
Closes the gap that the engine couldn't surface tool decisions to the LLM: `ToolStep` was a static DAG node, never a model-driven dispatch. Without this, ReAct loops, deep-research agents, function-calling assistants, and any benchmark that compares tool selection across models were inexpressible. Foundation for #119 (conversation history) and #120 (Agent primitive).

Closes #116
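The dual wire-format parsing mentioned in the What section amounts to normalizing `arguments` and tolerating a missing `type` field. A hedged sketch (hypothetical function and field names, not the PR's actual code):

```python
import json
from typing import Any

def parse_tool_calls(raw_calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize OpenAI-shaped tool calls from either wire variant.

    OpenAI canonical: {"id": ..., "type": "function",
                       "function": {"name": ..., "arguments": "<JSON string>"}}
    Ollama-compat:    {"function": {"name": ..., "arguments": {...}}}
                      (no "type" field, arguments already decoded)
    """
    parsed = []
    for call in raw_calls:
        # Tolerate a missing "type" (Ollama) but skip unknown call types.
        if call.get("type", "function") != "function":
            continue
        fn = call["function"]
        args = fn["arguments"]
        if isinstance(args, str):  # OpenAI: arguments arrive JSON-encoded
            args = json.loads(args)
        parsed.append({"id": call.get("id", ""),
                       "name": fn["name"],
                       "arguments": args})
    return parsed

openai_style = [{"id": "call_1", "type": "function",
                 "function": {"name": "add",
                              "arguments": '{"a": 17, "b": 25}'}}]
ollama_style = [{"function": {"name": "add",
                              "arguments": {"a": 7, "b": 11}}}]
```

Both inputs normalize to the same shape, which is why a single OpenAI-shaped parser can serve the Ollama adapter too.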
Testing
- `uv run pytest`: 1205 passed
- `uv run ruff check src/ tests/`: clean
- `uv run mypy src/`: clean
- Live Ollama (`llama3.1:8b`): full roundtrip, tool dispatched with correct args (17+25=42), 296 tokens across 2 iterations
- Traces: `workflow → step:solve → chat` × 2 iterations, all sharing `workflow.run_id`, canonical OTel `gen_ai.*` attrs per turn
- `host.docker.internal`: tool dispatched (7+11=18), 346 tokens
- `tool_choice` dict → addressed
- Coverage: `observability/{observer,noop,schema}.py` 100%, `metrics.py` 99%, providers 90–95%

Notes
`ToolStep` (the static DAG node) keeps working unchanged for workflows that want explicit author-driven tool execution. The new `tools=` field on `llm_call` is the dynamic, model-driven path; nothing existing breaks.

Streaming tool-call events: the `StreamEvent` API surface is implemented with a default `events()` wrapper that emits `TextDelta` per chunk + `StreamDone` at the end. Per-provider native streaming of `ToolCallDelta`/`ToolCallComplete` deltas is a follow-up; adapters can register a typed event iterator via `_set_event_iterator` once SSE parsers are extended.

Anthropic rolls thinking tokens into `output_tokens` (no separate field on the wire), so workflows combining `thinking` + `tools` on Claude show `reasoning_tokens=0` even when the model used extended thinking. Documented limitation; cost is still correct.
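The default `events()` wrapper behavior described in the streaming note can be sketched as follows (minimal illustrative types, not the PR's actual classes):

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator

@dataclass
class TextDelta:
    text: str

@dataclass
class StreamDone:
    pass

async def default_events(chunks: AsyncIterator[str]):
    """Lift plain text chunks into typed events, then signal completion.

    This mirrors the described default: one TextDelta per chunk, then a
    terminal StreamDone. Tool-call deltas would come from provider-native
    event iterators in a follow-up.
    """
    async for chunk in chunks:
        yield TextDelta(text=chunk)
    yield StreamDone()

async def _demo() -> list[object]:
    async def fake_stream() -> AsyncIterator[str]:
        for piece in ["Hel", "lo"]:
            yield piece
    return [event async for event in default_events(fake_stream())]

events = asyncio.run(_demo())
# [TextDelta(text='Hel'), TextDelta(text='lo'), StreamDone()]
```

Because the wrapper consumes the same chunk iterator that `async for chunk in sr` exposes, the plain-string path keeps working unchanged alongside the typed one.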