fix(voice-engine): resolve voice tool-call stalls and add per-tool result_mode control #56

Merged
jjleng merged 4 commits into main from fix/tool-result-mode-e2e
Apr 17, 2026

dngpng commented Apr 17, 2026

Summary

This PR fixes voice-mode tool-call stalling and adds explicit per-tool control of post-processing for tool results (result_mode), with end-to-end support in runtime, API, schema, and Studio UI.

Background

During voice tests, tool calls intermittently appeared to hang at ToolCallStarted in UI, while equivalent text tests completed normally.

Observed characteristics:

  • The tool itself completed and produced output.
  • In some sessions, the backend seemed delayed in consuming/completing the tool result event path.
  • Long tool outputs were often summarized, and in multi-tool chains this could remove critical details (e.g. URLs) needed by the next tool.

Investigation timeline and findings

  1. Initial suspicion: tool summarizer/filler path in voice loop
  • We inspected the voice runtime logs and identified cases where tool execution succeeded but the completion signal path could appear stalled from the UI perspective.
  • We focused on the difference between text and voice orchestration.
  2. Stall root cause area: event-loop scheduling + long micro-tasks
  • In the voice reactor loop, under continuous RTP/audio activity, select ordering could repeatedly favor audio paths.
  • That increased the chance that ready LLM/tool events were not drained promptly.
  • We also hardened micro-tasks (tool summarization/filler) with timeout guards so they fail-open instead of extending wait times.
  3. Consecutive tool-call quality issue discovered after stall improvement
  • When the first tool’s output was summarized, critical structured fields could be lost.
  • Second tool calls relying on those exact fields (like canonical URL from search result) could then fail (e.g. 404 on page-read tool).
  4. Design iteration: avoid tool-specific hacks
  • We explicitly avoided hardcoding per-tool logic in runtime.
  • Instead, we introduced declarative per-tool policy: result_mode.
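
The scheduling effect described under the stall root cause can be illustrated with a small synchronous model. This is only a sketch: the engine itself is Rust, and `poll_once`, `run`, the queue names, and the event strings are hypothetical stand-ins for a select that always takes the first ready arm.

```python
from collections import deque


def poll_once(arms):
    """Return (name, item) from the first non-empty arm, like a biased select.

    `arms` is an ordered list of (name, deque) pairs; earlier arms win.
    """
    for name, queue in arms:
        if queue:
            return name, queue.popleft()
    return None


def run(arm_order, ticks):
    """Simulate `ticks` polls while one audio frame arrives on every tick."""
    queues = {
        "audio": deque(),
        "events": deque(["tool_call_completed"]),  # one ready LLM/tool event
    }
    handled = []
    for tick in range(ticks):
        queues["audio"].append(f"rtp_frame_{tick}")  # continuous audio activity
        picked = poll_once([(name, queues[name]) for name in arm_order])
        handled.append(picked[0])
    return handled


# Audio arm first: the ready tool event is never reached in these 5 polls.
audio_first = run(["audio", "events"], ticks=5)
# Event arm first: the tool event is drained on the first poll.
events_first = run(["events", "audio"], ticks=5)
```

In this model the audio-first order never handles the completed tool event while frames keep arriving, which matches the UI symptom of a call that appears stuck at ToolCallStarted.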

Final solution

A) Voice stall mitigation

  • In the reactor's biased select loop, poll the LLM/tool event arm before the audio arm so ready events are drained first.
  • Add timeout fail-open behavior in micro-task helper paths (summarizer/filler), so they cannot block indefinitely.
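
The fail-open timeout can be sketched as follows. This is an illustration in Python rather than the Rust micro-task code; `with_fail_open`, the two summarizer coroutines, and the 50 ms budget are assumptions made for the example. The idea is that a helper which overruns its budget is abandoned and the caller proceeds with the unprocessed tool output.

```python
import asyncio


async def with_fail_open(coro, timeout_s, fallback):
    """Await `coro`; if it exceeds `timeout_s`, return `fallback` instead."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # Fail open: give up on the helper rather than extend the wait.
        return fallback


async def slow_summarizer(text):
    """Hypothetical summarizer that takes far longer than its budget."""
    await asyncio.sleep(1.0)
    return text[:10]


async def fast_summarizer(text):
    """Hypothetical summarizer that returns immediately."""
    return text[:10]


async def main():
    raw = "full tool output with https://example.com/page"
    timed_out = await with_fail_open(slow_summarizer(raw), 0.05, fallback=raw)
    summarized = await with_fail_open(fast_summarizer(raw), 0.05, fallback=raw)
    return raw, timed_out, summarized


raw, timed_out, summarized = asyncio.run(main())
```

Here the slow path yields the raw output unchanged, while the fast path still returns its shortened result.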

B) Per-tool post-processing policy (result_mode)

  • New ToolResultMode enum in proto and ToolDef.result_mode.
  • Runtime tool_executor resolves mode:
    • Unspecified (auto) -> current global behavior (tool_summarizer on/off)
    • Summarize
    • Truncate
    • None (full)
  • This preserves backward compatibility for existing configs while allowing precise tool-level control for chained calls.
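
A minimal sketch of the mode resolution, written in Python for illustration (the runtime is Rust). The numeric enum values and the behavior of auto when the global summarizer is off are assumptions: the PR only states that auto follows the global tool_summarizer setting, so the fallback is left as a parameter.

```python
from enum import IntEnum


class ToolResultMode(IntEnum):
    # Numeric values are assumed; the PR only says explicit modes are 1|2|3.
    UNSPECIFIED = 0  # auto: defer to the global setting
    SUMMARIZE = 1
    TRUNCATE = 2
    NONE = 3  # full, unprocessed result


def resolve_result_mode(tool_mode, global_summarizer_on,
                        summarizer_off_mode=ToolResultMode.TRUNCATE):
    """Pick the effective mode for one tool.

    An unset or UNSPECIFIED per-tool mode falls back to the global behavior.
    `summarizer_off_mode` is a hypothetical knob for what "summarizer off"
    means; the real runtime may behave differently.
    """
    if tool_mode is None or tool_mode == ToolResultMode.UNSPECIFIED:
        if global_summarizer_on:
            return ToolResultMode.SUMMARIZE
        return summarizer_off_mode
    return ToolResultMode(tool_mode)


# A search tool pinned to full output keeps URLs intact for the next tool,
# even when the global summarizer is on.
search_mode = resolve_result_mode(ToolResultMode.NONE, global_summarizer_on=True)
```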

C) Config/API/UI end-to-end support

  • API patch endpoint supports tool_result_modes map (tool_id -> mode|null).
  • null means auto (unset field).
  • Validation supports allowed explicit modes: 1|2|3.
  • Added schema support in agent-config-v1.schema.json.
  • Studio config editor adds Result Mode control per tool with options:
    • Auto
    • Summary
    • Truncate
    • Full
  • UI updates include button placement/consistency improvements requested during iteration.
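
The validation rule for the tool_result_modes map could look roughly like this. It is a sketch under stated assumptions, not the code in agents.py: `validate_tool_result_modes` and its error messages are hypothetical, and only the tool_id -> mode|null shape and the 1|2|3 range come from the description above.

```python
ALLOWED_EXPLICIT_MODES = {1, 2, 3}


def validate_tool_result_modes(patch):
    """Validate a tool_id -> mode|None map and return a normalized copy.

    None means auto (the field is unset); explicit modes must be 1, 2, or 3.
    """
    if not isinstance(patch, dict):
        raise ValueError("tool_result_modes must be an object")
    normalized = {}
    for tool_id, mode in patch.items():
        if not isinstance(tool_id, str) or not tool_id:
            raise ValueError("tool ids must be non-empty strings")
        if mode is None:
            normalized[tool_id] = None  # auto
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(mode, bool) or not isinstance(mode, int):
            raise ValueError(f"mode for {tool_id!r} must be an integer or null")
        if mode not in ALLOWED_EXPLICIT_MODES:
            raise ValueError(f"mode for {tool_id!r} must be one of 1, 2, 3")
        normalized[tool_id] = mode
    return normalized


accepted = validate_tool_result_modes({"search": 3, "read_page": None})
```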

D) Persistence fix

  • Fixed config patch persistence by deep-copying config JSON before nested mutation in API (copy.deepcopy) to ensure SQLAlchemy JSON change detection works for nested tools.*.result_mode updates.
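
The persistence symptom can be reproduced without a database. `TrackedRecord` below is a hypothetical stand-in that only marks itself dirty when an assigned value differs from the stored one; it models the symptom, not SQLAlchemy's internals. The config shape (`tools` keyed by tool id) follows the tools.*.result_mode path mentioned above.

```python
import copy


class TrackedRecord:
    """Stand-in for an ORM row that detects changes on attribute assignment."""

    def __init__(self, config):
        self._config = config
        self.dirty = False

    @property
    def config(self):
        return self._config

    @config.setter
    def config(self, value):
        # Compare the incoming value with what is currently stored.
        if value != self._config:
            self.dirty = True
        self._config = value


def patch_in_place(record, tool_id, mode):
    cfg = record.config  # the same object the record already holds
    cfg["tools"][tool_id]["result_mode"] = mode
    record.config = cfg  # old and new are one object, so no change is seen


def patch_with_deepcopy(record, tool_id, mode):
    cfg = copy.deepcopy(record.config)  # independent nested structure
    cfg["tools"][tool_id]["result_mode"] = mode
    record.config = cfg  # differs from the stored value, so it is flagged


def make_record():
    return TrackedRecord({"tools": {"search": {"result_mode": None}}})


stale = make_record()
patch_in_place(stale, "search", 3)

fresh = make_record()
patch_with_deepcopy(fresh, "search", 3)
```

In this model only the deep-copied patch is flagged as changed, which is the behavior the fix relies on.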

Files touched (high-level)

  • Runtime/proto:
    • proto/agent.proto
    • voice/engine/crates/agent-kit/src/tool_executor.rs
    • voice/engine/crates/agent-kit/src/swarm.rs
    • voice/engine/crates/agent-kit/src/quickjs_engine.rs
    • voice/engine/crates/agent-kit/src/agent_backends/default.rs
    • voice/engine/crates/agent-kit/src/micro_tasks.rs
    • voice/engine/src/reactor/mod.rs
  • API/backend:
    • studio/api/app/api/agents.py
    • studio/api/app/agent_builder/edit_ops.py
    • studio/api/app/agent_builder/service.py
    • studio/api/app/schemas/agent_pb2.py
    • studio/api/app/schemas/agent_pb2.pyi
  • Web/studio:
    • studio/web/public/schemas/agent-config-v1.schema.json
    • studio/web/src/components/agent/agent-config-editor.tsx
    • studio/web/src/lib/api/agent.ts
    • studio/web/src/lib/api/client.ts

Verification

  • Rust:
    • cd voice/engine/crates/agent-kit && cargo test --quiet (pass)
    • cd voice/engine && cargo test --quiet --no-run (pass; existing quickjs cfg warning unchanged)
  • Python:
    • ruff check on modified API files (pass)
  • Web:
    • pnpm -C studio/web run lint (pass)
  • Dev runtime:
    • Rebuilt voice-server via compose and validated logs for tool-start/tool-complete progression.

Backward compatibility

  • result_mode is optional.
  • Existing configs without result_mode continue to follow the system/global default (auto).

Follow-up

  • Consider exposing a system-level default policy in one place (if needed) while preserving per-tool overrides.

jjleng force-pushed the fix/tool-result-mode-e2e branch from 4c473d4 to 9cca0e0 on April 17, 2026 17:32
jjleng force-pushed the fix/tool-result-mode-e2e branch from 9cca0e0 to 98e3b6c on April 17, 2026 18:21

jjleng commented Apr 17, 2026

I do like the option of making the global summarizer a per-tool config. However, I think the design is too complex, especially because it puts an extra burden on the builder agent. Here's what I think is a better direction:

Default: truncate everything. Hard-cap all tool results at 8,000 chars with a [result truncated] notice appended. Zero latency, deterministic, covers the vast majority of cases. For context growth over a session, that's what the context summarizer is for — no need to burn another LLM call per tool result on top of that.
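
The proposed default is simple enough to sketch. This is an illustration only: `truncate_tool_result` is a hypothetical name, and whether the notice counts toward the 8,000-character cap, and how it is separated from the body, are assumptions not settled in the comment above.

```python
MAX_RESULT_CHARS = 8_000
TRUNCATION_NOTICE = "[result truncated]"


def truncate_tool_result(result, limit=MAX_RESULT_CHARS, notice=TRUNCATION_NOTICE):
    """Hard-cap a tool result and append a notice when anything was cut.

    Assumption: the cap applies to the result body, and the notice is
    appended on its own line after the cut.
    """
    if len(result) <= limit:
        return result
    return result[:limit] + "\n" + notice


short = truncate_tool_result("ok")
long = truncate_tool_result("x" * 10_000)
```

Because the cut is a plain slice, the output is deterministic and adds no model latency.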

Per-tool summarizer: UI opt-in only. Users who have a specific tool that genuinely needs AI condensing can toggle it on per tool in the settings panel. Simple toggle, easy to ignore if you don't need it. This is strictly better than a global summarizer flag — finer-grained with no default overhead.

Builder: no awareness. Remove result_mode from the schema example and remove the Tool Result Mode section from the builder prompt entirely. Output handling is an operator concern, not something the builder should reason about per tool.

The 4-value enum and global summarizer flag will be dropped. The proto field becomes a simple bool summarize_result.

jjleng force-pushed the fix/tool-result-mode-e2e branch from 487a02f to f276800 on April 17, 2026 21:56
jjleng merged commit b31a6e3 into main on April 17, 2026
1 check passed
jjleng deleted the fix/tool-result-mode-e2e branch on April 17, 2026 22:11