fix(voice-engine): resolve voice tool-call stalls and add per-tool result_mode control by dngpng · Pull Request #56 · ferosai/feros

dngpng · 2026-04-17T16:12:02Z

Summary

This PR fixes voice-mode tool-call stalling and adds explicit per-tool control of post-processing for tool results (result_mode), with end-to-end support in runtime, API, schema, and Studio UI.

Background

During voice tests, tool calls intermittently appeared to hang at ToolCallStarted in the UI, while equivalent text tests completed normally.

Observed characteristics:

The tool itself completed and produced output.
In some sessions, the backend appeared slow to consume the tool result event and complete the call.
Long tool outputs were often summarized, and in multi-tool chains this could remove critical details (e.g. URLs) needed by the next tool.

Investigation timeline and findings

Initial suspicion: tool summarizer/filler path in voice loop

We inspected the voice runtime logs and identified cases where tool execution succeeded but the completion signal path could appear stalled from the UI perspective.
We focused on the difference between text and voice orchestration.

Stall root cause area: event-loop scheduling + long micro-tasks

In the voice reactor loop, under continuous RTP/audio activity, select ordering could repeatedly favor audio paths.
That increased the chance that ready LLM/tool events were not drained promptly.
We also hardened micro-tasks (tool summarization/filler) with timeout guards so they fail-open instead of extending wait times.
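The starvation effect described above can be reproduced with a toy polling loop. This is a Python analogy, not the Rust reactor code: `poll_once` and `ticks_until_tool_result` are hypothetical names, and the two deques stand in for the audio and LLM/tool event sources. It assumes an ordered ("biased") poll in which the first ready source always wins.

```python
from collections import deque


def poll_once(control_events, audio_frames, control_first):
    """Return the next item to handle, or None when both queues are empty.

    With control_first=True, LLM/tool events are checked before audio, so a
    ready event cannot sit behind a steady stream of audio frames.
    """
    if control_first:
        order = (control_events, audio_frames)
    else:
        order = (audio_frames, control_events)
    for queue in order:
        if queue:
            return queue.popleft()
    return None


def ticks_until_tool_result(control_first, max_ticks=50):
    """Count loop iterations until the tool result is handled (None = starved)."""
    control = deque(["tool_result"])
    audio = deque(["frame"])
    for tick in range(1, max_ticks + 1):
        if poll_once(control, audio, control_first) == "tool_result":
            return tick
        # Continuous RTP activity: one new frame arrives on every tick.
        audio.append("frame")
    return None
```

With audio polled first, the tool result is never reached within `max_ticks` because the audio queue is never empty. With the control queue polled first, it is handled on the first tick.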

Consecutive tool-call quality issue discovered after stall improvement

When the first tool’s output was summarized, critical structured fields could be lost.
Second tool calls relying on those exact fields (like canonical URL from search result) could then fail (e.g. 404 on page-read tool).

Design iteration: avoid tool-specific hacks

We explicitly avoided hardcoding per-tool logic in runtime.
Instead, we introduced declarative per-tool policy: result_mode.

Final solution

A) Voice stall mitigation

In the reactor's biased select loop, poll the LLM/tool event arm before the audio arm so that ready events are drained first.
Add timeout fail-open behavior in micro-task helper paths (summarizer/filler), so they cannot block indefinitely.
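The fail-open timeout guard can be sketched as follows. This is an asyncio illustration of the idea, not the code in micro_tasks.rs: the helper name, the 2-second default, and the choice to fall back to the raw tool output are all assumptions made for the example.

```python
import asyncio


async def summarize_or_passthrough(raw, summarizer, timeout_s=2.0):
    """Run the summarizer, but return the raw text if it is slow or fails."""
    try:
        return await asyncio.wait_for(summarizer(raw), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Fail open: a slow summarizer must not extend the tool-call wait.
        return raw
    except Exception:
        # Fail open on summarizer errors as well (assumed policy).
        return raw


async def slow_summarizer(text):
    # Hypothetical summarizer that takes far longer than the timeout.
    await asyncio.sleep(10)
    return "summary"


async def fast_summarizer(text):
    # Hypothetical summarizer that returns immediately.
    return text[:5]
```

The point of the guard is that the worst-case added latency is bounded by `timeout_s`, regardless of how the micro-task behaves.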

B) Per-tool post-processing policy (`result_mode`)

New ToolResultMode enum in proto and ToolDef.result_mode.
Runtime tool_executor resolves mode:
- Unspecified (auto) -> current global behavior (tool_summarizer on/off)
- Summarize
- Truncate
- None (full)
This preserves backward compatibility for existing configs while allowing precise tool-level control for chained calls.
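The resolution rule above can be illustrated in a few lines of Python. This is a sketch, not the tool_executor.rs implementation. The numeric values 1, 2, 3 are assumed to follow the order in which the modes are listed, and treating "auto with the global summarizer off" as full passthrough is an assumption made only for this example.

```python
from enum import IntEnum


class ToolResultMode(IntEnum):
    # Assumed numbering; the PR only states that explicit modes are 1|2|3.
    UNSPECIFIED = 0  # auto: defer to the global tool_summarizer setting
    SUMMARIZE = 1
    TRUNCATE = 2
    NONE = 3  # full result, no post-processing


def resolve_result_mode(tool_mode, global_summarizer_enabled):
    """Pick the effective mode for one tool call."""
    if tool_mode != ToolResultMode.UNSPECIFIED:
        # An explicit per-tool setting always wins.
        return tool_mode
    # Auto: fall back to the global behavior (assumed mapping).
    if global_summarizer_enabled:
        return ToolResultMode.SUMMARIZE
    return ToolResultMode.NONE
```

For a chained search-then-read flow, setting the search tool to `NONE` or `TRUNCATE` keeps its URLs intact even when the global summarizer is on.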

C) Config/API/UI end-to-end support

API patch endpoint supports tool_result_modes map (tool_id -> mode|null).
null means auto (unset field).
Validation supports allowed explicit modes: 1|2|3.
Added schema support in agent-config-v1.schema.json.
Studio config editor adds Result Mode control per tool with options:
- Auto
- Summary
- Truncate
- Full
UI updates include button placement/consistency improvements requested during iteration.
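The tool_result_modes patch semantics described above (tool_id -> mode|null, with null meaning auto) might look like this in isolation. The function name, the dict shapes, and the error types are hypothetical; the actual endpoint in agents.py may validate and store the values differently.

```python
ALLOWED_MODES = {1, 2, 3}  # explicit modes accepted by validation


def apply_tool_result_modes(tools, patch):
    """Return a new tools dict with the per-tool result_mode patch applied.

    None clears the field (auto); 1, 2 or 3 sets an explicit mode.
    """
    # Copy each tool entry so the caller's dict is left untouched.
    updated = {tool_id: dict(tool) for tool_id, tool in tools.items()}
    for tool_id, mode in patch.items():
        if tool_id not in updated:
            raise KeyError(f"unknown tool: {tool_id}")
        if mode is None:
            updated[tool_id].pop("result_mode", None)
        elif type(mode) is int and mode in ALLOWED_MODES:
            updated[tool_id]["result_mode"] = mode
        else:
            raise ValueError(f"invalid result_mode for {tool_id}: {mode!r}")
    return updated
```

A patch such as `{"search": None, "read_page": 3}` would reset `search` to auto and force `read_page` to return full results.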

D) Persistence fix

Fixed config patch persistence by deep-copying config JSON before nested mutation in API (copy.deepcopy) to ensure SQLAlchemy JSON change detection works for nested tools.*.result_mode updates.
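Why the deep copy matters can be shown with a toy change tracker. `TrackedColumn` below is a stand-in written for this example, not SQLAlchemy: it flags a change only when an assigned value compares unequal to the value it loaded. This roughly mirrors how an in-place mutation of a plain JSON attribute can go unnoticed, though real behavior depends on the column type and mutation-tracking setup.

```python
import copy


class TrackedColumn:
    """Toy JSON attribute that detects changes by comparing old and new values."""

    def __init__(self, value):
        self._loaded = value
        self._current = value
        self.dirty = False

    def get(self):
        return self._current

    def set(self, value):
        # A change is noticed only if the new value differs from the loaded one.
        if value != self._loaded:
            self.dirty = True
        self._current = value


def patch_in_place(column, tool_id, mode):
    config = column.get()  # same object the tracker holds as "loaded"
    config["tools"][tool_id]["result_mode"] = mode
    column.set(config)


def patch_with_deepcopy(column, tool_id, mode):
    config = copy.deepcopy(column.get())  # independent copy, nested dicts included
    config["tools"][tool_id]["result_mode"] = mode
    column.set(config)
```

In the in-place version the "old" and "new" values are the same mutated object, so the comparison sees no difference. A shallow copy would not help here either, because the nested `tools` dicts would still be shared.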

Files touched (high-level)

Runtime/proto:
- proto/agent.proto
- voice/engine/crates/agent-kit/src/tool_executor.rs
- voice/engine/crates/agent-kit/src/swarm.rs
- voice/engine/crates/agent-kit/src/quickjs_engine.rs
- voice/engine/crates/agent-kit/src/agent_backends/default.rs
- voice/engine/crates/agent-kit/src/micro_tasks.rs
- voice/engine/src/reactor/mod.rs
API/backend:
- studio/api/app/api/agents.py
- studio/api/app/agent_builder/edit_ops.py
- studio/api/app/agent_builder/service.py
- studio/api/app/schemas/agent_pb2.py
- studio/api/app/schemas/agent_pb2.pyi
Web/studio:
- studio/web/public/schemas/agent-config-v1.schema.json
- studio/web/src/components/agent/agent-config-editor.tsx
- studio/web/src/lib/api/agent.ts
- studio/web/src/lib/api/client.ts

Verification

Rust:
- cd voice/engine/crates/agent-kit && cargo test --quiet (pass)
- cd voice/engine && cargo test --quiet --no-run (pass; existing quickjs cfg warning unchanged)
Python:
- ruff check on modified API files (pass)
Web:
- pnpm -C studio/web run lint (pass)
Dev runtime:
- Rebuilt voice-server via compose and validated logs for tool-start/tool-complete progression.

Backward compatibility

result_mode is optional.
Existing configs without result_mode continue to use the system/global default (auto).

Follow-up

Consider exposing a system-level default policy in one place (if needed) while preserving per-tool overrides.


jjleng · 2026-04-17T19:19:37Z

I do like the option of making the global summarizer a per-tool config. However, I think the design is too complex, especially since it puts extra burden on the builder agent. Here's what I think is a better direction:

Default: truncate everything. Hard-cap all tool results at 8,000 chars with a [result truncated] notice appended. Zero latency, deterministic, covers the vast majority of cases. For context growth over a session, that's what the context summarizer is for — no need to burn another LLM call per tool result on top of that.

Per-tool summarizer: UI opt-in only. Users who have a specific tool that genuinely needs AI condensing can toggle it on per tool in the settings panel. Simple toggle, easy to ignore if you don't need it. This is strictly better than a global summarizer flag — finer-grained with no default overhead.

Builder: no awareness. Remove result_mode from the schema example and the Tool Result Mode section from the builder prompt entirely. Output handling is an operator concern, not something the builder should reason about per tool.

The 4-value enum and global summarizer flag will be dropped. The proto field becomes a simple bool summarize_result.
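The truncate-by-default proposal above could be sketched like this. The 8,000-character cap and the notice text come from the comment; the function name and the choice to append the notice on a new line after the capped text (rather than counting it toward the cap) are assumptions.

```python
MAX_RESULT_CHARS = 8_000
TRUNCATION_NOTICE = "[result truncated]"


def truncate_result(text, limit=MAX_RESULT_CHARS):
    """Hard-cap a tool result and mark it when anything was cut."""
    if len(text) <= limit:
        return text
    # Deterministic and no extra LLM call: keep the first `limit` characters.
    return text[:limit] + "\n" + TRUNCATION_NOTICE
```

Because the cut keeps the head of the result verbatim, structured fields near the top (such as URLs from a search tool) survive unchanged, which a summarizer does not guarantee.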

fix(agent-config): add end-to-end per-tool result_mode controls

88afa34

jjleng force-pushed the fix/tool-result-mode-e2e branch from 4c473d4 to 9cca0e0 Compare April 17, 2026 17:32

fix(voice-engine): prevent voice tool-call stalls in reactor and micr…

98e3b6c

…o tasks

jjleng force-pushed the fix/tool-result-mode-e2e branch from 9cca0e0 to 98e3b6c Compare April 17, 2026 18:21

chore: pin a protoc version

bc2409f

refactor(agent): simplify tool result summarization configuring

f276800

jjleng force-pushed the fix/tool-result-mode-e2e branch from 487a02f to f276800 Compare April 17, 2026 21:56

jjleng merged commit b31a6e3 into main Apr 17, 2026
1 check passed

jjleng deleted the fix/tool-result-mode-e2e branch April 17, 2026 22:11

jjleng merged 4 commits into main from fix/tool-result-mode-e2e
