Summary
When a recipe step raises any exception during dispatch, _execute_recipe's outer except handler in modules/tool-recipes/amplifier_module_tool_recipes/__init__.py:464 catches it and converts it to:
    return ToolResult(
        success=False,
        error={
            "message": f"Recipe execution failed: {str(e)}",
            "type": type(e).__name__,
        },
    )
The traceback is dropped. Only str(e) and the exception class name are preserved. Callers consuming ToolResult.error get a one-line message with no file, line, or stack information. The CLI prints this message once, at the very end of stdout. In headless subprocess invocations it is buried in agent "Thinking..." progress output, and the captured-buffer tail is truncated before it reaches the message.
Real-world impact
This blindness directly cost ~6 hours of debugging time during Amplifier-Resolve reality-check capability E2E. The recipe consistently failed at step 2 with 'str' object has no attribute 'get'. The actual call site (amplifier-app-cli/lib/merge_utils.py:62 — fixed in microsoft/amplifier-app-cli#169) was 7 stack frames downstream from _execute_recipe. We were unable to find it from ToolResult.error alone, despite five iterations of progressively-deeper instrumentation. The bug was eventually surfaced only by a manual interactive amplifier tool invoke recipes invocation with the outer except patched to call traceback.format_exc().
Proposed fix
Preserve the traceback in the error structure:
    import traceback

    except Exception as e:
        return ToolResult(
            success=False,
            error={
                "message": f"Recipe execution failed: {str(e)}",
                "type": type(e).__name__,
                "traceback": traceback.format_exc(),  # NEW
            },
        )
The same pattern applies to the corresponding handler in _resume_recipe. Both should preserve the traceback so callers can surface it (in logs, in failure envelopes, in UI).
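Since both handlers need the same payload, one way to avoid duplicating it is a small shared helper. The sketch below is illustrative only: `ToolResult` here is a stand-in dataclass rather than the real class, and `failure_result`, `merge_settings`, and `execute_recipe` are hypothetical names invented for the example. It builds the traceback from the exception object itself, so the helper does not have to be called inside the `except` block.

```python
import traceback
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    """Stand-in for the real ToolResult; only the fields used in this issue."""
    success: bool
    error: Optional[dict[str, Any]] = None


def failure_result(exc: Exception) -> ToolResult:
    """Hypothetical helper: build the failure payload, traceback included."""
    # Format from the exception object so this works outside the except block too.
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return ToolResult(
        success=False,
        error={
            "message": f"Recipe execution failed: {exc}",
            "type": type(exc).__name__,
            "traceback": tb,
        },
    )


def merge_settings(value):
    # Simulates the downstream bug: a str arrives where a dict was expected.
    return value.get("key")


def execute_recipe() -> ToolResult:
    # Simplified stand-in for the outer handler in _execute_recipe.
    try:
        merge_settings("not-a-dict")
    except Exception as e:
        return failure_result(e)
    return ToolResult(success=True)


result = execute_recipe()
print(result.error["traceback"])
```

The printed traceback names `merge_settings` as the raising frame, which is exactly the information that `message` and `type` alone do not carry.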
Risk assessment
- Compatibility: existing callers that read ToolResult.error["message"] are unaffected. The new traceback field is additive.
- Size: tracebacks for typical recipe failures are 500-3000 chars. Negligible for ToolResult payloads, which already carry agent prompts and outputs in the kilobyte range.
- Privacy: tracebacks may include local file paths (no different from any Python exception). If callers need to redact, they can do so on the traceback field specifically.
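For callers that do need to redact, a caller-side filter over the traceback field could look roughly like the following. This is a sketch, not part of the proposed change: `redact_traceback` is a hypothetical name, and it assumes the standard CPython frame line format (`File "<path>", line N, in <func>`).

```python
import re

# Matches the quoted path in a standard traceback frame line.
_FRAME_PATH = re.compile(r'File "([^"]+)"')


def redact_traceback(tb: str) -> str:
    """Replace each frame's directory with '...' and keep only the file name."""

    def _strip_dir(match):
        path = match.group(1).replace("\\", "/")
        if "/" not in path:
            # Pseudo-files such as <stdin> carry no directory; leave them alone.
            return match.group(0)
        name = path.rsplit("/", 1)[-1]
        return 'File ".../' + name + '"'

    return _FRAME_PATH.sub(_strip_dir, tb)


sample = (
    "Traceback (most recent call last):\n"
    '  File "/home/alice/proj/lib/merge_utils.py", line 62, in merge\n'
    '    value.get("key")\n'
    "AttributeError: 'str' object has no attribute 'get'\n"
)
redacted = redact_traceback(sample)
print(redacted)
```

File names, line numbers, and function names survive, so the redacted traceback still points at the failing call site.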
Suggested test
    import pytest


    @pytest.mark.asyncio
    async def test_recipe_execution_failure_preserves_traceback():
        """Regression: ToolResult.error must carry a 'traceback' field on failure
        so callers can diagnose without rerunning under a debugger."""
        # Arrange a recipe that will deterministically fail at step dispatch
        # (e.g., reference an agent that doesn't exist)
        ...
        result = await tool.execute(...)
        assert result.success is False
        assert "traceback" in result.error
        assert "File" in result.error["traceback"]
        assert len(result.error["traceback"]) > 50
Discovered while
Running the amplifier-bundle-reality-check recipe inside an Amplifier-Resolve reality-check runner sub-container. The downstream consumer (the runner) had to reverse-engineer what failed inside the recipe by reading events.jsonl post-mortem; preserving the traceback in ToolResult.error would have eliminated that entire debugging path.
🤖 Generated with Amplifier