Observability: _execute_recipe drops tracebacks; callers see only str(e) and exception class #274

@bkrabach

Summary

When a recipe step raises any exception during dispatch, _execute_recipe's outer except handler in modules/tool-recipes/amplifier_module_tool_recipes/__init__.py:464 catches it and converts it to:

return ToolResult(
    success=False,
    error={
        "message": f"Recipe execution failed: {str(e)}",
        "type": type(e).__name__,
    },
)

The traceback is dropped: only str(e) and the exception class name survive. Callers consuming ToolResult.error get a one-line message with no file, line, or stack information. The CLI prints this message once at the very end of stdout, but in headless subprocess invocations it is buried in agent "Thinking..." progress output, and the captured-buffer tail truncates before showing it.
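To make the loss concrete, here is a minimal, self-contained sketch. The functions `merge_configs` and `run_step` and the plain-dict payloads are hypothetical stand-ins, not the module's real code; only the shape of the error dict mirrors the handler quoted above.

```python
import traceback


def merge_configs(base, overlay):
    # Hypothetical deep call site: fails when overlay is a str, not a dict.
    return {**base, "name": overlay.get("name")}


def run_step():
    # Hypothetical intermediate frame between the handler and the bug.
    return merge_configs({}, "not-a-dict")


def execute_without_traceback():
    try:
        return run_step()
    except Exception as e:
        # Mirrors the current handler: message and class name only.
        return {
            "message": f"Recipe execution failed: {e}",
            "type": type(e).__name__,
        }


def execute_with_traceback():
    try:
        return run_step()
    except Exception as e:
        return {
            "message": f"Recipe execution failed: {e}",
            "type": type(e).__name__,
            # format_exc() must be called inside the except block.
            "traceback": traceback.format_exc(),
        }


lossy = execute_without_traceback()
full = execute_with_traceback()
print(lossy)              # no hint of where the failure happened
print(full["traceback"])  # names run_step and merge_configs with line numbers
```

In the lossy payload nothing points at `merge_configs`; in the second payload the failing frame is named explicitly, which is the information that was missing during the investigation described below.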

Real-world impact

This blind spot directly cost ~6 hours of debugging time during an end-to-end (E2E) run of the Amplifier-Resolve reality-check capability. The recipe consistently failed at step 2 with 'str' object has no attribute 'get'. The actual call site (amplifier-app-cli/lib/merge_utils.py:62, fixed in microsoft/amplifier-app-cli#169) was 7 stack frames downstream from _execute_recipe. We could not locate it from ToolResult.error alone, despite five iterations of progressively deeper instrumentation. The bug surfaced only after manually running amplifier tool invoke recipes interactively with the outer except patched to call traceback.format_exc().

Proposed fix

Preserve the traceback in the error structure:

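import traceback  # add at module top

# Outer handler in _execute_recipe (the surrounding try body is unchanged):
except Exception as e:
    return ToolResult(
        success=False,
        error={
            "message": f"Recipe execution failed: {str(e)}",
            "type": type(e).__name__,
            "traceback": traceback.format_exc(),  # NEW: full formatted stack
        },
    )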

The same pattern applies to the corresponding handler in _resume_recipe. Both should preserve the traceback so callers can surface it (in logs, in failure envelopes, in UI).
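Since both handlers would build the same structure, one option is a small shared helper. The sketch below is an assumption, not existing code: `build_error_payload` is a hypothetical name, and the message prefix for the resume handler should be whatever that handler already uses. It relies on `traceback.format_exception`, which formats from the exception object and so does not depend on being called inside the `except` block.

```python
import traceback


def build_error_payload(prefix: str, exc: BaseException) -> dict:
    """Hypothetical helper: build the ToolResult error dict from an exception."""
    return {
        "message": f"{prefix}: {exc}",
        "type": type(exc).__name__,
        # Join the formatted lines into one string, equivalent in content to
        # traceback.format_exc() called while handling this exception.
        "traceback": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


def failing_dispatch():
    # Stand-in for a recipe step that fails during dispatch.
    raise ValueError("unknown agent")


try:
    failing_dispatch()
except Exception as e:
    payload = build_error_payload("Recipe execution failed", e)

print(payload["message"])
print(payload["traceback"])
```

Each handler would then pass the dict to `ToolResult(success=False, error=...)`, keeping the two code paths from drifting apart.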

Risk assessment

  • Compatibility: existing callers that read ToolResult.error.message are unaffected. The new traceback field is additive.
  • Size: tracebacks for typical recipe failures are 500-3000 chars. Negligible for ToolResult payloads which already carry agent prompts and outputs in the kilobyte range.
  • Privacy: tracebacks may include local file paths (no different from any Python exception). If callers need to redact, they can do so on the traceback field specifically.
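For callers that do care about size or path exposure, both concerns can be handled on the traceback field alone. This is a rough sketch under stated assumptions: `redact_traceback` and `cap_traceback` are hypothetical helpers, the 4000-character default is arbitrary, and the sample path is invented for illustration.

```python
def redact_traceback(tb: str, roots: dict) -> str:
    """Replace each absolute path prefix in `roots` with its placeholder."""
    # Longest prefixes first, so nested roots are handled before their parents.
    for prefix in sorted(roots, key=len, reverse=True):
        tb = tb.replace(prefix, roots[prefix])
    return tb


def cap_traceback(tb: str, max_chars: int = 4000) -> str:
    """Keep the tail of an oversized traceback; the innermost frames come last."""
    if len(tb) <= max_chars:
        return tb
    return "[... truncated ...]\n" + tb[-max_chars:]


sample = (
    "Traceback (most recent call last):\n"
    '  File "/home/alice/project/steps.py", line 10, in run_step\n'
    "AttributeError: 'str' object has no attribute 'get'\n"
)
redacted = redact_traceback(sample, {"/home/alice": "~"})
print(redacted)
print(cap_traceback("x" * 5000, max_chars=100))
```

Truncating from the front rather than the back is deliberate: Python prints the most recent call last, so the tail holds the frame where the exception was raised.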

Suggested test

import pytest


@pytest.mark.asyncio
async def test_recipe_execution_failure_preserves_traceback():
    """Regression: ToolResult.error must carry a 'traceback' field on failure
    so callers can diagnose without rerunning under a debugger."""
    # Arrange a recipe that will deterministically fail at step dispatch
    # (e.g., reference an agent that doesn't exist)
    ...
    result = await tool.execute(...)
    assert result.success is False
    assert "traceback" in result.error
    assert "File" in result.error["traceback"]
    assert len(result.error["traceback"]) > 50

Discovered while

Running the amplifier-bundle-reality-check recipe inside an Amplifier-Resolve reality-check runner sub-container. The downstream consumer (the runner) had to reverse-engineer what failed inside the recipe by reading events.jsonl post-mortem; preserving the traceback in ToolResult.error would have eliminated that entire debugging path.
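On the consumer side, surfacing the new field could look roughly like the sketch below. It assumes the error arrives as a plain dict with the keys shown in this issue; `describe_failure` and the logger name are hypothetical. Reading the field with `.get` keeps the consumer working against payloads produced before the change.

```python
import logging

logger = logging.getLogger("runner")  # hypothetical logger name


def describe_failure(error: dict) -> str:
    """Render an error dict for logs; works with or without 'traceback'."""
    header = f"{error.get('type', 'Exception')}: {error.get('message', '')}"
    tb = error.get("traceback")
    return f"{header}\n{tb}" if tb else header


old_style = {"message": "Recipe execution failed: boom", "type": "RuntimeError"}
new_style = {**old_style, "traceback": "Traceback (most recent call last):\n  ..."}

logger.error(describe_failure(old_style))
logger.error(describe_failure(new_style))
```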

🤖 Generated with Amplifier

Labels: bug
