Observability: _execute_recipe drops tracebacks; callers see only str(e) and exception class #274

## Summary

When a recipe step raises any exception during dispatch, `_execute_recipe`'s outer except handler in `modules/tool-recipes/amplifier_module_tool_recipes/__init__.py:464` catches it and converts it to:

```python
return ToolResult(
    success=False,
    error={
        "message": f"Recipe execution failed: {str(e)}",
        "type": type(e).__name__,
    },
)
```

**The traceback is dropped.** Only `str(e)` and the exception class name are preserved, so callers consuming `ToolResult.error` get a one-line message with no file, line, or stack information. The CLI prints this message once, at the very end of stdout. In headless subprocess invocations, however, the message is buried in the agent's "Thinking..." progress output, and the tail of the captured output buffer is truncated before the message appears.
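To make the loss concrete, here is a minimal, self-contained sketch (not the module's actual code). The hypothetical `merge_settings`, `load_step`, and `dispatch` functions stand in for code several frames below a handler shaped like the current one. The handler's `message` and `type` say nothing about where the failure happened, while `traceback.format_exc()` called in the same `except` block still names the failing frame.

```python
import traceback


def merge_settings(overrides):
    # Hypothetical helper that assumes every value is a dict;
    # a plain str value raises AttributeError on .get()
    merged = {}
    for key, value in overrides.items():
        merged[key] = value.get("value")
    return merged


def load_step(step):
    return merge_settings(step["settings"])


def dispatch(recipe):
    return [load_step(step) for step in recipe["steps"]]


def run_recipe(recipe):
    # Mirrors the shape of the current outer handler
    try:
        return {"success": True, "output": dispatch(recipe)}
    except Exception as e:
        return {
            "message": f"Recipe execution failed: {str(e)}",
            "type": type(e).__name__,
            # Not kept by the current handler; captured here only for comparison
            "dropped_traceback": traceback.format_exc(),
        }


result = run_recipe({"steps": [{"settings": {"model": "some-model"}}]})
print(result["message"])  # Recipe execution failed: 'str' object has no attribute 'get'
print(result["type"])     # AttributeError
```

The first two fields are all a caller sees today; the frame `in merge_settings` appears only in the discarded traceback.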

## Real-world impact

This blindness cost roughly 6 hours of debugging during end-to-end testing of the Amplifier-Resolve reality-check capability. The recipe consistently failed at step 2 with `'str' object has no attribute 'get'`. The actual call site (`amplifier-app-cli/lib/merge_utils.py:62`, fixed in microsoft/amplifier-app-cli#169) was 7 stack frames downstream from `_execute_recipe`. Despite five iterations of progressively deeper instrumentation, we could not locate it from `ToolResult.error` alone. The bug was surfaced only by a manual, interactive `amplifier tool invoke recipes` run with the outer `except` patched to call `traceback.format_exc()`.

## Proposed fix

Preserve the traceback in the error structure:

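```python
import traceback  # at module top


# Inside _execute_recipe (dispatch body elided):
try:
    ...  # existing recipe dispatch
except Exception as e:
    return ToolResult(
        success=False,
        error={
            "message": f"Recipe execution failed: {str(e)}",
            "type": type(e).__name__,
            # NEW: full stack with file and line for every frame
            "traceback": traceback.format_exc(),
        },
    )
```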

The same pattern applies to the corresponding handler in `_resume_recipe`. Both should preserve the traceback so callers can surface it (in logs, in failure envelopes, in UI).
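Since both handlers need the same structure, one option is a shared helper. The sketch below assumes a stand-in `ToolResult` dataclass and a hypothetical `RecipeTool` skeleton; `failure_result`, `_run_steps`, and the `"Recipe resume failed"` prefix are illustrative names, not the module's API. It uses `traceback.format_exception(type(exc), exc, exc.__traceback__)` rather than `format_exc()` so the helper works from the exception object itself and does not depend on being called while the exception is still being handled.

```python
import asyncio
import traceback
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToolResult:
    # Stand-in for the real ToolResult; only the fields this sketch uses
    success: bool
    output: Any = None
    error: Optional[Dict[str, Any]] = None


def failure_result(prefix: str, exc: BaseException) -> ToolResult:
    # Hypothetical shared helper. The exception is passed explicitly to
    # format_exception(), so this works even outside the except block.
    return ToolResult(
        success=False,
        error={
            "message": f"{prefix}: {exc}",
            "type": type(exc).__name__,
            "traceback": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        },
    )


class RecipeTool:
    # Hypothetical skeleton; the real methods take arguments omitted here
    async def _run_steps(self) -> Any:
        raise KeyError("missing-agent")  # simulated step failure

    async def _execute_recipe(self) -> ToolResult:
        try:
            return ToolResult(success=True, output=await self._run_steps())
        except Exception as e:
            return failure_result("Recipe execution failed", e)

    async def _resume_recipe(self) -> ToolResult:
        try:
            return ToolResult(success=True, output=await self._run_steps())
        except Exception as e:
            return failure_result("Recipe resume failed", e)


result = asyncio.run(RecipeTool()._execute_recipe())
print(result.error["type"])  # KeyError
print(result.error["traceback"].splitlines()[0])  # Traceback (most recent call last):
```

Keeping the error-dict construction in one place means the two handlers cannot drift apart if more fields are added later.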

## Risk assessment

- **Compatibility**: existing callers that read `ToolResult.error["message"]` or `ToolResult.error["type"]` are unaffected. The new `traceback` key is additive.
- **Size**: tracebacks for typical recipe failures are 500-3000 characters, which is negligible next to `ToolResult` payloads that already carry agent prompts and outputs in the kilobyte range.
- **Privacy**: tracebacks may include local file paths (no different from any Python exception). If callers need to redact, they can do so on the `traceback` field specifically.
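As an illustration of caller-side redaction, here is a minimal sketch of a hypothetical `redact_traceback` helper (not part of any existing API). It assumes the error is the plain dict described above, replaces a given home-directory prefix with `~`, and optionally keeps only the last `max_chars` characters, since a Python traceback ends with the innermost frames and the exception line. `message` and `type` are left untouched.

```python
from typing import Any, Dict


def redact_traceback(error: Dict[str, Any], home: str, max_chars: int = 4000) -> Dict[str, Any]:
    # Hypothetical caller-side helper: returns a redacted copy, input is not mutated
    redacted = dict(error)
    tb = redacted.get("traceback")
    if tb:
        tb = tb.replace(home, "~")  # hide the local user's home path
        if len(tb) > max_chars:
            # Keep the tail: innermost frames and the exception line live there
            tb = "...[truncated]\n" + tb[-max_chars:]
        redacted["traceback"] = tb
    return redacted


error = {
    "message": "Recipe execution failed: boom",
    "type": "RuntimeError",
    "traceback": (
        "Traceback (most recent call last):\n"
        '  File "/home/alice/proj/x.py", line 3, in f\n'
        "RuntimeError: boom\n"
    ),
}
clean = redact_traceback(error, home="/home/alice")
print(clean["traceback"].splitlines()[1])  #   File "~/proj/x.py", line 3, in f
```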

## Suggested test

```python
import pytest


@pytest.mark.asyncio
async def test_recipe_execution_failure_preserves_traceback():
    """Regression: ToolResult.error must carry a 'traceback' field on failure
    so callers can diagnose without rerunning under a debugger."""
    # Arrange a recipe that will deterministically fail at step dispatch
    # (e.g., reference an agent that doesn't exist)
    ...
    result = await tool.execute(...)
    assert result.success is False
    assert "traceback" in result.error
    assert "File" in result.error["traceback"]
    assert len(result.error["traceback"]) > 50
```
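For reference, here is a self-contained variant that can run without a real recipe or the `pytest-asyncio` plugin. `FakeRecipeTool` is hypothetical: its `execute` returns a plain dict instead of `ToolResult`, and its handler simply follows the proposed fix. It also uses slightly stronger assertions than checking for `"File"`: the traceback header, the name of the failing frame, and the final exception line.

```python
import asyncio
import traceback


class FakeRecipeTool:
    # Hypothetical stand-in whose handler follows the proposed fix
    async def _dispatch_step(self):
        # Simulates a deterministic step failure (e.g. an unknown agent)
        raise LookupError("agent 'does-not-exist' not found")

    async def execute(self):
        try:
            await self._dispatch_step()
            return {"success": True}
        except Exception as e:
            return {
                "success": False,
                "error": {
                    "message": f"Recipe execution failed: {str(e)}",
                    "type": type(e).__name__,
                    "traceback": traceback.format_exc(),
                },
            }


def test_failure_preserves_traceback():
    # asyncio.run keeps the test runnable without the pytest-asyncio plugin
    result = asyncio.run(FakeRecipeTool().execute())
    assert result["success"] is False
    tb = result["error"]["traceback"]
    assert tb.startswith("Traceback (most recent call last):")
    assert "in _dispatch_step" in tb  # the failing frame is named
    assert tb.rstrip().endswith("LookupError: agent 'does-not-exist' not found")


test_failure_preserves_traceback()
```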

## Discovered while

Running the `amplifier-bundle-reality-check` recipe inside an Amplifier-Resolve reality-check runner sub-container. The downstream consumer (the runner) had to reverse-engineer what failed inside the recipe by reading `events.jsonl` post-mortem; preserving the traceback in `ToolResult.error` would have eliminated that entire debugging path.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Observability: _execute_recipe drops tracebacks; callers see only str(e) and exception class #274

Summary

Real-world impact

Proposed fix

Risk assessment

Suggested test

Discovered while

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Observability: _execute_recipe drops tracebacks; callers see only str(e) and exception class #274

Description

Summary

Real-world impact

Proposed fix

Risk assessment

Suggested test

Discovered while

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions