
Optional LangFuse callback for LLM call observability (opt-in) #196

@taxfree-python

Description


Context

ml-intern's observability pipeline today writes Events (llm_call, hf_job_*, sandbox_*, …) to a Hugging Face Dataset, with cost/usage extracted from litellm responses (agent/core/telemetry.py). PR #105 set this up; PR #179 closed the cost-attribution gap by tagging the five LLM call sites (main / research / compaction / effort_probe / restore).

This is great for the hosted analytics + SFT use case. It's less convenient when:

  • Self-hosting — operators running ml-intern on their own infra often already have a self-hosted LangFuse / Phoenix / Langsmith and would like LLM traces to land there with no extra plumbing.
  • SaaS LangFuse users — teams already on cloud.langfuse.com for their other LLM apps want the same single pane of glass for ml-intern.
  • Local development — seeing one LLM turn's prompts / tool calls / response in a UI immediately, without round-tripping through the HF Dataset → DuckDB pipeline.

Proposal

Expose litellm's native LangFuse callback as an opt-in integration, gated on all three environment variables being set — including LANGFUSE_HOST, so the destination is always an explicit choice (no silent fallback to litellm's SaaS default).

Concretely, somewhere during agent startup (e.g. agent/config.py or agent/core/agent_loop.py):

import os, litellm

if (
    os.getenv("LANGFUSE_HOST")           # destination must be explicit — no silent fallback
    and os.getenv("LANGFUSE_PUBLIC_KEY")
    and os.getenv("LANGFUSE_SECRET_KEY")
):
    if "langfuse" not in litellm.success_callback:
        litellm.success_callback.append("langfuse")
    if "langfuse" not in litellm.failure_callback:
        litellm.failure_callback.append("langfuse")

Both deployment shapes are supported:

  • Self-host: LANGFUSE_HOST=https://langfuse.internal.example.com
  • SaaS: LANGFUSE_HOST=https://cloud.langfuse.com (or https://us.cloud.langfuse.com)

The existing kind tag from PR #179 can be forwarded as metadata={"tags": [kind], "trace_user_id": session.user_id} at each call site, giving filtering by call kind in the LangFuse UI for free.
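For concreteness, a minimal sketch of what that forwarding could look like at a call site. The `langfuse_metadata` helper name is illustrative, not part of the codebase; `kind` and `session.user_id` are the existing values from PR #179:

```python
# Hypothetical helper for building the per-call LangFuse metadata.
# litellm forwards the `metadata` kwarg to the LangFuse callback, which
# maps "tags" and "trace_user_id" onto the trace.

def langfuse_metadata(kind: str, user_id: str) -> dict:
    """Metadata dict attached to a single LLM call."""
    return {"tags": [kind], "trace_user_id": user_id}


# At one of the tagged call sites (sketch, not the actual signature used
# in agent/core):
#
# response = litellm.completion(
#     model=model,
#     messages=messages,
#     metadata=langfuse_metadata("research", session.user_id),
# )
```

Since the metadata kwarg is ignored when no callback is registered, the call sites stay unchanged for operators who never enable LangFuse.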

Privacy

The litellm → LangFuse callback ships the full payload of every LLM call to whatever LANGFUSE_HOST points at:

  • system prompts (including ml-intern's own)
  • user messages and conversation history
  • tool definitions, tool call arguments, tool results (HF Jobs scripts, sandbox stdout, …)
  • model completions
  • token usage and per-call cost in USD

The destination is the operator's responsibility. The integration intentionally requires LANGFUSE_HOST to be set explicitly so this is a conscious decision: pointing it at a self-hosted instance keeps data in-house; pointing it at cloud.langfuse.com means accepting third-party data residency. The gate prevents the failure mode where someone sets just the keys and silently exfiltrates prompts to litellm's default SaaS endpoint.

The README note should spell this out so operators don't have to reverse-engineer where their prompts go.

Scope

In scope

  • Env-gated wiring of litellm → LangFuse callback (with LANGFUSE_HOST mandatory).
  • Forwarding the existing kind tag + session.user_id as LangFuse metadata.
  • Doc note in agent/README.md covering both self-host and SaaS deployment + privacy implications.

Out of scope

  • Replacing or weakening the HF Dataset pipeline — both run side by side.
  • Making langfuse a hard dependency. litellm pulls it lazily when the callback is enabled.
  • Other backends (Phoenix / Langsmith). Same shape, separate request if this lands.

Alternative

If env-gated callbacks feel too magical, a --observability-backend langfuse CLI flag (or agent.config.observability_backend) is fine — same wiring, more explicit. The LANGFUSE_HOST requirement still applies.
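A rough sketch of the flag variant, assuming an argparse-based entry point; the flag name and helper are illustrative only:

```python
# Hypothetical CLI wiring for the --observability-backend alternative.
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--observability-backend",
        choices=["langfuse"],
        default=None,
        help="Opt-in LLM trace backend (still requires LANGFUSE_HOST "
             "and both LANGFUSE_* keys to be set).",
    )
    return parser


def langfuse_enabled(args: argparse.Namespace) -> bool:
    # Same explicit-destination rule as the env-gated variant: the flag
    # alone is not enough, LANGFUSE_HOST must also be set.
    return (
        args.observability_backend == "langfuse"
        and bool(os.getenv("LANGFUSE_HOST"))
        and bool(os.getenv("LANGFUSE_PUBLIC_KEY"))
        and bool(os.getenv("LANGFUSE_SECRET_KEY"))
    )
```

With this shape, the callback registration from the Proposal section runs only when `langfuse_enabled(args)` is true, so nothing changes for existing deployments that never pass the flag.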

Happy to send a PR if there's interest.
