## Context
ml-intern's observability pipeline today writes `Event`s (`llm_call`, `hf_job_*`, `sandbox_*`, …) to a Hugging Face Dataset, with cost/usage extracted from litellm responses (`agent/core/telemetry.py`). PR #105 set this up; PR #179 closed the cost-attribution gap by tagging the five LLM call sites (main / research / compaction / effort_probe / restore).
This is great for the hosted analytics + SFT use case. It's less convenient when:
- Self-hosting — operators running ml-intern on their own infra often already have a self-hosted LangFuse / Phoenix / Langsmith and would like LLM traces to land there with no extra plumbing.
- SaaS LangFuse users — teams already on cloud.langfuse.com for their other LLM apps want the same single pane of glass for ml-intern.
- Local development — seeing one LLM turn's prompts / tool calls / response in a UI immediately, without round-tripping through the HF Dataset → DuckDB pipeline.
## Proposal
Expose litellm's native LangFuse callback as an opt-in integration, gated on all three environment variables being set — including `LANGFUSE_HOST`, so the destination is always an explicit choice (no silent fallback to litellm's SaaS default).
Concretely, somewhere during agent startup (e.g. `agent/config.py` or `agent/core/agent_loop.py`):
```python
import os

import litellm

if (
    os.getenv("LANGFUSE_HOST")  # destination must be explicit — no silent fallback
    and os.getenv("LANGFUSE_PUBLIC_KEY")
    and os.getenv("LANGFUSE_SECRET_KEY")
):
    if "langfuse" not in litellm.success_callback:
        litellm.success_callback.append("langfuse")
    if "langfuse" not in litellm.failure_callback:
        litellm.failure_callback.append("langfuse")
```
Both deployment shapes are supported:
- Self-host: `LANGFUSE_HOST=https://langfuse.internal.example.com`
- SaaS: `LANGFUSE_HOST=https://cloud.langfuse.com` (or `https://us.cloud.langfuse.com`)
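For reference, a complete environment for each shape might look like this (hostnames and key values are placeholders, not real credentials):

```shell
# Self-host: traces stay on internal infra
export LANGFUSE_HOST=https://langfuse.internal.example.com
export LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxx
export LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxx

# SaaS alternative (EU region shown; US region is https://us.cloud.langfuse.com)
# export LANGFUSE_HOST=https://cloud.langfuse.com
```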
The existing `kind` tag from PR #179 can be forwarded as `metadata={"tags": [kind], "trace_user_id": session.user_id}` at each call site for free filtering in the LangFuse UI.
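A call-site sketch of that forwarding, assuming the metadata keys litellm's LangFuse callback consumes (`tags`, `trace_user_id`); the `session` object and helper name are illustrative, not existing ml-intern code:

```python
def langfuse_metadata(kind: str, user_id: str) -> dict:
    """Build the metadata dict for litellm's LangFuse callback:
    `tags` become trace tags in the LangFuse UI, and `trace_user_id`
    attributes the trace to a user for per-user filtering."""
    return {"tags": [kind], "trace_user_id": user_id}

# At one of the five call sites (names from this proposal, illustrative):
# litellm.completion(model=model, messages=messages,
#                    metadata=langfuse_metadata("compaction", session.user_id))
```

Because litellm ignores `metadata` unless a callback consumes it, the same call sites keep working when the LangFuse gate is off.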
## Privacy
The litellm → LangFuse callback ships the full payload of every LLM call to whatever `LANGFUSE_HOST` points at:
- system prompts (including ml-intern's own)
- user messages and conversation history
- tool definitions, tool call arguments, tool results (HF Jobs scripts, sandbox stdout, …)
- model completions
- token usage and per-call cost in USD
The destination is the operator's responsibility. The integration intentionally requires `LANGFUSE_HOST` to be set explicitly so this is a conscious decision: pointing it at a self-hosted instance keeps data in-house; pointing it at `cloud.langfuse.com` means accepting third-party data residency. The gate prevents the failure mode where someone sets just the keys and silently exfiltrates prompts to litellm's default SaaS endpoint.
The README note should spell this out so operators don't have to reverse-engineer where their prompts go.
## Scope
### In scope
- Env-gated wiring of the litellm → LangFuse callback (with `LANGFUSE_HOST` mandatory).
- Forwarding the existing `kind` tag + `session.user_id` as LangFuse metadata.
- Doc note in `agent/README.md` covering both self-host and SaaS deployment + privacy implications.
### Out of scope
- Replacing or weakening the HF Dataset pipeline — both run side by side.
- Making `langfuse` a hard dependency. litellm pulls it in lazily when the callback is enabled.
- Other backends (Phoenix / Langsmith). Same shape, separate request if this lands.
## Alternative
If env-gated callbacks feel too magical, a `--observability-backend langfuse` CLI flag (or `agent.config.observability_backend`) is fine — same wiring, more explicit. The `LANGFUSE_HOST` requirement still applies.
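A minimal sketch of that flag-based variant, assuming argparse and hypothetical helper names (nothing here exists in ml-intern today); the gate still refuses to enable LangFuse unless all three variables, host included, are present:

```python
import argparse

_REQUIRED = ("LANGFUSE_HOST", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Only "langfuse" for now; other backends would be separate requests.
    parser.add_argument("--observability-backend", choices=["langfuse"], default=None)
    return parser

def should_enable_langfuse(backend, env) -> bool:
    """Explicit-flag variant of the gate: the operator must both select the
    backend AND provide all three variables, so the destination is never
    implicit. Missing variables are a hard error, not a silent no-op."""
    if backend != "langfuse":
        return False
    missing = [k for k in _REQUIRED if not env.get(k)]
    if missing:
        raise SystemExit(
            f"--observability-backend langfuse requires: {', '.join(missing)}"
        )
    return True
```

If this returns `True`, startup appends `"langfuse"` to `litellm.success_callback` / `litellm.failure_callback` exactly as in the env-gated version.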
Happy to send a PR if there's interest.