
Optional LangFuse callback for LLM call observability (opt-in) #196

@taxfree-python

Description


Context

ml-intern's observability pipeline today writes Events (llm_call, hf_job_*, sandbox_*, …) to a Hugging Face Dataset, with cost/usage extracted from litellm responses (agent/core/telemetry.py). PR #105 set this up; PR #179 closed the cost-attribution gap by tagging the five LLM call sites (main / research / compaction / effort_probe / restore).

This is great for the hosted analytics + SFT use case. It's less convenient when:

  • Self-hosting — operators running ml-intern on their own infra often already have a self-hosted LangFuse / Phoenix / Langsmith and would like LLM traces to land there with no extra plumbing.
  • SaaS LangFuse users — teams already on cloud.langfuse.com for their other LLM apps want the same single pane of glass for ml-intern.
  • Local development — seeing one LLM turn's prompts / tool calls / response in a UI immediately, without round-tripping through the HF Dataset → DuckDB pipeline.

Proposal

Expose litellm's native LangFuse callback as an opt-in integration, gated on all three environment variables being set — including LANGFUSE_HOST, so the destination is always an explicit choice (no silent fallback to litellm's SaaS default).

Concretely, somewhere during agent startup (e.g. agent/config.py or agent/core/agent_loop.py):

import os, litellm

if (
    os.getenv("LANGFUSE_HOST")           # destination must be explicit — no silent fallback
    and os.getenv("LANGFUSE_PUBLIC_KEY")
    and os.getenv("LANGFUSE_SECRET_KEY")
):
    if "langfuse" not in litellm.success_callback:
        litellm.success_callback.append("langfuse")
    if "langfuse" not in litellm.failure_callback:
        litellm.failure_callback.append("langfuse")

Both deployment shapes are supported:

  • Self-host: LANGFUSE_HOST=https://langfuse.internal.example.com
  • SaaS: LANGFUSE_HOST=https://cloud.langfuse.com (or https://us.cloud.langfuse.com)

The existing kind tag from PR #179 can be forwarded as metadata={"tags": [kind], "trace_user_id": session.user_id} at each call site, giving filtering by call kind in the LangFuse UI for free.
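For concreteness, a minimal sketch of what that forwarding could look like at a call site. The `langfuse_metadata` helper name is illustrative, not part of the codebase; `kind` and `session.user_id` are the existing values from PR #179:

```python
# Hypothetical helper for building the per-call LangFuse metadata.
# litellm forwards the `metadata` kwarg to the LangFuse callback, which
# maps "tags" and "trace_user_id" onto the trace.

def langfuse_metadata(kind: str, user_id: str) -> dict:
    """Metadata dict attached to a single LLM call."""
    return {"tags": [kind], "trace_user_id": user_id}


# At one of the tagged call sites (sketch, not the actual signature used
# in agent/core):
#
# response = litellm.completion(
#     model=model,
#     messages=messages,
#     metadata=langfuse_metadata("research", session.user_id),
# )
```

Since the metadata kwarg is ignored when no callback is registered, the call sites stay unchanged for operators who never enable LangFuse.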

Privacy

The litellm → LangFuse callback ships the full payload of every LLM call to whatever LANGFUSE_HOST points at:

  • system prompts (including ml-intern's own)
  • user messages and conversation history
  • tool definitions, tool call arguments, tool results (HF Jobs scripts, sandbox stdout, …)
  • model completions
  • token usage and per-call cost in USD

The destination is the operator's responsibility. The integration intentionally requires LANGFUSE_HOST to be set explicitly so this is a conscious decision: pointing it at a self-hosted instance keeps data in-house; pointing it at cloud.langfuse.com means accepting third-party data residency. The gate prevents the failure mode where someone sets just the keys and silently exfiltrates prompts to litellm's default SaaS endpoint.

The README note should spell this out so operators don't have to reverse-engineer where their prompts go.

Scope

In scope

  • Env-gated wiring of litellm → LangFuse callback (with LANGFUSE_HOST mandatory).
  • Forwarding the existing kind tag + session.user_id as LangFuse metadata.
  • Doc note in agent/README.md covering both self-host and SaaS deployment + privacy implications.

Out of scope

  • Replacing or weakening the HF Dataset pipeline — both run side by side.
  • Making langfuse a hard dependency. litellm pulls it lazily when the callback is enabled.
  • Other backends (Phoenix / Langsmith). Same shape, separate request if this lands.

Alternative

If env-gated callbacks feel too magical, a --observability-backend langfuse CLI flag (or agent.config.observability_backend) is fine — same wiring, more explicit. The LANGFUSE_HOST requirement still applies.
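A rough sketch of the flag variant, assuming an argparse-based entry point; the flag name and helper are illustrative only:

```python
# Hypothetical CLI wiring for the --observability-backend alternative.
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--observability-backend",
        choices=["langfuse"],
        default=None,
        help="Opt-in LLM trace backend (still requires LANGFUSE_HOST "
             "and both LANGFUSE_* keys to be set).",
    )
    return parser


def langfuse_enabled(args: argparse.Namespace) -> bool:
    # Same explicit-destination rule as the env-gated variant: the flag
    # alone is not enough, LANGFUSE_HOST must also be set.
    return (
        args.observability_backend == "langfuse"
        and bool(os.getenv("LANGFUSE_HOST"))
        and bool(os.getenv("LANGFUSE_PUBLIC_KEY"))
        and bool(os.getenv("LANGFUSE_SECRET_KEY"))
    )
```

With this shape, the callback registration from the Proposal section runs only when `langfuse_enabled(args)` is true, so nothing changes for existing deployments that never pass the flag.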

Happy to send a PR if there's interest.
