Observability

Enreign edited this page Mar 13, 2026 · 2 revisions

Sparks's observability stack has three layers: a real-time event stream of 20 event types emitted over a Unix domain socket, Langfuse distributed tracing for every LLM call and tool execution, and structured KPI snapshots segmented by lane, repo, and risk tier. A doctor command runs diagnostic funnels, and an HTML dashboard visualizes the local SQLite data.


Event Stream

Sparks emits structured events over a Unix domain socket. A CI check enforces that every one of the 20 event types has at least one emit site.

Streaming Events

cargo run --quiet -- observe

This tails the event stream in real time, printing each event as JSON.
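A minimal external consumer of the stream can be sketched in Python, assuming the socket speaks newline-delimited JSON. The socket path below is hypothetical; check your Sparks configuration for the real location.

```python
import json
import socket

# Hypothetical path -- the actual socket location comes from Sparks config.
SPARKS_SOCKET = "/tmp/sparks/observer.sock"

def iter_events(conn):
    """Yield parsed events from a newline-delimited JSON socket stream."""
    buf = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the socket
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)

def tail(path=SPARKS_SOCKET):
    """Connect to the observer socket and print each event as it arrives."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(path)
        for event in iter_events(conn):
            print(event)
```

The buffering in `iter_events` matters: a single `recv` may return a partial line or several lines, so events are only parsed once a full newline-terminated record is available.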

Event Types

| Category   | Event Types                                              |
| ---------- | -------------------------------------------------------- |
| System     | Startup, Heartbeat, SelfMetrics                          |
| Tasks      | AutonomousTask, ToolUsage, PulseEmitted                  |
| State      | MoodChange, EnergyShift, GhostSelected                   |
| Monitoring | CiMonitor, KpiSnapshot, MemoryStored                     |
| Intake     | TicketReceived, TicketDispatched, TicketSynced           |
| Alerts     | PromptFlagged, LoopGuardTripped, RollbackTripped         |
| Misc       | SessionActivity, ObserverConnected, ObserverDisconnected |

All events are broadcast to connected UDS listeners. The HTML dashboard consumes events from the SQLite log.


Langfuse Tracing

Every LLM call, tool execution, and background task pipeline produces traces, spans, and generation metadata in Langfuse.

Setup

export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_BASE_URL=https://cloud.langfuse.com

Or in config.toml:

[langfuse]
enabled = true

No errors are emitted if Langfuse env vars are absent — tracing is simply skipped.
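The skip-if-unconfigured behavior can be sketched as a simple guard, assuming both keys must be present for tracing to be attempted. The function names here are illustrative, not Sparks's API.

```python
import os

def langfuse_configured(env=os.environ):
    """True only when both Langfuse keys are set in the environment."""
    return bool(env.get("LANGFUSE_PUBLIC_KEY")) and bool(env.get("LANGFUSE_SECRET_KEY"))

def maybe_trace(event, env=os.environ):
    """Trace when configured; otherwise skip silently -- no warning, no error."""
    if not langfuse_configured(env):
        return None  # mirrors Sparks's behavior: tracing is optional
    return {"traced": event}  # stand-in for a real Langfuse SDK call
```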

What is Traced

  • Every LLM call (model, prompt, completion, latency, token counts)
  • Tool calls (name, input, output, duration)
  • Task pipeline phases (EXPLORE / EXECUTE / VERIFY / HEAL)
  • Background task dispatches
  • Classification decisions
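A span, as used above, can be sketched as a timing wrapper that records a name, latency, and arbitrary attributes. This helper is a hypothetical illustration of the pattern, not the Langfuse SDK.

```python
import time
from contextlib import contextmanager

@contextmanager
def span(name, sink, **attrs):
    """Record a named span with wall-clock latency into `sink`."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Recorded even when the wrapped block raises, so failed
        # calls still show up in the trace.
        sink.append({
            "name": name,
            "latency_ms": round((time.monotonic() - start) * 1000, 3),
            **attrs,
        })

# Usage: nest spans for a tool call inside a task phase.
trace = []
with span("phase.EXECUTE", trace):
    with span("tool.call", trace, tool="grep"):
        pass
```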

KPI Tracking

Sparks tracks outcome metrics for every task, segmented by:

  • Lane — delivery (external task completion) vs self-improvement (optimizer, eval harness)
  • Repository — per-repo success/failure rates
  • Risk tier — low / medium / high
  • Ghost — per-ghost performance

Metrics

| Metric                 | Description                                             |
| ---------------------- | ------------------------------------------------------- |
| Task success rate      | % of tasks completing without rollback                  |
| Verification pass rate | % of tasks passing the VERIFY phase                     |
| Rollback rate          | % of tasks triggering a git rollback                    |
| Mean time to fix       | Average time from failure detection to successful HEAL  |
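The four metrics can be computed from a list of task records, as sketched below. The field names (`rolled_back`, `verified`, `failed_at`, `healed_at`) are illustrative assumptions, not Sparks's actual schema.

```python
def kpi_metrics(tasks):
    """Compute the four headline KPIs from a list of task records.

    Each record is assumed to carry boolean `rolled_back` and `verified`
    flags; tasks that failed and were healed additionally carry
    `failed_at`/`healed_at` timestamps in seconds.
    """
    total = len(tasks)
    if total == 0:
        return {}
    rollbacks = sum(1 for t in tasks if t["rolled_back"])
    verified = sum(1 for t in tasks if t["verified"])
    heal_times = [t["healed_at"] - t["failed_at"]
                  for t in tasks if "healed_at" in t]
    return {
        "task_success_rate": (total - rollbacks) / total,
        "verification_pass_rate": verified / total,
        "rollback_rate": rollbacks / total,
        "mean_time_to_fix_s":
            sum(heal_times) / len(heal_times) if heal_times else None,
    }
```

Segmentation by lane, repo, risk tier, or ghost then reduces to grouping the records before calling this function.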

View KPI Snapshots

cargo run --quiet -- kpi snapshot --lane delivery
cargo run --quiet -- kpi snapshot --lane self-improvement
cargo run --quiet -- kpi snapshot --repo emberloom/sparks

doctor Command

The doctor command runs diagnostic funnels and reports health.

# Full check (requires LLM connectivity)
cargo run --quiet -- doctor

# Skip LLM check (fast local check)
cargo run --quiet -- doctor --skip-llm

# Print security attestation
cargo run --quiet -- doctor --security

# CI mode (non-interactive, exit code on failure)
cargo run --quiet -- doctor --ci

Diagnostic Funnels

| Funnel    | Checks                                                             |
| --------- | ------------------------------------------------------------------ |
| LLM       | Provider connectivity, model availability, response latency        |
| Proactive | Feature wiring, cron engine, pulse bus health                      |
| Memory    | ONNX model presence, SQLite DB write, HNSW index init              |
| Execution | Docker daemon reachable, ghost image available, socket accessible  |
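The funnel pattern — run a named group of checks, report per-funnel status, and surface a nonzero exit code for CI on any failure — can be sketched like this. The structure is a hypothetical illustration, not the actual code in src/doctor.rs.

```python
def run_funnels(funnels):
    """Run every check in every funnel and summarize the results.

    `funnels` maps a funnel name to a list of (check_name, callable)
    pairs, where each callable returns True on success. Returns a
    per-funnel report and an exit code (0 = all healthy, 1 = failure),
    suitable for `doctor --ci`-style non-interactive use.
    """
    report = {}
    for funnel, checks in funnels.items():
        failures = [name for name, check in checks if not check()]
        report[funnel] = "ok" if not failures else "FAIL: " + ", ".join(failures)
    exit_code = 0 if all(v == "ok" for v in report.values()) else 1
    return report, exit_code
```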

Self-Metrics Introspection

Sparks collects process-level metrics at runtime:

| Metric           | Collected |
| ---------------- | --------- |
| RSS memory       | Yes       |
| CPU usage        | Yes       |
| LLM call latency | Yes       |
| Error rate       | Yes       |

Anomaly detection runs on these metrics and emits SelfMetrics events when thresholds are exceeded.
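The threshold check can be sketched as a pure function that turns an out-of-range sample into SelfMetrics-style events. The threshold values and field names below are made up for illustration.

```python
def detect_anomalies(sample, thresholds):
    """Return a SelfMetrics-style event for every metric above its threshold.

    `sample` maps metric name -> current value; `thresholds` maps metric
    name -> upper bound. Metrics without a configured threshold are ignored.
    """
    return [
        {
            "type": "SelfMetrics",
            "metric": name,
            "value": value,
            "threshold": thresholds[name],
        }
        for name, value in sample.items()
        if name in thresholds and value > thresholds[name]
    ]
```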


HTML Dashboard

Generate a self-contained dashboard from the local SQLite DB:

cargo run --quiet -- dashboard --output-format html
# or
python3 scripts/eval_dashboard.py

The dashboard shows:

  • Task timeline and outcome history
  • KPI trends by lane and ghost
  • Memory growth
  • Event stream summary
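Aggregating the outcome history from the local SQLite DB might look like the sketch below. The `tasks(finished_at, outcome)` schema here is an assumption made for illustration; scripts/eval_dashboard.py reads the real Sparks schema.

```python
import sqlite3

def outcome_history(db_path):
    """Count task outcomes per day from a hypothetical `tasks` table.

    Returns rows of (day, outcome, count), the shape a timeline chart
    would consume.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT date(finished_at) AS day, outcome, COUNT(*) "
            "FROM tasks GROUP BY day, outcome ORDER BY day"
        ).fetchall()
    finally:
        conn.close()
```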

Session Review (Telegram)

If running with the Telegram frontend, session activity is logged and accessible via:

| Command  | Description                      |
| -------- | -------------------------------- |
| /review  | Replay recent session activity   |
| /explain | Explain a specific task outcome  |
| /watch   | Subscribe to live task progress  |
| /search  | Search activity log by keyword   |
| /alerts  | View pattern-based alerts        |

See docs/session-review-explainability.md for full documentation.


Relevant Source Files

  • src/observer.rs — ObserverHandle, 20-type event enum, UDS broadcast
  • src/langfuse.rs — Langfuse tracer integration
  • src/kpi.rs — KPI store, snapshot generation
  • src/doctor.rs — diagnostic funnel implementation
  • src/introspect.rs — self-metrics collection and anomaly detection
  • src/session_review.rs — activity log persistence
  • scripts/eval_dashboard.py — HTML dashboard generator
