Observability

Enreign edited this page Mar 13, 2026 · 2 revisions

Sparks's observability stack has three layers: a real-time event stream of 20 event types emitted over a Unix domain socket, Langfuse distributed tracing for every LLM call and tool execution, and structured KPI snapshots segmented by lane, repo, and risk tier. A doctor command runs diagnostic funnels, and an HTML dashboard visualizes the local SQLite data.


Event Stream

Sparks emits structured events over a Unix domain socket. A CI check enforces that every one of the 20 event types has at least one emit site.

Streaming Events

cargo run --quiet -- observe

This tails the event stream in real time, printing each event as JSON.
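A minimal external consumer of the stream can be sketched in Python, assuming the socket speaks newline-delimited JSON. The socket path below is hypothetical; check your Sparks configuration for the real location.

```python
import json
import socket

# Hypothetical path -- the actual socket location comes from Sparks config.
SPARKS_SOCKET = "/tmp/sparks/observer.sock"

def iter_events(conn):
    """Yield parsed events from a newline-delimited JSON socket stream."""
    buf = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the socket
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)

def tail(path=SPARKS_SOCKET):
    """Connect to the observer socket and print each event as it arrives."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(path)
        for event in iter_events(conn):
            print(event)
```

The buffering in `iter_events` matters: a single `recv` may return a partial line or several lines, so events are only parsed once a full newline-terminated record is available.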

Event Types

| Category   | Event Types                                              |
| ---------- | -------------------------------------------------------- |
| System     | Startup, Heartbeat, SelfMetrics                          |
| Tasks      | AutonomousTask, ToolUsage, PulseEmitted                  |
| State      | MoodChange, EnergyShift, GhostSelected                   |
| Monitoring | CiMonitor, KpiSnapshot, MemoryStored                     |
| Intake     | TicketReceived, TicketDispatched, TicketSynced           |
| Alerts     | PromptFlagged, LoopGuardTripped, RollbackTripped         |
| Misc       | SessionActivity, ObserverConnected, ObserverDisconnected |

All events are broadcast to connected UDS listeners. The HTML dashboard consumes events from the SQLite log.


Langfuse Tracing

Every LLM call, tool execution, and background task pipeline produces traces, spans, and generation metadata in Langfuse.

Setup

export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_BASE_URL=https://cloud.langfuse.com

Or in config.toml:

[langfuse]
enabled = true

No errors are emitted if Langfuse env vars are absent — tracing is simply skipped.
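The skip-if-unconfigured behavior can be sketched as a simple guard, assuming both keys must be present for tracing to be attempted. The function names here are illustrative, not Sparks's API.

```python
import os

def langfuse_configured(env=os.environ):
    """True only when both Langfuse keys are set in the environment."""
    return bool(env.get("LANGFUSE_PUBLIC_KEY")) and bool(env.get("LANGFUSE_SECRET_KEY"))

def maybe_trace(event, env=os.environ):
    """Trace when configured; otherwise skip silently -- no warning, no error."""
    if not langfuse_configured(env):
        return None  # mirrors Sparks's behavior: tracing is optional
    return {"traced": event}  # stand-in for a real Langfuse SDK call
```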

What is Traced

  • Every LLM call (model, prompt, completion, latency, token counts)
  • Tool calls (name, input, output, duration)
  • Task pipeline phases (EXPLORE / EXECUTE / VERIFY / HEAL)
  • Background task dispatches
  • Classification decisions
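A span, as used above, can be sketched as a timing wrapper that records a name, latency, and arbitrary attributes. This helper is a hypothetical illustration of the pattern, not the Langfuse SDK.

```python
import time
from contextlib import contextmanager

@contextmanager
def span(name, sink, **attrs):
    """Record a named span with wall-clock latency into `sink`."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Recorded even when the wrapped block raises, so failed
        # calls still show up in the trace.
        sink.append({
            "name": name,
            "latency_ms": round((time.monotonic() - start) * 1000, 3),
            **attrs,
        })

# Usage: nest spans for a tool call inside a task phase.
trace = []
with span("phase.EXECUTE", trace):
    with span("tool.call", trace, tool="grep"):
        pass
```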

KPI Tracking

Sparks tracks outcome metrics for every task, segmented by:

  • Lane — delivery (external task completion) vs self-improvement (optimizer, eval harness)
  • Repository — per-repo success/failure rates
  • Risk tier — low / medium / high
  • Ghost — per-ghost performance

Metrics

| Metric                 | Description                                             |
| ---------------------- | ------------------------------------------------------- |
| Task success rate      | % of tasks completing without rollback                  |
| Verification pass rate | % of tasks passing the VERIFY phase                     |
| Rollback rate          | % of tasks triggering a git rollback                    |
| Mean time to fix       | Average time from failure detection to successful HEAL  |
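The four metrics can be computed from a list of task records, as sketched below. The field names (`rolled_back`, `verified`, `failed_at`, `healed_at`) are illustrative assumptions, not Sparks's actual schema.

```python
def kpi_metrics(tasks):
    """Compute the four headline KPIs from a list of task records.

    Each record is assumed to carry boolean `rolled_back` and `verified`
    flags; tasks that failed and were healed additionally carry
    `failed_at`/`healed_at` timestamps in seconds.
    """
    total = len(tasks)
    if total == 0:
        return {}
    rollbacks = sum(1 for t in tasks if t["rolled_back"])
    verified = sum(1 for t in tasks if t["verified"])
    heal_times = [t["healed_at"] - t["failed_at"]
                  for t in tasks if "healed_at" in t]
    return {
        "task_success_rate": (total - rollbacks) / total,
        "verification_pass_rate": verified / total,
        "rollback_rate": rollbacks / total,
        "mean_time_to_fix_s":
            sum(heal_times) / len(heal_times) if heal_times else None,
    }
```

Segmentation by lane, repo, risk tier, or ghost then reduces to grouping the records before calling this function.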

View KPI Snapshots

cargo run --quiet -- kpi snapshot --lane delivery
cargo run --quiet -- kpi snapshot --lane self-improvement
cargo run --quiet -- kpi snapshot --repo emberloom/sparks

doctor Command

The doctor command runs diagnostic funnels and reports health.

# Full check (requires LLM connectivity)
cargo run --quiet -- doctor

# Skip LLM check (fast local check)
cargo run --quiet -- doctor --skip-llm

# Print security attestation
cargo run --quiet -- doctor --security

# CI mode (non-interactive, exit code on failure)
cargo run --quiet -- doctor --ci

Diagnostic Funnels

| Funnel    | Checks                                                             |
| --------- | ------------------------------------------------------------------ |
| LLM       | Provider connectivity, model availability, response latency        |
| Proactive | Feature wiring, cron engine, pulse bus health                      |
| Memory    | ONNX model presence, SQLite DB write, HNSW index init              |
| Execution | Docker daemon reachable, ghost image available, socket accessible  |
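The funnel pattern — run a named group of checks, report per-funnel status, and surface a nonzero exit code for CI on any failure — can be sketched like this. The structure is a hypothetical illustration, not the actual code in src/doctor.rs.

```python
def run_funnels(funnels):
    """Run every check in every funnel and summarize the results.

    `funnels` maps a funnel name to a list of (check_name, callable)
    pairs, where each callable returns True on success. Returns a
    per-funnel report and an exit code (0 = all healthy, 1 = failure),
    suitable for `doctor --ci`-style non-interactive use.
    """
    report = {}
    for funnel, checks in funnels.items():
        failures = [name for name, check in checks if not check()]
        report[funnel] = "ok" if not failures else "FAIL: " + ", ".join(failures)
    exit_code = 0 if all(v == "ok" for v in report.values()) else 1
    return report, exit_code
```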

Self-Metrics Introspection

Sparks collects process-level metrics at runtime:

| Metric           | Collected |
| ---------------- | --------- |
| RSS memory       | Yes       |
| CPU usage        | Yes       |
| LLM call latency | Yes       |
| Error rate       | Yes       |

Anomaly detection runs on these metrics and emits SelfMetrics events when thresholds are exceeded.
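The threshold check can be sketched as a pure function that turns an out-of-range sample into SelfMetrics-style events. The threshold values and field names below are made up for illustration.

```python
def detect_anomalies(sample, thresholds):
    """Return a SelfMetrics-style event for every metric above its threshold.

    `sample` maps metric name -> current value; `thresholds` maps metric
    name -> upper bound. Metrics without a configured threshold are ignored.
    """
    return [
        {
            "type": "SelfMetrics",
            "metric": name,
            "value": value,
            "threshold": thresholds[name],
        }
        for name, value in sample.items()
        if name in thresholds and value > thresholds[name]
    ]
```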


HTML Dashboard

Generate a self-contained dashboard from the local SQLite DB:

cargo run --quiet -- dashboard --output-format html
# or
python3 scripts/eval_dashboard.py

The dashboard shows:

  • Task timeline and outcome history
  • KPI trends by lane and ghost
  • Memory growth
  • Event stream summary
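Aggregating the outcome history from the local SQLite DB might look like the sketch below. The `tasks(finished_at, outcome)` schema here is an assumption made for illustration; scripts/eval_dashboard.py reads the real Sparks schema.

```python
import sqlite3

def outcome_history(db_path):
    """Count task outcomes per day from a hypothetical `tasks` table.

    Returns rows of (day, outcome, count), the shape a timeline chart
    would consume.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT date(finished_at) AS day, outcome, COUNT(*) "
            "FROM tasks GROUP BY day, outcome ORDER BY day"
        ).fetchall()
    finally:
        conn.close()
```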

Session Review (Telegram)

If running with the Telegram frontend, session activity is logged and accessible via:

| Command  | Description                      |
| -------- | -------------------------------- |
| /review  | Replay recent session activity   |
| /explain | Explain a specific task outcome  |
| /watch   | Subscribe to live task progress  |
| /search  | Search activity log by keyword   |
| /alerts  | View pattern-based alerts        |

See docs/session-review-explainability.md for full documentation.


Relevant Source Files

  • src/observer.rs — ObserverHandle, 20-type event enum, UDS broadcast
  • src/langfuse.rs — Langfuse tracer integration
  • src/kpi.rs — KPI store, snapshot generation
  • src/doctor.rs — diagnostic funnel implementation
  • src/introspect.rs — self-metrics collection and anomaly detection
  • src/session_review.rs — activity log persistence
  • scripts/eval_dashboard.py — HTML dashboard generator
