Observability
Sparks's observability stack has three layers: a real-time stream of 20 event types broadcast over a Unix domain socket, Langfuse distributed tracing for every LLM call and tool execution, and structured KPI snapshots segmented by lane, repository, risk tier, and ghost. A `doctor` command runs diagnostic funnels, and an HTML dashboard visualizes the local SQLite data.
Sparks emits structured events via a Unix domain socket. All 20 event types are CI-enforced to have at least one emit site.
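The CI check itself is not shown on this page. To illustrate what "at least one emit site" means, the following minimal sketch scans source text for each variant and reports any variant that is never referenced. The enum name `ObserverEvent` and the `ObserverEvent::<Variant>` emit pattern are assumptions for illustration. The real check in Sparks may work differently, for example by parsing the code rather than matching strings.

```rust
use std::collections::BTreeSet;

/// True if `source` contains `ObserverEvent::<variant>` as a whole identifier.
/// Assumption: emit sites reference the (hypothetical) `ObserverEvent` enum path.
fn has_emit_site(source: &str, variant: &str) -> bool {
    let needle = format!("ObserverEvent::{variant}");
    source.match_indices(needle.as_str()).any(|(start, _)| {
        // Reject prefix hits such as `Ticket` matching inside `TicketReceived`.
        source[start + needle.len()..]
            .chars()
            .next()
            .map_or(true, |c| !(c.is_alphanumeric() || c == '_'))
    })
}

/// Returns every variant that has no emit site in `source`.
fn missing_emit_sites<'a>(variants: &[&'a str], source: &str) -> BTreeSet<&'a str> {
    variants
        .iter()
        .copied()
        .filter(|v| !has_emit_site(source, v))
        .collect()
}

fn main() {
    let variants = ["Startup", "Heartbeat", "SelfMetrics"];
    let source = r#"
        observer.emit(ObserverEvent::Startup { version });
        observer.emit(ObserverEvent::Heartbeat);
    "#;
    let missing = missing_emit_sites(&variants, source);
    // A CI job would fail the build here if `missing` is non-empty.
    println!("event types without an emit site: {missing:?}");
}
```

Matching on whole identifiers matters because several event names share a prefix (`Ticket…`, `Observer…`).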
```bash
cargo run --quiet -- observe
```

This tails the event stream in real time, printing each event as JSON.
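To consume the stream from your own tooling instead of `observe`, a client only needs to connect to the socket and read events. The sketch below assumes that the socket carries newline-delimited JSON and uses a hypothetical default path, `/tmp/sparks-observer.sock`. Neither the framing nor the path is documented here, so check `src/observer.rs` before relying on either.

```rust
use std::io::{self, BufRead, BufReader};
use std::os::unix::net::UnixStream;

/// Reads newline-delimited JSON events and hands each non-empty line to
/// `on_event`. Returns how many events were seen before the stream closed.
fn tail_events<R: BufRead>(reader: R, mut on_event: impl FnMut(&str)) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        let line = line?;
        let event = line.trim();
        if event.is_empty() {
            continue;
        }
        on_event(event);
        count += 1;
    }
    Ok(count)
}

fn main() {
    // Hypothetical default path; pass the real socket path as the first argument.
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/tmp/sparks-observer.sock".to_string());
    match UnixStream::connect(&path) {
        Ok(stream) => match tail_events(BufReader::new(stream), |json| println!("{json}")) {
            Ok(n) => eprintln!("stream closed after {n} events"),
            Err(e) => eprintln!("read error: {e}"),
        },
        Err(e) => eprintln!("cannot connect to {path}: {e}"),
    }
}
```

Keeping `tail_events` generic over `BufRead` lets the parsing be exercised with an in-memory buffer instead of a live socket.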
| Category | Event Types |
|---|---|
| System | `Startup`, `Heartbeat`, `SelfMetrics` |
| Tasks | `AutonomousTask`, `ToolUsage`, `PulseEmitted` |
| State | `MoodChange`, `EnergyShift`, `GhostSelected` |
| Monitoring | `CiMonitor`, `KpiSnapshot`, `MemoryStored` |
| Intake | `TicketReceived`, `TicketDispatched`, `TicketSynced` |
| Alerts | `PromptFlagged`, `LoopGuardTripped`, `RollbackTripped` |
| Misc | `SessionActivity`, `ObserverConnected`, `ObserverDisconnected` |
All events are broadcast to connected UDS listeners. The HTML dashboard consumes events from the SQLite log.
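The fan-out can be pictured as follows. Each event is serialized once and written to every connected listener, and any listener whose write fails (for example, because the peer disconnected) is dropped. This is a minimal sketch, not the `ObserverHandle` implementation. `EventKind` covers only three of the event types from the table, and the JSON shape (`type` and `category` fields) is an assumption.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Three of the event types from the table (illustrative subset).
#[derive(Debug, Clone, Copy)]
enum EventKind {
    Startup,
    Heartbeat,
    RollbackTripped,
}

impl EventKind {
    /// Category as listed in the event table.
    fn category(self) -> &'static str {
        match self {
            EventKind::Startup | EventKind::Heartbeat => "System",
            EventKind::RollbackTripped => "Alerts",
        }
    }
}

/// Hypothetical fan-out hub holding one stream per connected listener.
struct Broadcaster {
    listeners: Vec<UnixStream>,
}

impl Broadcaster {
    fn new() -> Self {
        Self { listeners: Vec::new() }
    }

    fn add_listener(&mut self, stream: UnixStream) {
        self.listeners.push(stream);
    }

    /// Writes the event as one JSON line to every listener, drops listeners
    /// whose write fails, and returns how many are still connected.
    fn broadcast(&mut self, kind: EventKind) -> usize {
        let line = format!(
            "{{\"type\":\"{:?}\",\"category\":\"{}\"}}\n",
            kind,
            kind.category()
        );
        self.listeners
            .retain_mut(|s| s.write_all(line.as_bytes()).is_ok());
        self.listeners.len()
    }
}

fn main() {
    // A socket pair stands in for a real listener connecting over the UDS path.
    let (hub_end, observer_end) = UnixStream::pair().expect("socketpair");
    let mut hub = Broadcaster::new();
    hub.add_listener(hub_end);
    hub.broadcast(EventKind::Startup);
    hub.broadcast(EventKind::Heartbeat);

    let mut reader = BufReader::new(observer_end);
    let mut line = String::new();
    reader.read_line(&mut line).expect("read");
    print!("{line}");
}
```

Pruning on write failure keeps one slow or vanished observer from blocking the others indefinitely. A production version would probably also need non-blocking writes or per-listener queues.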
Every LLM call, tool execution, and background task pipeline produces traces, spans, and generation metadata in Langfuse.
```bash
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_BASE_URL=https://cloud.langfuse.com
```

Or in `config.toml`:

```toml
[langfuse]
enabled = true
```

If the Langfuse environment variables are absent, no errors are emitted; tracing is simply skipped.
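One way to get this skip-silently behavior is to resolve the credentials into an `Option` and construct a tracer only when it is `Some`. In this sketch, the `LangfuseConfig` struct, the `resolve` function, and the fallback to `https://cloud.langfuse.com` when `LANGFUSE_BASE_URL` is unset are all assumptions. This page does not specify how Sparks combines `enabled` in `config.toml` with the environment variables.

```rust
/// Credentials for the Langfuse exporter (field names mirror the env vars).
#[derive(Debug, PartialEq)]
struct LangfuseConfig {
    public_key: String,
    secret_key: String,
    base_url: String,
}

/// Assumed fallback when LANGFUSE_BASE_URL is not set.
const DEFAULT_BASE_URL: &str = "https://cloud.langfuse.com";

/// Returns `None` (tracing skipped, no error) unless tracing is enabled and
/// both keys are present. `lookup` abstracts over the environment for testing.
fn resolve(enabled: bool, lookup: impl Fn(&str) -> Option<String>) -> Option<LangfuseConfig> {
    if !enabled {
        return None;
    }
    let public_key = lookup("LANGFUSE_PUBLIC_KEY")?;
    let secret_key = lookup("LANGFUSE_SECRET_KEY")?;
    let base_url = lookup("LANGFUSE_BASE_URL").unwrap_or_else(|| DEFAULT_BASE_URL.to_string());
    Some(LangfuseConfig { public_key, secret_key, base_url })
}

fn main() {
    // `true` stands in for `[langfuse] enabled` read from config.toml.
    match resolve(true, |key| std::env::var(key).ok()) {
        Some(cfg) => println!("Langfuse tracing enabled, exporting to {}", cfg.base_url),
        None => println!("Langfuse not configured; tracing skipped"),
    }
}
```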
Langfuse captures:

- Every LLM call (model, prompt, completion, latency, token counts)
- Tool calls (name, input, output, duration)
- Task pipeline phases (EXPLORE / EXECUTE / VERIFY / HEAL)
- Background task dispatches
- Classification decisions
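For tool calls, the traced fields (name, input, output, duration) can be captured by wrapping the call and timing it. The `ToolSpan` type and `traced_tool_call` helper below are hypothetical. They stop short of sending anything to Langfuse and only show which data is recorded around a single call.

```rust
use std::time::{Duration, Instant};

/// Hypothetical span record mirroring the fields traced for tool calls.
#[derive(Debug)]
struct ToolSpan {
    name: String,
    input: String,
    output: Result<String, String>,
    duration: Duration,
}

/// Runs `tool` on `input`, timing it and capturing input and output as a span.
fn traced_tool_call<F>(name: &str, input: &str, tool: F) -> ToolSpan
where
    F: FnOnce(&str) -> Result<String, String>,
{
    let start = Instant::now();
    let output = tool(input);
    ToolSpan {
        name: name.to_string(),
        input: input.to_string(),
        output,
        duration: start.elapsed(),
    }
}

fn main() {
    let span = traced_tool_call("shell", "echo hi", |cmd| Ok(format!("ran: {cmd}")));
    // A real integration would hand the span to the tracer instead of printing it.
    println!("{} took {:?}: {:?}", span.name, span.duration, span);
}
```

Recording failures as `Err` in the span, rather than dropping the span, keeps failed tool calls visible in the trace.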
Sparks tracks outcome metrics for every task, segmented by:
- Lane: `delivery` (external task completion) vs. `self-improvement` (optimizer, eval harness)
- Repository: per-repo success/failure rates
- Risk tier: `low` / `medium` / `high`
- Ghost: per-ghost performance
| Metric | Description |
|---|---|
| Task success rate | % of tasks completing without rollback |
| Verification pass rate | % of tasks passing VERIFY phase |
| Rollback rate | % of tasks triggering git rollback |
| Mean time to fix | Average time from failure detection to successful HEAL |
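The four metrics in the table reduce to simple ratios over the tasks in a segment. The sketch below computes them for one lane, and the same filter-then-aggregate pattern applies to repository, risk tier, or ghost. The `TaskOutcome` record and its fields are assumptions about what the KPI store tracks. "Success" is taken literally from the table as completing without a rollback.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lane {
    Delivery,
    SelfImprovement,
}

/// Hypothetical per-task outcome record.
struct TaskOutcome {
    lane: Lane,
    completed: bool,
    verified: bool,
    rolled_back: bool,
    /// Seconds from failure detection to successful HEAL, if healing was needed.
    fix_secs: Option<f64>,
}

#[derive(Debug, PartialEq)]
struct KpiSnapshot {
    tasks: usize,
    success_rate: f64,             // % completed without rollback
    verify_pass_rate: f64,         // % passing VERIFY
    rollback_rate: f64,            // % triggering git rollback
    mean_time_to_fix: Option<f64>, // seconds, over healed tasks only
}

fn pct(count: usize, total: usize) -> f64 {
    count as f64 * 100.0 / total as f64
}

/// Computes the KPIs for one lane; `None` if the segment is empty.
fn snapshot(outcomes: &[TaskOutcome], lane: Lane) -> Option<KpiSnapshot> {
    let seg: Vec<&TaskOutcome> = outcomes.iter().filter(|t| t.lane == lane).collect();
    let n = seg.len();
    if n == 0 {
        return None;
    }
    let success = seg.iter().filter(|t| t.completed && !t.rolled_back).count();
    let verified = seg.iter().filter(|t| t.verified).count();
    let rollbacks = seg.iter().filter(|t| t.rolled_back).count();
    let fixes: Vec<f64> = seg.iter().filter_map(|t| t.fix_secs).collect();
    let mean_time_to_fix = if fixes.is_empty() {
        None
    } else {
        Some(fixes.iter().sum::<f64>() / fixes.len() as f64)
    };
    Some(KpiSnapshot {
        tasks: n,
        success_rate: pct(success, n),
        verify_pass_rate: pct(verified, n),
        rollback_rate: pct(rollbacks, n),
        mean_time_to_fix,
    })
}

fn main() {
    let tasks = vec![
        TaskOutcome { lane: Lane::Delivery, completed: true, verified: true, rolled_back: false, fix_secs: None },
        TaskOutcome { lane: Lane::Delivery, completed: false, verified: false, rolled_back: true, fix_secs: None },
    ];
    if let Some(s) = snapshot(&tasks, Lane::Delivery) {
        println!("{s:?}");
    }
}
```

Averaging the time to fix only over tasks that actually needed a HEAL avoids diluting the metric with tasks that passed on the first attempt.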
```bash
cargo run --quiet -- kpi snapshot --lane delivery
cargo run --quiet -- kpi snapshot --lane self-improvement
cargo run --quiet -- kpi snapshot --repo emberloom/sparks
```

The `doctor` command runs diagnostic funnels and reports health.
```bash
# Full check (requires LLM connectivity)
cargo run --quiet -- doctor

# Skip LLM check (fast local check)
cargo run --quiet -- doctor --skip-llm

# Print security attestation
cargo run --quiet -- doctor --security

# CI mode (non-interactive, non-zero exit code on failure)
cargo run --quiet -- doctor --ci
```

| Funnel | Checks |
|---|---|
| LLM | Provider connectivity, model availability, response latency |
| Proactive | Feature wiring, cron engine, pulse bus health |
| Memory | ONNX model presence, SQLite DB write, HNSW index init |
| Execution | Docker daemon reachable, ghost image available, socket accessible |
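The funnel structure maps naturally onto named groups of checks whose failures roll up into a single exit code, which is what `--ci` mode needs. In this sketch, the `Check` and `Funnel` types, the check functions, and the rule that `--skip-llm` skips the funnel named `LLM` are illustrative assumptions, not the code in `src/doctor.rs`.

```rust
/// One named health check; `run` returns Err(reason) on failure.
struct Check {
    name: &'static str,
    run: fn() -> Result<(), String>,
}

/// A funnel groups related checks, like the LLM/Proactive/Memory/Execution rows.
struct Funnel {
    name: &'static str,
    checks: Vec<Check>,
}

/// Runs every funnel and returns a process exit code (0 = healthy, 1 = any failure).
/// `skip_llm` mirrors `--skip-llm` by skipping the funnel named "LLM".
fn run_doctor(funnels: &[Funnel], skip_llm: bool) -> i32 {
    let mut failures = 0;
    for funnel in funnels {
        if skip_llm && funnel.name == "LLM" {
            println!("[skip] {}", funnel.name);
            continue;
        }
        for check in &funnel.checks {
            match (check.run)() {
                Ok(()) => println!("[ok]   {} / {}", funnel.name, check.name),
                Err(why) => {
                    failures += 1;
                    println!("[fail] {} / {}: {why}", funnel.name, check.name);
                }
            }
        }
    }
    if failures == 0 { 0 } else { 1 }
}

// Placeholder checks; real ones would probe the provider, SQLite, Docker, etc.
fn provider_reachable() -> Result<(), String> {
    Err("no API key configured".to_string())
}

fn sqlite_writable() -> Result<(), String> {
    Ok(())
}

fn main() {
    let funnels = vec![
        Funnel { name: "LLM", checks: vec![Check { name: "provider connectivity", run: provider_reachable }] },
        Funnel { name: "Memory", checks: vec![Check { name: "SQLite DB write", run: sqlite_writable }] },
    ];
    // In CI, the exit code is what the pipeline gates on.
    std::process::exit(run_doctor(&funnels, true));
}
```

Running every check instead of stopping at the first failure gives a complete report in one pass, while the exit code still reflects overall health.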
Sparks collects process-level metrics at runtime:
| Metric | Collected |
|---|---|
| RSS memory | Yes |
| CPU usage | Yes |
| LLM call latency | Yes |
| Error rate | Yes |
Anomaly detection runs on these metrics and emits SelfMetrics events when thresholds are exceeded.
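A threshold rule of this kind can be sketched as a comparison of each sample against fixed limits. The limits below are invented placeholders, and the `Sample` type is hypothetical. The actual detector in `src/introspect.rs` may use different thresholds, or a statistical method rather than fixed cut-offs.

```rust
/// One runtime sample of the collected process metrics.
#[derive(Debug, Clone, Copy)]
struct Sample {
    rss_mb: f64,
    cpu_pct: f64,
    llm_latency_ms: f64,
    error_rate: f64,
}

/// Hypothetical thresholds; not the values Sparks uses.
const LIMITS: Sample = Sample {
    rss_mb: 1024.0,
    cpu_pct: 90.0,
    llm_latency_ms: 30_000.0,
    error_rate: 0.05,
};

/// Returns the names of metrics above their limit. A non-empty result is the
/// point at which a SelfMetrics event would be emitted.
fn exceeded(s: &Sample, limits: &Sample) -> Vec<&'static str> {
    let checks = [
        ("rss_mb", s.rss_mb, limits.rss_mb),
        ("cpu_pct", s.cpu_pct, limits.cpu_pct),
        ("llm_latency_ms", s.llm_latency_ms, limits.llm_latency_ms),
        ("error_rate", s.error_rate, limits.error_rate),
    ];
    checks
        .iter()
        .filter(|&&(_, value, max)| value > max)
        .map(|&(name, _, _)| name)
        .collect()
}

fn main() {
    let now = Sample { rss_mb: 1500.0, cpu_pct: 35.0, llm_latency_ms: 2_000.0, error_rate: 0.01 };
    let over = exceeded(&now, &LIMITS);
    if !over.is_empty() {
        println!("would emit SelfMetrics event for: {over:?}");
    }
}
```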
Generate a self-contained dashboard from the local SQLite DB:
```bash
cargo run --quiet -- dashboard --output-format html
# or
python3 scripts/eval_dashboard.py
```

The dashboard shows:
- Task timeline and outcome history
- KPI trends by lane and ghost
- Memory growth
- Event stream summary
When Sparks runs with the Telegram frontend, session activity is logged and can be queried with these commands:
| Command | Description |
|---|---|
| `/review` | Replay recent session activity |
| `/explain` | Explain a specific task outcome |
| `/watch` | Subscribe to live task progress |
| `/search` | Search activity log by keyword |
| `/alerts` | View pattern-based alerts |
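Dispatching these commands comes down to splitting each message into a command and an optional argument. This sketch assumes that `/explain` takes a task identifier and `/search` takes a keyword, since the table does not give argument syntax. It also strips the `@botname` suffix that Telegram clients may append to commands sent in group chats. The `Command` enum and `parse` function are hypothetical.

```rust
/// The session-review commands, parsed from a Telegram message.
#[derive(Debug, PartialEq)]
enum Command {
    Review,
    Explain(String),
    Watch,
    Search(String),
    Alerts,
}

/// Parses a message like "/search rollback". Returns None for unknown
/// commands or when a required argument is missing.
fn parse(msg: &str) -> Option<Command> {
    let mut parts = msg.trim().splitn(2, char::is_whitespace);
    // Drop a "@botname" suffix, e.g. "/alerts@SomeBot" -> "/alerts".
    let cmd = parts.next()?.split('@').next()?;
    let arg = parts.next().map(str::trim).filter(|a| !a.is_empty());
    match cmd {
        "/review" => Some(Command::Review),
        "/watch" => Some(Command::Watch),
        "/alerts" => Some(Command::Alerts),
        "/explain" => arg.map(|a| Command::Explain(a.to_string())),
        "/search" => arg.map(|a| Command::Search(a.to_string())),
        _ => None,
    }
}

fn main() {
    for msg in ["/review", "/search rollback", "/explain", "/unknown"] {
        println!("{msg:?} -> {:?}", parse(msg));
    }
}
```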
See docs/session-review-explainability.md for full documentation.
Relevant source files:

- `src/observer.rs`: `ObserverHandle`, 20-type event enum, UDS broadcast
- `src/langfuse.rs`: Langfuse tracer integration
- `src/kpi.rs`: KPI store, snapshot generation
- `src/doctor.rs`: diagnostic funnel implementation
- `src/introspect.rs`: self-metrics collection and anomaly detection
- `src/session_review.rs`: activity log persistence
- `scripts/eval_dashboard.py`: HTML dashboard generator