feat: add opt-in OpenTelemetry observability + Grafana/Prometheus/Loki demo stack #382

Open
spacepirate0001 wants to merge 1 commit into sipeed:main from spacepirate0001:feat/opentelemetry-observability-for-picoclaw

spacepirate0001 commented Feb 17, 2026

📝 Description

Adds lightweight, opt-in OpenTelemetry instrumentation for PicoClaw and a local observability demo stack (OTel Collector + Prometheus + Grafana + Loki + Promtail) for development/testing.

[Image: Grafana_PicoClaw_OTEL]

🔮 Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • 🧹 Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👤 Mostly Human-written (Human lead, AI assisted or none)

🔗 Linked Issue

N/A

🧰 Technical Context (Skip for Docs)

Reference: #255
Reasoning: Add runtime visibility with minimal default footprint and opt-in behavior, aligned with PicoClaw’s lightweight philosophy.

What changed

  • Added opt-in observability config:
    • observability.enabled
    • observability.service_name
    • observability.otlp_endpoint
    • observability.insecure
    • observability.sample_ratio
  • Added OTEL bootstrap package:
    • pkg/observability/otel.go
  • Wired OTEL init into runtime startup:
    • agent and gateway command paths
  • Added tracing spans:
    • agent.process_message
    • agent.tool_call
  • Added tool error span status handling:
    • mark span as ERROR when Err != nil or IsError == true
  • Fixed provider env override behavior for shared provider config:
    • explicit env-to-provider mapping in config load path
  • Added/updated docs and env examples for observability
  • Added docker-compose observability demo stack:
    • OpenTelemetry Collector
    • Prometheus
    • Grafana
    • Loki
    • Promtail
  • Updated collector config to:
    • bind OTLP on 0.0.0.0
    • enable spanmetrics connector and Prometheus export
  • Updated dashboard queries to match live metric names:
    • traces_span_metrics_*
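For reviewers, the opt-in config keys listed above could be modeled roughly as follows. This is a minimal sketch, not the code in this PR: the struct name, Go field names, JSON encoding, and the `Validate` rules are assumptions for illustration, and `localhost:4317` is only an example endpoint (the conventional OTLP/gRPC port). Only the five key names come from the list above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ObservabilityConfig mirrors the five observability.* keys from this PR.
// The Go field names, types, and JSON tags are assumptions for illustration.
type ObservabilityConfig struct {
	Enabled      bool    `json:"enabled"`
	ServiceName  string  `json:"service_name"`
	OTLPEndpoint string  `json:"otlp_endpoint"`
	Insecure     bool    `json:"insecure"`
	SampleRatio  float64 `json:"sample_ratio"`
}

// Validate is a hypothetical check. A disabled config is always accepted,
// so an absent "observability" section keeps the feature off (opt-in).
func (c ObservabilityConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if c.OTLPEndpoint == "" {
		return errors.New("observability.otlp_endpoint is required when enabled")
	}
	if c.SampleRatio < 0 || c.SampleRatio > 1 {
		return fmt.Errorf("observability.sample_ratio must be in [0, 1], got %v", c.SampleRatio)
	}
	return nil
}

// ParseObservability decodes the "observability" object from a JSON config
// document and validates it. The JSON layout is assumed, not taken from the PR.
func ParseObservability(raw []byte) (ObservabilityConfig, error) {
	var doc struct {
		Observability ObservabilityConfig `json:"observability"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return ObservabilityConfig{}, err
	}
	return doc.Observability, doc.Observability.Validate()
}

func main() {
	raw := []byte(`{"observability": {"enabled": true, "service_name": "picoclaw",
		"otlp_endpoint": "localhost:4317", "insecure": true, "sample_ratio": 1.0}}`)
	cfg, err := ParseObservability(raw)
	fmt.Println(cfg.ServiceName, cfg.OTLPEndpoint, cfg.SampleRatio, err)
}
```

Because the zero value of `Enabled` is `false`, a config file with no `observability` section parses as disabled, which matches the minimal default footprint described in the reasoning above.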

🧪 Test Environment & Hardware

  • Hardware: PC
  • OS: Ubuntu host + Docker Desktop Linux containers
  • Model/Provider: OpenAI (gpt-4o-mini) for validation
  • Channels: CLI / gateway runtime

📷 Proof of Work (Optional for Docs)

Validation summary
  • Verified OTEL init log on startup.
  • Verified trace ingestion in collector logs (Traces ...).
  • Verified spanmetrics in Prometheus (traces_span_metrics_*).
  • Verified Grafana metrics panels populate with corrected queries.
  • Verified error status path appears after forced tool failure (missing file read).
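The error-status rule exercised by the forced tool failure (mark the span as ERROR when Err != nil or IsError == true) can be sketched without the OTel SDK as a pure function. The `ToolResult` struct, the `SpanStatus` type, the `statusFor` helper, and the sample error text are hypothetical stand-ins; only the `Err` and `IsError` names and the ERROR outcome come from this description. Leaving successful spans as UNSET follows the usual OpenTelemetry convention and is an assumption about this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolResult is a hypothetical stand-in for the tool result type.
// Only the Err and IsError names are taken from the PR description.
type ToolResult struct {
	Content string
	IsError bool
	Err     error
}

// SpanStatus is a simplified stand-in for an OpenTelemetry status code.
type SpanStatus string

const (
	StatusUnset SpanStatus = "UNSET"
	StatusError SpanStatus = "ERROR"
)

// errMissingFile imitates the forced failure used for validation
// (a read of a file that does not exist). The message is made up.
var errMissingFile = errors.New("read file: no such file or directory")

// statusFor applies the rule: ERROR when Err != nil or IsError == true,
// otherwise leave the status unset. It also returns a short description
// that a caller could attach to the span.
func statusFor(r ToolResult) (SpanStatus, string) {
	switch {
	case r.Err != nil:
		return StatusError, r.Err.Error()
	case r.IsError:
		return StatusError, "tool reported an error result"
	default:
		return StatusUnset, ""
	}
}

func main() {
	results := []ToolResult{
		{Content: "ok"},
		{IsError: true, Content: "permission denied"},
		{Err: errMissingFile},
	}
	for _, r := range results {
		status, desc := statusFor(r)
		fmt.Printf("status=%s description=%q\n", status, desc)
	}
}
```

Checking `Err` first means a Go-level error takes precedence in the description when both conditions hold; either condition alone is enough to mark the span as failed.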

✅ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.
