Real-time monitoring, alerting, and observability for AI agents.
Know when your agent breaks — before your users do.
Installation · Dashboard · Monitoring · Alert Rules · Configuration · Roadmap
AI agents fail silently. An LLM provider starts returning errors, a session gets stuck in a loop, your gateway goes down — and nobody knows until a user complains. OpenAlerts is a monitoring plugin for OpenClaw that watches for these failure patterns and alerts you through your existing messaging channels the moment something goes wrong.
Key features:
- 7 built-in alert rules covering LLM failures, stuck sessions, infrastructure errors, and more
- Real-time web dashboard with live event stream, system logs, and health overview
- Automatic event capture — 19 event sources, zero manual instrumentation
- Alerts delivered through OpenClaw's own channels (Telegram, Discord, Slack, etc.) — no extra bot tokens
- Zero-LLM-token chat commands:
/health,/alerts,/dashboard - Framework-agnostic core engine (future plugins for Nanobot, CrewAI, and more)
OpenClaw ships with structured log files and diagnostic events. That's great for debugging after the fact. OpenAlerts adds the layer that's missing:
| Capability | OpenClaw Built-in | OpenAlerts Plugin |
|---|---|---|
| Event capture | Log to file, read later | Real-time event bus with SSE streaming |
| Alerting | None — check logs manually | Automatic alerts through your existing bot |
| Alert rules | None | 7 rules with sliding windows, cooldowns, and thresholds |
| Dashboard | None | Live web UI with Activity, Logs, and Health tabs |
| Session tracking | Per-log-entry | Grouped session flows — every step in one place |
| Failure detection | Manual log review | Detects stuck sessions, LLM errors, gateway down automatically |
OpenAlerts doesn't replace your logs — it watches them (and 18 other event sources) and tells you when something needs attention.
If you don't have OpenClaw yet, install it first:
# Install OpenClaw
npm install -g openclaw@latest
# Run the interactive setup wizard
openclaw onboard --install-daemonThis configures your gateway, workspace, model credentials, and messaging channels. See the OpenClaw getting started guide for details.
Verify your gateway is running:
openclaw gateway status
openclaw doctor # diagnose & auto-fix config issues# Clone the repo
git clone https://github.com/steadwing/openalerts.git
cd openalerts
# Install dependencies and build
npm install
npm run build
# Install into OpenClaw
openclaw plugins install ./packages/alertOr for local development, use link mode (symlinks instead of copying):
openclaw plugins install -l ./packages/alertAlternatively, point to the path directly in openclaw.json:
{
"plugins": {
"load": {
"paths": ["/path/to/openalerts/packages/alert"]
}
}
}Add alert settings under plugins.entries.openalerts.config in your openclaw.json:
Alerts are delivered through OpenClaw's own channel system — the same bot that handles your conversations sends the alerts. No separate bot tokens or webhook URLs to configure.
# Restart the gateway to load the plugin
openclaw gateway stop && openclaw gateway run
# Check it loaded
openclaw plugins list
# Should show: openalerts (OpenAlerts) — enabledSend /health to your bot — you should get a live status summary back (zero LLM tokens consumed).
Open the dashboard at http://127.0.0.1:18789/openalerts
The dashboard is a real-time web interface embedded directly in the OpenClaw gateway. No separate process, no external dependencies.
Access: http://127.0.0.1:18789/openalerts (or send /dashboard to your bot for the link)
The default view. A live event timeline showing everything happening inside your agent:
- Session flows — Events grouped by session ID into collapsible flows. Each flow shows event count, duration, token usage, tool calls, and status (active / completed / error).
- Event detail — Every event shows type, outcome, duration, and relevant metadata (tool name, model, queue depth, token count, error messages).
- Color coding — LLM (blue), Tool (purple), Agent (green), Session (orange), Infrastructure (red).
- Real-time streaming — Events appear instantly via Server-Sent Events. No polling needed.
- Log integration — Internal OpenClaw log entries (session state changes, run registrations, prompt execution, hook activity) stream alongside engine events for full visibility.
A filtered, structured view of OpenClaw's internal log file:
- Filters — By subsystem (agent, session, gateway, etc.), log level (DEBUG, INFO, WARN, ERROR), or free-text search.
- Parsed key-value pairs — Extracts
sessionId,runId,durationMs, queue depth, state transitions from log entries. - Auto-refresh — New log entries appear every 3 seconds.
System-wide health overview:
- Status cards — Uptime, active SSE listeners, total errors, platform status.
- Stats table — Messages processed, LLM errors, webhook errors, tool calls, agent starts, sessions.
- Rule status — All 7 alert rules with live OK/FIRING indicators.
- Recent alerts — Last 20 alerts with severity, title, detail, and timestamp.
The dashboard exposes endpoints you can also query programmatically:
| Endpoint | Method | Description |
|---|---|---|
/openalerts |
GET | Dashboard HTML page |
/openalerts/events |
GET | SSE stream (event: openalerts for engine events, event: oclog for log entries) |
/openalerts/state |
GET | JSON snapshot (stats, uptime, alerts, rule status, cooldowns) |
/openalerts/logs |
GET | Parsed log entries (?limit=200&after=<timestamp>) |
The plugin automatically captures 19 event sources with zero manual instrumentation:
12 Diagnostic Events (via OpenClaw's onDiagnosticEvent):
| OpenClaw Event | Mapped To | What It Captures |
|---|---|---|
message.processed |
llm.call |
LLM call outcome, duration, errors |
message.queued |
infra.queue_depth |
Message entering the queue |
model.usage |
llm.token_usage |
Token counts and cost |
session.stuck |
session.stuck |
Session age exceeding threshold |
session.state |
session.start / session.end |
Session lifecycle transitions |
diagnostic.heartbeat |
infra.heartbeat |
Health check with queue depth |
webhook.error |
infra.error |
Inbound webhook failures |
webhook.received |
custom |
Webhook arrival (informational) |
webhook.processed |
custom |
Webhook processing complete |
queue.lane.enqueue |
infra.queue_depth |
Per-lane queue tracking |
queue.lane.dequeue |
infra.queue_depth |
Dequeue with wait time |
heartbeat |
infra.heartbeat |
Periodic heartbeat |
7 Plugin Hooks (via OpenClaw's api.on()):
| Hook | Mapped To | What It Captures |
|---|---|---|
after_tool_call |
tool.call / tool.error |
Tool name, duration, parameters, errors |
before_agent_start |
agent.start |
Agent initialization |
agent_end |
agent.end / agent.error |
Agent completion, message count, errors |
session_start |
session.start |
New or resumed session |
session_end |
session.end |
Session close with duration and message count |
message_sent |
custom / infra.error |
Message delivery success or failure |
gateway_start / gateway_stop |
infra.heartbeat / infra.error |
Gateway lifecycle |
Seven rules are evaluated in real-time against every incoming event:
| Rule | Condition | Default Threshold | Cooldown | Severity |
|---|---|---|---|---|
| infra-errors | Infrastructure errors in a 5-minute sliding window | 3 errors | 15 min | ERROR |
| llm-errors | LLM call failures in a 5-minute sliding window | 3 errors | 15 min | ERROR |
| session-stuck | Session not progressing for too long | 120 seconds | 30 min | WARN |
| heartbeat-fail | Consecutive heartbeat delivery failures | 3 consecutive | 30 min | ERROR |
| queue-depth | Items queued for processing exceed threshold | 10 items | 15 min | WARN |
| high-error-rate | Percentage of failed messages in the last 20 | 50% | 30 min | ERROR |
| gateway-down | No heartbeat received (watchdog timer, 30s ticks) | 90 seconds | 60 min | CRITICAL |
- An event arrives (via diagnostic event or plugin hook — captured automatically)
- The evaluator runs all 7 rules against the event
- If a rule fires and its cooldown has elapsed, an
AlertEventis created - The alert is sent through OpenClaw's channel system (same bot, same conversation)
- The alert is persisted to the event store for history
- A global cap of 5 alerts per hour prevents alert storms
Every threshold, cooldown, and enabled state is configurable per-rule.
Three commands are available in any connected channel (Telegram, Discord, Slack, etc.):
| Command | Description |
|---|---|
/health |
System health snapshot: uptime, active alerts, channel activity, stats |
/alerts |
Recent alert history (last 100 alerts with severity, title, detail, timestamps) |
/dashboard |
Returns the dashboard URL for quick access |
All commands are handled directly by the plugin — zero LLM tokens consumed.
Full configuration reference under plugins.entries.openalerts.config in openclaw.json:
{
"plugins": {
"entries": {
"openalerts": {
"enabled": true,
"config": {
// Where to send alerts (uses OpenClaw's own channel system)
"alertChannel": "telegram", // telegram | discord | slack | whatsapp | signal
"alertTo": "YOUR_CHAT_ID",
// Global settings
"cooldownMinutes": 15, // Default cooldown between repeated alerts
"quiet": false, // Suppress non-critical alerts
"maxLogSizeKb": 512, // Max event log size before pruning
"maxLogAgeDays": 7, // Max event log age before pruning
// Per-rule overrides
"rules": {
"gateway-down": {
"threshold": 120000, // Increase timeout to 2 minutes
},
"high-error-rate": {
"enabled": false, // Disable this rule entirely
},
"llm-errors": {
"threshold": 5, // Require 5 errors instead of 3
"cooldownMinutes": 30, // Longer cooldown
},
},
},
},
},
},
}OpenAlerts is a monorepo with a shared core engine and framework-specific plugin adapters:
@steadwing/openalerts-core Zero external dependencies
| Rules engine, evaluator, event bus, state store, formatter
|
@steadwing/openalerts OpenClaw plugin
Translates 12 diagnostic events + 7 plugin hooks
Routes alerts through OpenClaw's channel system
Serves the live monitoring dashboard
The core package is framework-agnostic — it contains the rules engine, evaluator, event bus, alert dispatcher, and state persistence. It has zero external dependencies and can run anywhere.
The alert package is the OpenClaw-specific adapter. It subscribes to OpenClaw's diagnostic events and plugin hooks, translates them into universal event objects, and routes alerts through OpenClaw's existing messaging channels. It also serves the embedded web dashboard.
This separation means adding monitoring support for another framework (Nanobot, CrewAI, LangChain, etc.) only requires writing a new adapter package — the core engine stays the same.
OpenAlerts persists all events and alerts to a JSONL file on disk:
- Location:
<stateDir>/events.jsonl(managed by OpenClaw's plugin state directory) - Warm-start: On engine start, the event log is loaded to restore cooldown timers and window states. No alert is re-fired for events that were already processed.
- Auto-pruning: Every 6 hours, entries older than 7 days or exceeding 512KB are removed.
- Crash recovery: Events are appended immediately, so the engine recovers its full state after any restart.
openalerts/
packages/
core/ @steadwing/openalerts-core Shared engine (zero dependencies)
src/
engine.ts OpenAlertsEngine (start, stop, ingest, bus, state)
rules.ts 7 built-in alert rule definitions
evaluator.ts processEvent(), warmFromHistory(), processWatchdogTick()
event-bus.ts Pub/sub event distribution
alert-channel.ts AlertDispatcher (parallel delivery, failure isolation)
formatter.ts Alert message formatting, /health and /alerts output
store.ts JSONL append, read, prune
platform.ts Optional cloud sync
types.ts All TypeScript type definitions
alert/ @steadwing/openalerts OpenClaw plugin
index.ts Plugin entry — service registration, hook wiring
src/
adapter.ts Event translation (19 sources), OpenClawAlertChannel
commands.ts /health, /alerts, /dashboard command handlers
dashboard-html.ts Embedded web UI (Activity, Logs, Health tabs)
dashboard-routes.ts HTTP handler, SSE streaming, log tailing
npm install # Install all workspace dependencies
npm run typecheck # Type-check all packages
npm run build # Build: core → alert
npm run clean # Remove all dist/ directoriesThe monorepo uses npm workspaces. Build order matters — core must build before alert. The build script handles this automatically.
Plugin refinements:
- Richer event detail cards in the dashboard — latency histograms, token usage graphs
- Custom rule definitions — define your own alert rules with threshold/window/cooldown
- Alert escalation policies — severity-based routing (CRITICAL to phone, WARN to Slack)
- Rate limiting & circuit breaker detection rules
New framework plugins:
-
@steadwing/openalerts-nanobot— Nanobot adapter -
@steadwing/openalerts-crewai— CrewAI adapter -
@steadwing/openalerts-langchain— LangChain adapter -
@steadwing/openalerts-mastra— Mastra adapter
Each new plugin will be a thin adapter in this same repo, translating framework-specific events into the shared OpenAlerts event schema. The core engine, rules, and dashboard are reused across all adapters.
MIT
{ "plugins": { "entries": { "openalerts": { "enabled": true, "config": { "alertChannel": "telegram", // telegram | discord | slack | whatsapp | signal "alertTo": "YOUR_CHAT_ID", // your user/chat ID on that channel }, }, }, }, }