
Automated runtime monitoring alerts and safeguards #20

@wat-hiroaki

Summary

Currently, Studio provides visual runtime monitoring (Activity Map, Activity Stream, Cockpit overlay), but lacks automated detection and alerting for potential issues during long-running agent sessions.

Current State

✅ Already implemented:

  • Real-time status visualization (thinking, tool_running, awaiting, error, idle)
  • Per-agent memory usage tracking with color-coded warnings (1GB/2GB thresholds)
  • Activity Stream logging all events across sessions
  • Cockpit overlay with manual RESTART/STOP controls

❌ Not yet implemented:

  • Automated alerts when an agent stays in the error state too long
  • Detection of infinite loops or repeated failed operations
  • Dangerous operation warnings (destructive commands, unexpected file deletions)
  • Cost/token usage tracking per session
  • Configurable thresholds and notification rules

Motivation

When managing multiple agents in long sessions, it's easy to miss one agent silently failing or looping. Automated alerts would catch these issues before they waste time or cause damage.

Raised via Reddit feedback from u/draconisx4.

Proposed Features (prioritized)

  1. Error state timeout alert: notify when an agent stays in the error state for more than N minutes
  2. Loop detection: flag repeated identical tool calls or recurring error patterns
  3. Dangerous operation warning: intercept destructive commands such as rm -rf, git force push, or DROP TABLE
  4. Session cost tracking: estimate token/API cost per agent
  5. Custom alert rules: user-configurable thresholds and actions
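
To make features 1 and 2 concrete, here is a minimal sketch in Python. It is not taken from Studio's code: the `AgentMonitor` class, its method names, and the alert strings are all hypothetical, and the loop check is deliberately naive (it only flags a window of identical consecutive tool calls). The status value `"error"` matches the statuses listed under Current State.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class AgentMonitor:
    """Tracks one agent's status and recent tool calls (hypothetical helper)."""

    error_timeout_s: float = 300.0  # the "N minutes" threshold, in seconds
    loop_window: int = 5            # number of recent calls to compare
    _error_since: Optional[float] = None
    _recent_calls: Deque[Tuple[str, str]] = field(default_factory=deque)

    def on_status(self, status: str, now: float) -> None:
        # Remember when the agent entered the error state; reset on any other status.
        if status == "error":
            if self._error_since is None:
                self._error_since = now
        else:
            self._error_since = None

    def on_tool_call(self, tool: str, args: str) -> None:
        # Keep only the last `loop_window` calls.
        self._recent_calls.append((tool, args))
        while len(self._recent_calls) > self.loop_window:
            self._recent_calls.popleft()

    def check(self, now: float) -> List[str]:
        """Return the alerts that currently apply to this agent."""
        alerts: List[str] = []
        if (
            self._error_since is not None
            and now - self._error_since > self.error_timeout_s
        ):
            alerts.append("error_timeout")
        # Naive loop heuristic: the whole window is one identical call.
        calls = self._recent_calls
        if len(calls) == self.loop_window and len(set(calls)) == 1:
            alerts.append("possible_loop")
        return alerts
```

Timestamps are passed in rather than read from the clock so the thresholds can be tested deterministically. A real implementation would probably also need to catch loops that cycle through several different calls, which this sketch does not.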

Open Questions

  • Should alerts be in-app notifications, system notifications, or both?
  • Should dangerous operation detection block the action or just warn?
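
On the block-versus-warn question, one option is to make it a per-user setting. The sketch below assumes commands reach the monitor as plain strings before execution. The pattern list, the `classify` and `decide` functions, and the `"warn"`/`"block"` mode names are all hypothetical, and the regexes are illustrative rather than exhaustive (for example, `rm -r -f` with separate flags is not caught).

```python
import re
from typing import Optional

# Illustrative patterns only; a real list would need careful review.
DANGEROUS_PATTERNS = {
    # rm with a single flag group containing both r and f, e.g. -rf or -fr
    "recursive_delete": re.compile(r"\brm\s+-\w*(r\w*f|f\w*r)\w*\b", re.IGNORECASE),
    # git push with --force (including --force-with-lease) or -f
    "force_push": re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),
    # SQL DROP TABLE in any letter case
    "drop_table": re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
}


def classify(command: str) -> Optional[str]:
    """Return the name of the first matching dangerous pattern, or None."""
    for name, pattern in DANGEROUS_PATTERNS.items():
        if pattern.search(command):
            return name
    return None


def decide(command: str, mode: str = "warn") -> str:
    """Return "allow" for safe commands, otherwise the configured mode."""
    if mode not in ("warn", "block"):
        raise ValueError(f"unknown mode: {mode!r}")
    return mode if classify(command) is not None else "allow"
```

With this shape, `decide("git push --force origin main", mode="block")` returns `"block"`, while the same command under the default mode returns `"warn"`. The caller then decides how to surface each outcome.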


Labels: enhancement (New feature or request)
