Step-level observability: cost tracking, metrics, and event-driven triggers #52

@ameet

Description

This issue proposes three observability and execution-model improvements.

1. Cost / token tracking per step

Problem: Flows that call LLMs (via claude --print or API actions) have zero visibility into token spend. In a 90+ flow codebase with 25+ LLM-calling flows, an orchestrator flow might invoke 8 sub-flows containing ~40 LLM calls total. When a parallel branch fails at minute 15, the tokens burned on that branch are unquantifiable.

We've built defensive "gate" steps (e.g., check that research output is non-empty before running expensive analysis) specifically to prevent token waste — but this is guesswork without actual cost data.

Proposed solution:

{
  "id": "runAnalysis",
  "type": "bash",
  "bash": { "command": "claude --print ..." },
  "metrics": {
    "track": true,
    "costLabel": "company-analysis"
  }
}

After flow execution, output includes:

{
  "_metrics": {
    "totalDurationMs": 145000,
    "steps": {
      "runAnalysis": {
        "durationMs": 12000,
        "inputTokens": 8234,
        "outputTokens": 3421,
        "estimatedCost": "$0.037"
      }
    },
    "totalEstimatedCost": "$1.24"
  }
}

For non-LLM steps, duration alone is valuable. For LLM steps, token counts enable optimization (e.g., "this prompt is 80K tokens — should we truncate?").
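The estimatedCost field above could be derived from the reported token counts and a per-model price table. A minimal sketch — the model name and per-1K-token prices are placeholder assumptions, not real rates:

```python
# Sketch: turn per-step token counts into an estimatedCost string like the
# "$0.037" in the proposed _metrics output. Prices are placeholders.
PRICE_PER_1K = {"some-model": {"input": 0.003, "output": 0.015}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> str:
    """Dollar-string cost estimate from token counts and a price table."""
    p = PRICE_PER_1K[model]
    cost = (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]
    return f"${cost:.3f}"

print(estimate_cost("some-model", 8234, 3421))
```

Summing these per-step estimates across all tracked steps yields the totalEstimatedCost shown above.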

Impact: Enables data-driven pipeline optimization. Currently every LLM-heavy workflow is a cost black box.

2. Step duration metrics (lightweight version of above)

Even without token tracking, per-step wall-clock time alone would be valuable. A --metrics flag on one flow execute could append timing data to each step's output:

one --agent flow execute my-flow --metrics -i param=value

Output includes _stepTimings: { "step1": 1234, "step2": 5678, ... }.

This is implementable without any LLM-specific logic and would help identify bottlenecks in any flow.
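The timing wrapper itself is trivial to sketch: record elapsed milliseconds around each step execution, keyed by step id. The step ids and callables below are illustrative, not the engine's actual internals:

```python
import time

def run_with_timings(steps):
    """Run each (step_id, fn) pair, collecting _stepTimings in milliseconds."""
    timings = {}
    for step_id, fn in steps:
        start = time.monotonic()  # monotonic clock: immune to wall-clock jumps
        fn()
        timings[step_id] = int((time.monotonic() - start) * 1000)
    return {"_stepTimings": timings}

result = run_with_timings([("step1", lambda: time.sleep(0.01))])
```

Because this only wraps existing step execution, it needs no knowledge of what the step does — LLM call, bash command, or anything else.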

3. Event-driven flow triggers

Problem: The relay system handles inbound webhooks, but there's no declarative way to say "when event X occurs, run flow Y." Currently we:

  • Poll via scheduled bash scripts (inbox-poll flow)
  • Chain flows manually by running one flow execute from a bash step inside another flow
  • Use relay + passthrough for simple webhook→action forwarding

A native trigger system would enable reactive pipelines:

{
  "key": "auto-process-deal",
  "trigger": {
    "type": "webhook",
    "path": "/deal-created",
    "filter": "$.body.source === 'email'"
  },
  "flow": "deal-process",
  "inputMapping": {
    "companyName": "$.body.company",
    "founderEmail": "$.body.email"
  }
}
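The $.body.* paths in filter and inputMapping imply a small JSONPath-style resolver over the webhook payload. A minimal sketch — the payload shape is assumed from the example above, and only simple dotted paths are handled:

```python
def resolve(path: str, event: dict):
    """Resolve a '$.a.b' style path against an event dict; None if missing."""
    node = event
    for key in path.lstrip("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def map_inputs(mapping: dict, event: dict) -> dict:
    """Build flow inputs from an inputMapping of name -> JSON path."""
    return {name: resolve(path, event) for name, path in mapping.items()}

event = {"body": {"source": "email", "company": "Acme", "email": "a@b.co"}}
inputs = map_inputs(
    {"companyName": "$.body.company", "founderEmail": "$.body.email"}, event
)
# inputs == {"companyName": "Acme", "founderEmail": "a@b.co"}
```

The filter expression would be evaluated the same way: resolve $.body.source and compare it against 'email' before dispatching the flow.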

Or schedule-based:

{
  "trigger": {
    "type": "schedule",
    "cron": "0 9 * * 1-5"
  },
  "flow": "daily-digest",
  "inputMapping": {
    "date": "$.trigger.scheduledDate"
  }
}
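The cron expression 0 9 * * 1-5 fires at 09:00 on weekdays. A minimal matcher sketch — wildcards and ranges only, no step or list syntax, and note the weekday-numbering mismatch between cron (0=Sunday) and Python (Monday=0):

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', a single number, or a range like '1-5'."""
    if field == "*":
        return True
    if "-" in field:
        lo, hi = map(int, field.split("-"))
        return lo <= value <= hi
    return int(field) == value

def cron_matches(expr: str, dt: datetime) -> bool:
    """Check a datetime against a 5-field cron expression (min hr dom mon dow)."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (dt.weekday() + 1) % 7  # convert Python weekday to cron's 0=Sunday
    return (field_matches(minute, dt.minute) and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day) and field_matches(month, dt.month)
            and field_matches(dow, cron_dow))

# Monday 2024-01-01 09:00 is a weekday morning, so "0 9 * * 1-5" matches.
print(cron_matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0)))  # True
```

The scheduler would evaluate this once per minute and, on a match, invoke the flow with $.trigger.scheduledDate resolved to the matched timestamp.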

Impact: Eliminates polling flows and manual chaining. Enables reactive architectures where flows respond to events rather than being manually invoked.

Relationship to existing issues
