What
marvel get sessions has HEALTH and CTX% columns. For real agent sessions (forestage, bare claude, generic runtimes), neither column is ever populated; both stay at their placeholder values:
WORKSPACE  TEAM  ROLE    GEN  NAME              STATE    HEALTH   CTX%  DESK  AGENT
dev        solo  worker  1    solo-worker-g1-0  running  unknown  -     1     forestage
The values stay at unknown / - regardless of how long the agent runs, what it's doing, or how close it is to context exhaustion.
Observed
Live test on alpha-20260419.034700.20d90ac: launched a forestage role under marvel, injected a prompt (marvel inject ...), watched Claude process it end-to-end. The agent was actively working — "Channeling… (thinking)" visible in the captured pane — but the marvel get sessions HEALTH/CTX columns never changed.
Today these columns only move in the simulator tests (internal/simulator/engine.go).
What should happen
- HEALTH: a role with healthcheck.type: heartbeat should see its live agent's state reflected in the column: healthy while the agent is alive and responsive, failing after N missed heartbeats, and unknown only before the first heartbeat arrives. The existing FailureCount / LastHeartbeat fields on Session are already the right shape; they're just never written by a real agent.
- CTX%: for LLM-backed runtimes (forestage, claude, and anything marvel manages whose runtime is a Claude Code subprocess), the session row should show the agent's current context window utilization as a percentage. The existing ContextPercent field is already displayed; the agent-side reporting is the missing half.
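The HEALTH transitions described above can be sketched as a small pure function. This is only an illustration: the field names HealthState, LastHeartbeat and FailureCount come from this issue, but the Session type below is a hypothetical stand-in (the real one lives in internal/api/types.go and its field types are assumed here). evalHealth, demoStart and demoInterval are invented names, and counting missed beats as elapsed time divided by the interval is one possible rule, not necessarily what the controller does.

```go
package main

import (
	"fmt"
	"time"
)

// HealthState holds the three values the HEALTH column is expected to show.
type HealthState string

const (
	HealthUnknown HealthState = "unknown"
	HealthHealthy HealthState = "healthy"
	HealthFailing HealthState = "failing"
)

// Session is a hypothetical stand-in for the struct in internal/api/types.go.
// Field names are taken from the issue; the types are assumptions.
type Session struct {
	HealthState   HealthState
	LastHeartbeat time.Time // zero value: no heartbeat has arrived yet
	FailureCount  int
}

// Demo values (assumptions, not marvel defaults).
var demoStart = time.Date(2026, time.April, 19, 3, 47, 0, 0, time.UTC)

const demoInterval = 10 * time.Second

// evalHealth updates and returns the session's health at time now.
// A session is unknown until its first heartbeat, failing once at least
// failureThreshold whole intervals have passed without one, healthy otherwise.
func evalHealth(s *Session, now time.Time, interval time.Duration, failureThreshold int) HealthState {
	if s.LastHeartbeat.IsZero() {
		s.HealthState = HealthUnknown
		return s.HealthState
	}
	missed := int(now.Sub(s.LastHeartbeat) / interval)
	if missed < 0 {
		missed = 0 // tolerate a heartbeat stamped slightly in the future
	}
	s.FailureCount = missed
	if missed >= failureThreshold {
		s.HealthState = HealthFailing
	} else {
		s.HealthState = HealthHealthy
	}
	return s.HealthState
}

func main() {
	s := &Session{}
	fmt.Println(evalHealth(s, demoStart, demoInterval, 3)) // no heartbeat yet

	s.LastHeartbeat = demoStart
	fmt.Println(evalHealth(s, demoStart.Add(5*time.Second), demoInterval, 3))
	fmt.Println(evalHealth(s, demoStart.Add(35*time.Second), demoInterval, 3))
}
```

With a 10-second interval and a threshold of 3, the three calls in main print unknown, healthy and failing in that order. Passing the clock in as a parameter keeps the function deterministic and easy to test.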
Why
- The columns imply information that isn't there. An operator scanning marvel get sessions sees HEALTH unknown and reasonably assumes "the health check hasn't fired yet", when in fact the signal was never going to arrive.
- The heartbeat healthcheck type is designed to let marvel detect and restart hung agents. Without live agents writing heartbeats, the check's failure_threshold cannot fire, so marvel can't distinguish "working" from "stuck" for real agents.
- CTX% is load-bearing for the shift feature (rolling rotation of exhausted-context agents). Without real ctx data, marvel can't make shift decisions that match operator intent.
Scope (what, not how)
- Applies to at least the forestage and claude adapters in internal/runtime/. Whether the generic adapter should participate is a judgment call, probably opt-in per role.
- Both ends of the wire need a design: what the agent emits and how marvel ingests it. Shipping one without the other leaves the columns empty.
- The initial cut can be coarse (a heartbeat that just says "I'm alive" and a context-percent ping every N seconds) and get refined later.
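To make the "coarse initial cut" concrete, here is a minimal sketch of what the ingest side could look like. Everything about the wire format is an assumption made for illustration: the Report type, its JSON keys, the ingest and ctxColumn helpers and the HasContext flag are all hypothetical, and the Session type is again a stand-in that borrows only its field names from this issue. A report without context_percent acts as a bare "I'm alive" heartbeat; one that carries it also updates CTX%.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Report is a hypothetical agent-to-marvel message.
type Report struct {
	Session        string   `json:"session"`
	ContextPercent *float64 `json:"context_percent,omitempty"` // nil: bare heartbeat
}

// Session is a stand-in for the struct in internal/api/types.go.
// HasContext is an invented flag marking whether CTX% was ever reported.
type Session struct {
	LastHeartbeat  time.Time
	FailureCount   int
	ContextPercent float64
	HasContext     bool
}

// demoNow is an arbitrary fixed clock for the example.
var demoNow = time.Date(2026, time.April, 19, 3, 47, 0, 0, time.UTC)

// ingest decodes one report and applies it to the matching session.
// A rejected report leaves the session untouched.
func ingest(sessions map[string]*Session, raw []byte, now time.Time) error {
	var r Report
	if err := json.Unmarshal(raw, &r); err != nil {
		return fmt.Errorf("decode report: %w", err)
	}
	s, ok := sessions[r.Session]
	if !ok {
		return fmt.Errorf("unknown session %q", r.Session)
	}
	if r.ContextPercent != nil && (*r.ContextPercent < 0 || *r.ContextPercent > 100) {
		return fmt.Errorf("context_percent %v out of range", *r.ContextPercent)
	}
	// Any valid report counts as a heartbeat.
	s.LastHeartbeat = now
	s.FailureCount = 0
	if r.ContextPercent != nil {
		s.ContextPercent = *r.ContextPercent
		s.HasContext = true
	}
	return nil
}

// ctxColumn renders the CTX% cell, keeping "-" until a value has arrived.
func ctxColumn(s *Session) string {
	if !s.HasContext {
		return "-"
	}
	return fmt.Sprintf("%.0f", s.ContextPercent)
}

func main() {
	sessions := map[string]*Session{"solo-worker-g1-0": {}}
	s := sessions["solo-worker-g1-0"]

	if err := ingest(sessions, []byte(`{"session":"solo-worker-g1-0"}`), demoNow); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(ctxColumn(s)) // heartbeat only, CTX% still unset

	ping := []byte(`{"session":"solo-worker-g1-0","context_percent":63}`)
	if err := ingest(sessions, ping, demoNow.Add(10*time.Second)); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(ctxColumn(s))
}
```

Run as written, main prints "-" and then "63". Making the context value optional lets a runtime that cannot measure context usage (for example the generic adapter, if it opts in) still participate in health checks.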
Not scope
- Full OTEL instrumentation. OTEL is the right long-term answer but the daemon's Session struct and marvel get sessions output already expect these two specific values; filling them is narrower than a telemetry overhaul.
- Agent-internal metrics (tool call counts, token rates, etc.). Those belong to a later observability surface.
Related
- internal/api/types.go: Session.HealthState, LastHeartbeat, FailureCount, ContextPercent are already defined
- internal/team/controller.go:TestHealthEvalHeartbeatStale etc.: the controller already evaluates heartbeat staleness; it just never receives the data it consumes
- internal/simulator/engine.go: reference implementation of what a compliant agent's state transitions look like
Environment
- marvel 0.1.0-alpha.20260419.034700.20d90ac (commit 20d90ac)
- forestage alpha-20260418-050527-3a316b0
- Linux aarch64 Pi