feat: structured events + log rotation/retention #25

Merged
arcaven merged 5 commits into main from feat/events-and-log-mgmt
Apr 19, 2026

arcaven (Collaborator) commented Apr 19, 2026

Summary

Two observability features landed together because they share the same motivation — operators need bounded, queryable signal on low-quota hosts like the desk Pi.

  • k0t / events: a structured state-transition ring. It complements `marvel daemon logs` (the raw stderr stream) with a queryable, severity-tagged event history, Marvel's equivalent of `kubectl get events`.
  • Log rotation: `--log-file` now rotates by size, gzip-compresses archives, enforces count and total-bytes retention, and has a stub `Shipper` hook so remote offload can land later without touching the writer.

Closes ticket 0og (audit completed separately — no code changes needed, both optional-valued flags already had `NoOptDefVal` set).

New CLI surface

`marvel events`

Top-level command, remote-capable via mrvl://.

```sh
marvel events # last 100 events
marvel events -n 500
marvel events --session demo/shell-g1-0 # filter by session key
marvel events --workspace demo
marvel events --kind session.crashed
marvel events --warnings # severity filter
marvel --cluster desk events # remote via mrvl://
```

Emitted event kinds: `session.created`, `session.deleted`, `session.crashed`, `session.restarted`, `health.failed`, `health.crashloop-backoff`, `role.saturated`, `team.shift-started`, `team.shift-completed`.

`marvel daemon --log-max-*` flags

New on `marvel daemon`:

  • `--log-max-size 10` — rotate at 10 MiB (default)
  • `--log-max-files 5` — keep 5 gzipped archives (default)
  • `--log-max-total 0` — total disk-usage cap in MiB (0 disables; off by default so existing setups keep working unchanged)

Zero in any slot disables that specific limit. Archives land alongside the active file as `daemon.log.20260418T183045Z.gz` — lexicographic sort == chronological, oldest-first retention.
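
As a rough illustration of the retention rules above (not the actual `internal/rlog` code), the sketch below names an archive with a fixed-width UTC timestamp and then picks the oldest archives to delete until both limits are met. The `archive` type and the `archiveName` and `prune` helpers are hypothetical names introduced for this example, and sizes are plain bytes here rather than MiB.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// archive is a hypothetical record for one rotated, gzipped log file.
type archive struct {
	Name string // e.g. daemon.log.20260418T183045Z.gz
	Size int64  // bytes on disk
}

// archiveName builds a name whose fixed-width UTC timestamp makes
// lexicographic order match chronological order.
func archiveName(base string, t time.Time) string {
	return base + "." + t.UTC().Format("20060102T150405Z") + ".gz"
}

// prune returns the archive names to delete, oldest first, so that at most
// maxFiles archives and at most maxTotal bytes remain. A zero disables
// that specific limit.
func prune(archives []archive, maxFiles int, maxTotal int64) []string {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]archive(nil), archives...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var total int64
	for _, a := range sorted {
		total += a.Size
	}

	var doomed []string
	for len(sorted) > 0 {
		overCount := maxFiles > 0 && len(sorted) > maxFiles
		overBytes := maxTotal > 0 && total > maxTotal
		if !overCount && !overBytes {
			break
		}
		doomed = append(doomed, sorted[0].Name)
		total -= sorted[0].Size
		sorted = sorted[1:]
	}
	return doomed
}

func main() {
	fmt.Println(archiveName("daemon.log", time.Date(2026, time.April, 18, 18, 30, 45, 0, time.UTC)))

	archives := []archive{
		{"daemon.log.20260418T183045Z.gz", 300},
		{"daemon.log.20260417T090000Z.gz", 500},
		{"daemon.log.20260418T010203Z.gz", 400},
	}
	fmt.Println(prune(archives, 2, 0))   // count limit only
	fmt.Println(prune(archives, 0, 600)) // total-bytes limit only
}
```

With the count limit alone, only the oldest archive is selected; with the byte limit alone, the two oldest go before the total drops under the cap.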

Design notes

  • `internal/events` — bounded ring, filterable snapshot, nil-safe `Emit` helper so producers that don't have a ring injected still run. Default capacity 2000.
  • `internal/rlog` — `io.WriteCloser` that owns rotation + gzip + retention. `Shipper` interface with `NoopShipper` default — real transports (scp, s3, rsync) are a separate probe.
  • Emission sites: `session.Manager.Create/Delete/ReapDead` and `team.Controller.restartSession/evaluateHealth/InitiateShift/reconcileShift`. Saturation + CrashLoopBackOff events emit on the tick they're first detected, not every tick, so the ring doesn't fill up with repeated transitions.
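
For readers unfamiliar with the pattern, here is a minimal sketch of a bounded ring with a filterable snapshot and a nil-safe `Emit` helper. It assumes a simplified `Event` shape and hypothetical `Ring`, `Emitter`, `Append`, and `Snapshot` names; the real `internal/events` types and signatures may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Event is an illustrative shape, not marvel's actual schema.
type Event struct {
	Time    time.Time
	Kind    string
	Warning bool
	Session string
}

// Emitter is what producers depend on; a nil Emitter is allowed.
type Emitter interface{ Append(Event) }

// Ring keeps at most len(buf) events, overwriting the oldest once full.
type Ring struct {
	mu    sync.Mutex
	buf   []Event
	next  int // index the next event is written to
	count int // number of valid events, never above len(buf)
}

func NewRing(capacity int) *Ring {
	if capacity < 1 {
		capacity = 1
	}
	return &Ring{buf: make([]Event, capacity)}
}

// Append tolerates a nil *Ring so a typed-nil Emitter cannot panic.
func (r *Ring) Append(e Event) {
	if r == nil {
		return
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = e
	r.next = (r.next + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// Snapshot returns matching events oldest-first; a nil keep matches all.
func (r *Ring) Snapshot(keep func(Event) bool) []Event {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]Event, 0, r.count)
	start := (r.next - r.count + len(r.buf)) % len(r.buf)
	for i := 0; i < r.count; i++ {
		e := r.buf[(start+i)%len(r.buf)]
		if keep == nil || keep(e) {
			out = append(out, e)
		}
	}
	return out
}

// Emit is the nil-safe helper: producers call it unconditionally.
func Emit(em Emitter, e Event) {
	if em == nil {
		return
	}
	if e.Time.IsZero() {
		e.Time = time.Now()
	}
	em.Append(e)
}

func main() {
	ring := NewRing(2)
	Emit(ring, Event{Kind: "session.created", Session: "demo/shell-g1-0"})
	Emit(ring, Event{Kind: "session.crashed", Warning: true, Session: "demo/shell-g1-0"})
	Emit(ring, Event{Kind: "session.restarted", Warning: true, Session: "demo/shell-g1-0"})
	Emit(nil, Event{Kind: "session.deleted"}) // no ring injected: silently dropped

	for _, e := range ring.Snapshot(func(e Event) bool { return e.Warning }) {
		fmt.Println(e.Kind, e.Session)
	}
}
```

A fixed-size slice with a write index keeps memory bounded regardless of event rate, which is the property that matters on a low-quota host.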

Test plan

  • `internal/events` unit tests — ring append/overflow, filters, nil-safe emit, concurrent writes (-race)
  • `internal/rlog` unit tests — size rotation, count retention, total-bytes retention, shipper invoked on rotation, empty-file no-op, append preserves across reopens
  • Full marvel suite green (`go test ./...` + `-race`)
  • `golangci-lint run` clean (0 issues)
  • Manual: start `marvel daemon`, work a manifest, run `marvel events` — to be done on the desk alpha once built

Related

  • Defers aae-orc-4wz (RRD-style dedup) — complementary idea; the dedup layer would sit in `internal/logbuf` not `events`.
  • Defers aae-orc-1d2 (ssh poll noise); dedup is the right hammer for that, not events.

Refs: aae-orc-k0t, Skippy session-025/026 raspi feedback

arcaven added 5 commits April 18, 2026 19:56
New internal/rlog package — io.WriteCloser that rotates the active log
file by size, gzips the rolled file, enforces a count + total-bytes
retention ceiling (oldest archives deleted first), and exposes a stub
Shipper hook so operators can wire up remote offload later without
touching the writer.

Motivated by desk Pi disk-quota headroom: the daemon's --log-file tee
(~/.marvel/log/daemon.log) currently grows unbounded. A single
misbehaving reconciler can fill the filesystem overnight, especially
while the RRD-style dedup idea (aae-orc-4wz) is still just an idea.

Shipper stays as an interface with a NoopShipper impl; the real
transports (scp, s3, rsync, http POST) are a separate probe when the
offload-to-central-logger direction gets prioritized.

Refs: aae-orc-k0t, Skippy session-025/026 raspi feedback.

Replaces the unbounded os.OpenFile append with rlog.Writer. Defaults
set raspi-friendly caps:

- --log-max-size 10  (MiB; rotate at 10 MiB)
- --log-max-files 5  (keep 5 gzipped archives)
- --log-max-total 0  (total-bytes cap disabled by default)

Zero in any slot disables that specific limit. Backward-compatible for
the common path — `marvel daemon --log-file X` now rotates into
`X.<timestamp>.gz` archives instead of growing forever.

Refs: aae-orc-k0t, Skippy session-025/026 raspi feedback.

New internal/events package — structured events (SessionCreated,
SessionCrashed, SessionRestarted, HealthCheckFailed, ShiftStarted,
etc.) stored in a bounded ring, filterable by workspace/team/role/
session/kind/severity.

Complements internal/logbuf (raw daemon stderr stream) with queryable,
structured history — marvel's equivalent of 'kubectl get events'.

Producers (session.Manager, team.Controller) will be wired in the
next commit. Emit() is nil-safe so a producer instantiated without
a ring still runs without panic.

Refs: aae-orc-k0t

Wire events.Emitter through both producers. Nil-safe (events.Emit
no-ops on a nil Emitter) so tests that don't inject a ring stay
quiet and existing callers that haven't been updated still compile.

Emission sites:
- Manager.Create  → session.created
- Manager.Delete  → session.deleted
- Manager.ReapDead → session.crashed (warning)
- Controller.restartSession:
    - role.saturated (warning) when MaxRestarts reached
    - health.crashloop-backoff (warning) on first backoff tick
    - session.restarted (warning) on actual restart
- Controller.evaluateHealth → health.failed (warning) on first
  failure-threshold breach
- Controller.InitiateShift → team.shift-started
- Controller.reconcileShift completion → team.shift-completed
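
The "first tick" and "first breach" wording in the list above describes edge-triggered emission: an event fires when a condition flips from inactive to active, not on every tick where it still holds. A minimal sketch of that idea follows; the `edgeDetector` type and its `Observe` method are hypothetical and not taken from `team.Controller`.

```go
package main

import "fmt"

// edgeDetector remembers which conditions are currently active, keyed by
// an arbitrary string such as "<session> <event kind>".
type edgeDetector struct {
	active map[string]bool
}

func newEdgeDetector() *edgeDetector {
	return &edgeDetector{active: map[string]bool{}}
}

// Observe reports true only on the tick where key flips from inactive to
// active. Clearing the condition re-arms the detector for that key.
func (d *edgeDetector) Observe(key string, cond bool) bool {
	was := d.active[key]
	if cond {
		d.active[key] = true
	} else {
		delete(d.active, key)
	}
	return cond && !was
}

func main() {
	d := newEdgeDetector()
	// One value per reconcile tick: is the role saturated right now?
	saturated := []bool{false, true, true, true, false, true}
	for tick, s := range saturated {
		if d.Observe("demo/shell role.saturated", s) {
			fmt.Printf("tick %d: emit role.saturated\n", tick)
		}
	}
	// Prints:
	// tick 1: emit role.saturated
	// tick 5: emit role.saturated
}
```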

Refs: aae-orc-k0t

Daemon owns an events.Ring (default 2000-event capacity), wires it to
session.Manager and team.Controller at construction. New 'events' RPC
method returns a filtered snapshot; new top-level 'marvel events'
command prints a tabulated view with filters for workspace, team,
role, session key, kind, and warnings-only.

Usage:
  marvel events                              # last 100 events
  marvel events -n 500
  marvel events --session demo/shell-g1-0
  marvel events --kind session.crashed
  marvel events --warnings
  marvel --cluster desk events               # remote via mrvl://

Complements 'marvel daemon logs' (raw stderr stream) with structured,
queryable state-transition history — marvel's 'kubectl get events'.

Refs: aae-orc-k0t
arcaven force-pushed the feat/events-and-log-mgmt branch from 08ec6b3 to a181065 April 19, 2026 00:59
arcaven merged commit cfbb555 into main Apr 19, 2026
7 checks passed
arcaven added the type.feature, agent.worker, area.events, and area.logs labels Apr 19, 2026
Labels

  • agent.worker: PR created by a Claude Code worker agent
  • area.events: Structured event ring
  • area.logs: logbuf
  • type.feature: Net-new capability