Skip to content

logbuf dedup ineffective: timestamp prefix included in comparison key #32

@arcavenai

Description

@arcavenai

Summary

PR #26's internal/logbuf Cisco-style dedup works in principle ("last message repeated N times") but collapses far less traffic than intended because the dedup comparison key includes Go's log.Printf timestamp prefix — and that timestamp advances each second.

Measurement

Live daemon with dev polling every ~1s (generating ssh: client connected: michael.pursifull (SHA256:...) per connect):

Disk log (--log-file, direct writes, no logbuf interposed):

$ grep -c 'last message repeated' ~/.marvel/log/daemon.log
0
$ grep -c 'ssh: client connected' ~/.marvel/log/daemon.log
1883

In-memory ring (marvel daemon logs, served through logbuf):

$ marvel daemon logs | wc -l
100
$ marvel daemon logs | grep -c 'last message repeated'
48
$ marvel daemon logs | grep -c 'ssh: client connected'
52

So the ring compresses about 2:1 in this scenario — only because some seconds receive two connects (which produce byte-identical lines). Connects in different seconds don't match because the YYYY/MM/DD HH:MM:SS prefix differs.

Sample from ring showing the pattern:

2026/04/18 22:38:39 ssh: client connected: michael.pursifull (SHA256:...)
last message repeated 2 times                                  ← these 2 were same-second
2026/04/18 22:38:41 ssh: client connected: michael.pursifull (SHA256:...)
last message repeated 2 times
2026/04/18 22:38:43 ssh: client connected: michael.pursifull (SHA256:...)
last message repeated 2 times

A 100% compressed result would have been one ssh: client connected line followed by last message repeated 1882 times so far.

Root cause

logbuf compares full line bytes. The timestamp prefix is injected by log.Printf upstream of logbuf, so every new second produces a distinct input regardless of payload identity.

Fix direction

Dedup on a content fingerprint that strips the leading YYYY/MM/DD HH:MM:SS (or, more defensibly, whatever log.Flags() is configured to emit). Store the canonical (stripped) form as the comparison key while preserving the original line for display, or re-emit the summary with the latest observed timestamp.

Alternatively, move logbuf to receive post-log-prefix input and attach the timestamp on egress.

Related

  • Dev (@arcaven) identified this during PR feat(ssh): include client IP in connection log #31 work; also tracked as bd aae-orc-407l locally.
  • Separately worth considering: the disk log (via --log-file) has no dedup at all. Logbuf only sits in front of the ring buffer that marvel daemon logs reads. Whether that's by design ("disk = raw archive") or a gap is a design call.

Environment

  • marvel 0.1.0-alpha.20260419.014522.b8f1f4b (commit b8f1f4b)
  • Linux aarch64, mrvl://, daemon started with --log-max-size 1 --log-max-files 2

Metadata

Metadata

Assignees

No one assigned

    Labels

    area.logslogbufpriority.p2Medium — should address this sprintscope.in-scopeAccepted, will work ontriage.completeTriage done, ready for worktype.bugBroken behavior — something doesn't work as designed

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions