feat: always-on TrackedMutex for lock contention instrumentation #344

Merged
jfberry merged 4 commits into main from
feat/lock-contention-instrumentation
Mar 21, 2026
jfberry commented Mar 18, 2026

Summary

  • Introduces TrackedMutex (decoder/tracked_mutex.go) wrapping sync.Mutex with TryLock-based contention detection and caller tracking
  • Replaces sync.Mutex with TrackedMutex on all 9 entity structs (Pokestop, Gym, Incident, Pokemon, Station, Weather, Spawnpoint, Route, Tappable)
  • Adds caller string parameter to all 31 get*Record* / Peek* functions, threaded through to Lock(caller)
  • Updates ~65 call sites across 22 files with descriptive caller strings

How it works

  • Fast path (~25ns): TryLock() succeeds, stores caller + timestamp — zero log overhead
  • Contention path: logs [LOCK_CONTENTION] with entity type/id, waiter, holder, and hold duration, then blocks. Logs [LOCK_ACQUIRED] once acquired with wait time
  • Long hold: logs [LOCK_HELD_LONG] on Unlock if held >5s

Example output

WARN [LOCK_CONTENTION] Pokestop id=abc123 waiter=saveIncidentRecord holder=clearPokestopWithLock held_for=2.3s
WARN [LOCK_ACQUIRED] Pokestop id=abc123 caller=saveIncidentRecord waited=4.1s (holder was clearPokestopWithLock)

Test plan

  • go build ./... compiles cleanly
  • go vet ./... passes
  • go test ./... all tests pass
  • Deploy to staging and observe logs under load — verify [LOCK_CONTENTION] appears only during actual contention, silence otherwise

🤖 Generated with Claude Code

@jfberry jfberry force-pushed the feat/lock-contention-instrumentation branch from 3ff3b37 to 3e07841 Compare March 18, 2026 15:42
jfberry and others added 4 commits March 21, 2026 17:58
Replace sync.Mutex with TrackedMutex on all entity structs to detect
deadlocks and lock contention at runtime. Fast path uses TryLock with
~25ns overhead. On contention, logs holder/waiter identity and wait
duration. Warns if any lock is held longer than 5 seconds.

All get*Record* functions now accept a caller string parameter that
propagates through to Lock(), enabling precise identification of which
code path holds or is waiting for a lock in contention logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
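To illustrate how a caller string might be threaded from a getter down to Lock(), here is a small sketch. The names `callerMutex`, `record`, `recordCache`, and `getRecord` are hypothetical stand-ins, not the project's get*Record* functions, and the stand-in mutex only records the holder without any contention logging.

```go
package main

import "sync"

// callerMutex is a simplified stand-in for TrackedMutex: it only
// remembers which caller currently holds the lock.
type callerMutex struct {
	mu     sync.Mutex
	holder string
}

func (m *callerMutex) Lock(caller string) {
	m.mu.Lock()
	m.holder = caller
}

func (m *callerMutex) Unlock() {
	m.holder = ""
	m.mu.Unlock()
}

// record is a hypothetical entity that embeds its own lock.
type record struct {
	callerMutex
	ID string
}

// recordCache is a hypothetical in-memory cache of entities.
type recordCache struct {
	mu    sync.RWMutex
	items map[string]*record
}

// getRecord looks up an entity and returns it locked on behalf of
// caller, together with the function that releases it.
func (c *recordCache) getRecord(id, caller string) (*record, func()) {
	c.mu.RLock()
	r := c.items[id]
	c.mu.RUnlock()
	if r == nil {
		return nil, func() {}
	}
	r.Lock(caller) // the caller string ends up as the recorded holder
	return r, r.Unlock
}
```

With this shape, a call such as `r, unlock := cache.getRecord("abc123", "saveIncidentRecord")` makes the holder name visible to anything that later inspects the lock, which is what allows a contention log to name both sides.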
… IDs

- TrackedMutex[K] is now generic over the entity ID type; string
  formatting (%v) only runs on the cold contention/warning paths
- holder and acquiredAt use atomic.Value / atomic.Int64 so reads from
  the contention path are race-detector safe
- Replace Init+sync.Once with direct parameters on Lock/Unlock for
  simplicity and smaller struct size
- Consolidate repeated time.Now() calls into local variables
- Skip incidents with blank IncidentId in UpdateFortBatch to avoid
  cache key collisions and spurious contention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
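A rough sketch of what the generic, atomics-based variant described in this commit could look like. It is an assumption-laden illustration rather than the merged code: the exact field layout, the `Lock(entity, id, caller)` and `Unlock(entity, id)` parameter order, the `Holder` helper, and `fmt.Printf` as the log sink are all invented here. `atomic.Int64` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// TrackedMutex is generic over the entity ID type K, so the ID is
// passed by value and only formatted with %v on the cold paths.
type TrackedMutex[K any] struct {
	mu         sync.Mutex
	holder     atomic.Value // always stores a string
	acquiredAt atomic.Int64 // UnixNano of the last acquisition
}

// Lock takes the entity type and ID as direct parameters, so the
// struct needs no Init step and no stored identity.
func (m *TrackedMutex[K]) Lock(entity string, id K, caller string) {
	if !m.mu.TryLock() {
		// Cold path: atomic loads keep these reads race-detector safe
		// even though another goroutine owns the mutex.
		now := time.Now()
		heldFor := time.Duration(now.UnixNano() - m.acquiredAt.Load())
		fmt.Printf("WARN [LOCK_CONTENTION] %s id=%v waiter=%s holder=%s held_for=%s\n",
			entity, id, caller, m.Holder(), heldFor.Round(time.Millisecond))
		m.mu.Lock()
	}
	m.holder.Store(caller)
	m.acquiredAt.Store(time.Now().UnixNano())
}

// Unlock warns if the lock was held for more than 5 seconds.
func (m *TrackedMutex[K]) Unlock(entity string, id K) {
	held := time.Duration(time.Now().UnixNano() - m.acquiredAt.Load())
	if held > 5*time.Second {
		fmt.Printf("WARN [LOCK_HELD_LONG] %s id=%v holder=%s held_for=%s\n",
			entity, id, m.Holder(), held.Round(time.Millisecond))
	}
	m.mu.Unlock()
}

// Holder returns the most recently recorded caller, or "" if the
// mutex has never been locked.
func (m *TrackedMutex[K]) Holder() string {
	h, _ := m.holder.Load().(string)
	return h
}
```

Because the zero value is usable, an entity struct could embed `TrackedMutex[string]` or `TrackedMutex[int64]` directly, depending on its ID type.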
Add a 1-2-4-8-16-32-64-128ms backoff loop (255ms total) before emitting
LOCK_CONTENTION warnings. Most transient contention resolves silently;
only waits that outlast the 255ms backoff produce log output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jfberry force-pushed the feat/lock-contention-instrumentation branch from 618897d to dc4e92e on March 21, 2026 17:58
jfberry merged commit b4c7a00 into main on Mar 21, 2026
1 check passed
Fabio1988 deleted the feat/lock-contention-instrumentation branch on March 21, 2026 18:27