
Persist and expose multi-battle station state#353

Open
Mygod wants to merge 11 commits into UnownHash:main from Mygod:multibattle

Conversation

Contributor

@Mygod Mygod commented Mar 31, 2026

Why

Max Battles can temporarily override a station's normal local battle. Golbat previously stored only one flat battle_* payload on station, so a temporary event battle would overwrite the local one and the original state would be lost for planning and downstream filtering.

This PR adds first-class multi-battle support for stations while keeping the existing flat station/webhook fields for compatibility.

Ground rule

Latest observed game data is treated as ground truth.

For a given station:

  • newly observed battles are persisted independently
  • if a new battle has a later or equal battle_end than another known battle on the same station, the older/equal-ending row is treated as obsolete and removed
  • a shorter battle does not remove a longer known battle

This keeps temporary overrides and later event rotations aligned with the most recent game state instead of waiting for old rows to expire naturally.
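Under these rules, the pruning decision for one station can be sketched as below. All type and field names here are illustrative stand-ins, not the PR's actual Golbat types:

```go
package main

import "fmt"

// Battle is an illustrative stand-in for a station_battle row; field
// names are assumptions, not the PR's actual types.
type Battle struct {
	Seed      uint64
	BattleEnd int64 // unix seconds
}

// applyObserved merges a newly observed battle into the known set:
// any known row with a different seed whose battle_end is earlier than
// or equal to the new battle's end is obsolete and dropped, while a
// known battle ending later than the new one is kept.
func applyObserved(known []Battle, observed Battle) []Battle {
	out := known[:0:0]
	for _, b := range known {
		if b.Seed == observed.Seed {
			continue // replaced by the fresh observation appended below
		}
		if b.BattleEnd <= observed.BattleEnd {
			continue // obsolete: ends no later than the new battle
		}
		out = append(out, b)
	}
	return append(out, observed)
}

func main() {
	known := []Battle{{Seed: 1, BattleEnd: 100}, {Seed: 2, BattleEnd: 300}}
	// A new battle ending at 200 removes seed 1 (ends earlier) but not
	// seed 2 (a shorter battle never removes a longer known battle).
	fmt.Println(applyObserved(known, Battle{Seed: 3, BattleEnd: 200}))
}
```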

What this PR adds

  • a new station_battle table via migration 54_station_battle.up.sql
  • station battle persistence separate from the parent station row
  • a compatibility projection that still fills station.battle_*
  • a battles array in station API and max_battle webhook payloads
  • multi-battle-aware station filtering instead of single-battle-only filtering
  • hydration of station_battle rows in both preload and lazy station reads
  • expired station_battle cleanup support
  • canonical-battle-based metric dedup

Canonical battle behavior

A station can now know about multiple non-expired battles, but one battle is still chosen as the canonical representative for:

  • flat station.battle_*
  • top-level max_battle webhook fields
  • battle metric dedup

Selection rule:

  • prefer an active battle over an upcoming one
  • if none are active, use the best upcoming battle
  • later battle_end wins among surviving rows

The full battles array remains available for planning use cases.
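The selection rule above can be sketched as follows; the struct and function names are illustrative assumptions, not the PR's actual code:

```go
package main

import "fmt"

// Battle is an illustrative stand-in; field names are assumptions.
type Battle struct {
	Seed        uint64
	BattleStart int64
	BattleEnd   int64
}

// canonicalBattle picks the representative battle among non-expired rows:
// an active battle (already started) beats an upcoming one, and within the
// same class the later battle_end wins.
func canonicalBattle(battles []Battle, now int64) (Battle, bool) {
	var best Battle
	found, bestActive := false, false
	for _, b := range battles {
		if b.BattleEnd <= now {
			continue // expired, never canonical
		}
		active := b.BattleStart <= now
		switch {
		case !found,
			active && !bestActive,
			active == bestActive && b.BattleEnd > best.BattleEnd:
			best, bestActive, found = b, active, true
		}
	}
	return best, found
}

func main() {
	now := int64(1000)
	battles := []Battle{
		{Seed: 1, BattleStart: 900, BattleEnd: 1100},  // active
		{Seed: 2, BattleStart: 1200, BattleEnd: 2000}, // upcoming, later end
	}
	b, _ := canonicalBattle(battles, now)
	fmt.Println(b.Seed) // the active battle wins despite the earlier end
}
```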

Identity / persistence

bread_battle_seed is used as the persisted battle identity.

Live validation before implementation showed:

  • no observed seed reuse across multiple stations
  • seeds changed when battles rotated

That made seed the best available battle-instance identifier for persistence.

API / webhook impact

Backward compatibility is preserved:

  • existing top-level battle_* station fields remain
  • existing top-level max_battle webhook fields remain

New additive fields:

  • station API: battles
  • max_battle webhook: battles

This allows downstream consumers to keep working unchanged while giving planning-oriented consumers the full set.

Deployment notes

  • migration 54_station_battle.up.sql is included and runs automatically on startup
  • no manual backfill from legacy station.battle_* into station_battle is included
  • population of station_battle depends on normal rescans after deploy
  • optional cleanup support is available via cleanup.station_battles

Practical rollout expectation:

  • after deploy, rescans repopulate the new table quickly
  • during the current non-overlap period, most stations will still have exactly one battle row
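For operators, enabling the optional cleanup might look like the fragment below. The exact key shape is an assumption inferred from the `cleanup.station_battles` name above and has not been checked against the config parser:

```toml
[cleanup]
# Assumed key: enables periodic removal of expired station_battle rows.
station_battles = true
```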

Validation

Tested with:

  • go test ./decoder -run 'StationBattle|StationWebhooks|BuildStationResult|FortLookup|SyncStation|Preload'
  • go test ./... -run '^$'

Post-deploy live DB checks on one environment showed:

  • migration applied cleanly at version 54
  • no active legacy-only stations
  • no active projection mismatches between station and station_battle
  • no active station_battle rows missing boss data
  • currently one active battle row per station, which matches the present daily-rotation state

Notes

This PR is intended to preserve battles for planning, especially around Monday / Max Battle event overlaps, while keeping existing API and webhook consumers stable.

Fixes #352.

@Mygod Mygod marked this pull request as draft April 1, 2026 00:00
Collaborator

jfberry commented Apr 1, 2026

I'll leave others to comment on whether this is desirable functionality; I don't really care much one way or the other. I do have a few comments on the way it's implemented, as it doesn't follow commonly held patterns within the codebase, or even the guidelines in CLAUDE.md.

An automated review highlights some quite significant variances and performance observations. I haven't bothered with a human review until others chime in on whether this capability is one we want to have.

Database Write Impact — Significant Change

Before: Station updates go through the write-behind queue (batched, coalesced, deferred).

After: Every station GMO update now also does a synchronous transactional write outside the write-behind queue:

```sql
BEGIN
INSERT INTO station_battle ... ON DUPLICATE KEY UPDATE ...                    -- upsert
DELETE FROM station_battle WHERE station_id=? AND seed<>? AND battle_end<=?   -- prune
COMMIT
```

That's 4 DB round-trips per station per GMO, executed synchronously while holding the station lock. This blocks other goroutines waiting
on the same station for the duration of the DB transaction. During daily rotation windows when many stations update simultaneously, this
could become a bottleneck.

The write goes to GeneralDb (shared connection pool), so it competes with all other non-pokemon writes.

Performance Concerns

  1. Synchronous DB in hot path: syncStationBattlesFromProto is called from UpdateStationBatch (the GMO decode loop) while the station lock is held. Every other entity in this codebase defers DB writes through the write-behind queue — station battles are the exception.
  2. Repeated getKnownStationBattles calls: a single station save triggers this function ~4-5 times (snapshot, sync, rtree update, webhook, API build). Each call does a cache load + clone + sort. Not expensive per call, but multiplied across all stations per GMO.
  3. Side-effecting reads: getKnownStationBattles prunes expired battles from the cache as a side effect of reading. Other code paths in this codebase don't mutate cache state on read.
  4. String-based change detection: stationBattleSignatureFromSlice builds a signature string via fmt.Sprintf per battle. This is computed multiple times per update cycle. Minor, but a hash or struct comparison would be cheaper.
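As a sketch of the cheaper alternative suggested in point 4, comparing comparable value structs directly avoids per-battle fmt.Sprintf allocations. The struct and its fields below are illustrative assumptions, not the PR's actual signature type:

```go
package main

import "fmt"

// battleKey holds only the fields relevant to change detection; the field
// set is an illustrative assumption, not the PR's actual signature input.
type battleKey struct {
	Seed        uint64
	BattleStart int64
	BattleEnd   int64
	BossID      int16
}

// changed reports whether two battle sets differ, assuming both slices are
// sorted the same way. Comparable structs make this a plain == per element,
// with no string formatting and no allocation.
func changed(a, b []battleKey) bool {
	if len(a) != len(b) {
		return true
	}
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	old := []battleKey{{Seed: 1, BattleEnd: 100, BossID: 25}}
	fmt.Println(changed(old, []battleKey{{Seed: 1, BattleEnd: 100, BossID: 25}})) // false
	fmt.Println(changed(old, []battleKey{{Seed: 1, BattleEnd: 200, BossID: 25}})) // true
}
```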

Pattern Adherence

Follows patterns well:

  • xsync.MapOf for concurrent cache (matches fortLookupCache)
  • Phased preload with correct ordering (battles after stations, parallel with incidents)
  • Cleanup uses the same ticker pattern as incidents/pokemon/tappables
  • Backward-compatible API additions (flat fields remain, new battles array is additive)
  • Deadlock retry with exponential backoff (matches write-behind queue pattern)
  • Test mocking via function variable (upsertStationBattleRecordFunc)

Deviates from patterns:

  1. No write-behind queue — every other persisted entity uses TypedQueue[K, T] for batched, coalesced, rate-limited DB writes. Station battles do direct synchronous transactional writes. This is the most significant deviation.
  2. No TTL-based cache — uses plain xsync.MapOf instead of ttlcache.Cache or ShardedCache. Expiry is handled by read-time pruning + hourly cleanup. If a station is never re-read, its expired battles stay in memory until the cleanup routine runs.
  3. Ad-hoc control flow flags — forceSave and skipWebhook on Station are new patterns. Other entities use dirty tracking and old/new value comparison exclusively. skipWebhook in particular is a hidden side-channel that short-circuits the webhook path.
  4. Transaction-based writes — storeStationBattleRecord is the only entity using explicit BEGIN/COMMIT. Other entities use single INSERT ... ON DUPLICATE KEY UPDATE statements.
  5. hydrateStationBattlesForStation is called without the station lock in GetStationRecordReadOnly (before GetOrSetFunc). If another goroutine is processing a GMO for the same station concurrently, both could race on stationBattleCache.Store. xsync.MapOf won't crash, but one write could overwrite the other.

Collaborator

jfberry commented Apr 1, 2026

Looking at the CLAUDE.md instructions, here is a check of which patterns in this PR violate the documented rules.

Code Review — Pattern Compliance with Project Architecture

The project's CLAUDE.md documents specific architectural patterns that all entities are expected to follow. This PR
introduces StationBattleData as a new persisted entity but deviates from several of these documented patterns.

1. Write-behind queue for persistence

Each entity type has a TypedQueue[K, T] that batches and coalesces writes
— CLAUDE.md, "Write-Behind Queues" section

Every other persisted entity (pokemon, gym, pokestop, station, spawnpoint) uses the write-behind queue for DB persistence. station_battle bypasses this entirely with direct synchronous transactional writes (BEGIN / INSERT ... ON DUPLICATE KEY UPDATE / DELETE / COMMIT), executed in the GMO decode hot path while holding the station lock.

This has practical consequences: 4 DB round-trips per station per GMO, no coalescing of repeated updates, and the station lock is held
for the full duration of the DB transaction.
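To illustrate what the queue buys, here is a minimal, single-threaded sketch of the write-behind idea: repeated updates to the same key coalesce in memory and flush as one batch. This is NOT Golbat's actual TypedQueue[K, T] (whose API is not shown in this thread), just the concept:

```go
package main

import "fmt"

// coalescingQueue is a minimal illustration of write-behind coalescing.
// It is not Golbat's TypedQueue[K, T]; names and API are invented here.
type coalescingQueue[K comparable, T any] struct {
	pending map[K]T
	order   []K
}

func newCoalescingQueue[K comparable, T any]() *coalescingQueue[K, T] {
	return &coalescingQueue[K, T]{pending: map[K]T{}}
}

// Enqueue records the latest value for key; an existing entry is
// overwritten (coalesced) rather than queued a second time.
func (q *coalescingQueue[K, T]) Enqueue(key K, value T) {
	if _, ok := q.pending[key]; !ok {
		q.order = append(q.order, key)
	}
	q.pending[key] = value
}

// Flush drains everything as one batch (e.g. one multi-row upsert),
// returning the number of distinct rows written.
func (q *coalescingQueue[K, T]) Flush(write func(key K, value T)) int {
	n := len(q.order)
	for _, k := range q.order {
		write(k, q.pending[k])
	}
	q.pending, q.order = map[K]T{}, nil
	return n
}

func main() {
	q := newCoalescingQueue[string, int]()
	q.Enqueue("station1:seed9", 1)
	q.Enqueue("station1:seed9", 2) // coalesced with the first update
	q.Enqueue("station2:seed7", 3)
	n := q.Flush(func(k string, v int) { fmt.Println(k, v) })
	fmt.Println("rows written:", n) // 2, not 3
}
```

Three updates become two row writes; with per-GMO synchronous transactions, they would have been three separate transactions.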

2. Entity struct pattern

```go
type Pokestop struct {
    mu TrackedMutex[string] `db:"-"`       // Entity-level mutex
    PokestopData                           // Embedded — copied for queue snapshots
    dirty     bool              `db:"-"`   // Needs DB write
    newRecord bool              `db:"-"`   // INSERT vs UPDATE
    oldValues PokestopOldValues `db:"-"`   // Snapshot for webhook comparison
}
```

— CLAUDE.md, "Struct Pattern" section

StationBattleData is a plain struct with none of these: no entity-level mutex, no embedded data struct for queue snapshots, no dirty
flag, no newRecord flag, no oldValues snapshot. This is the foundational pattern for all entities in the codebase.
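Applied to station battles, the documented pattern would look roughly like the sketch below. Everything beyond the quoted Pokestop shape is an assumption (the mutex and oldValues fields are omitted for brevity):

```go
package main

import "fmt"

// StationBattleData would hold the persisted columns (illustrative subset,
// not the PR's actual fields).
type StationBattleData struct {
	StationId string
	Seed      uint64
	BattleEnd int64
}

// StationBattle follows the documented entity struct pattern: an embedded
// data struct for queue snapshots plus dirty/newRecord bookkeeping.
type StationBattle struct {
	StationBattleData
	dirty     bool
	newRecord bool
}

// SetBattleEnd is a setter with dirty tracking, per the CLAUDE.md pattern:
// the record is only marked for a DB write when the value actually changes.
func (b *StationBattle) SetBattleEnd(end int64) {
	if b.BattleEnd != end {
		b.BattleEnd = end
		b.dirty = true
	}
}

func main() {
	b := &StationBattle{StationBattleData: StationBattleData{StationId: "s1", Seed: 42}}
	b.SetBattleEnd(1700000000)
	fmt.Println(b.dirty) // true: the record now needs a DB write
	b.dirty = false
	b.SetBattleEnd(1700000000)
	fmt.Println(b.dirty) // false: no change, no write
}
```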

3. Record access pattern

Four access patterns, from lightest to heaviest:

  1. Peek*Record — Cache-only lookup
  2. get*RecordReadOnly — Cache lookup with DB fallback
  3. get*RecordForUpdate — ReadOnly + snapshot
  4. getOrCreate*Record — Atomic create-if-absent

The caller MUST call the returned unlock function.
— CLAUDE.md, "Record Access Patterns" section

Station battles have no record access functions. The stationBattleCache is read and written directly (via
xsync.MapOf.Load/Store/Delete) without acquiring an entity lock. Notably, hydrateStationBattlesForStation in
GetStationRecordReadOnly mutates stationBattleCache before the station lock is acquired, which could race with
syncStationBattlesFromProto on another goroutine.
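For contrast, the documented "access function returns an unlock func the caller MUST call" shape would look roughly like this for the battle cache. The lock granularity and all names are assumptions, not Golbat's real accessors:

```go
package main

import (
	"fmt"
	"sync"
)

// battleStore is an illustrative stand-in for the station battle cache;
// real Golbat accessors use per-entity locks, not one global mutex.
type battleStore struct {
	mu    sync.Mutex
	cache map[string][]uint64 // station id -> battle seeds (illustrative)
}

// getStationBattlesForUpdate follows the documented access pattern: it
// acquires the lock and returns the data plus an unlock function that the
// caller MUST invoke, so no read or mutation happens outside the lock.
func (s *battleStore) getStationBattlesForUpdate(stationId string) ([]uint64, func()) {
	s.mu.Lock()
	return s.cache[stationId], s.mu.Unlock
}

func main() {
	s := &battleStore{cache: map[string][]uint64{"s1": {42}}}
	battles, unlock := s.getStationBattlesForUpdate("s1")
	defer unlock()
	fmt.Println(len(battles))
}
```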

4. Setter methods with dirty tracking

Setter methods (SetName, SetLat, etc.) track dirty state and optionally log field changes when dbDebugEnabled is true.
— CLAUDE.md, "Struct Pattern" section

StationBattleData fields are assigned directly with no setters and no dirty tracking. To work around this, the PR adds a forceSave
flag on Station that bypasses the dirty check in saveStationRecord. A skipWebhook flag is also added as a side-channel to suppress
webhooks on DB failure. Neither pattern exists on any other entity.

@Mygod Mygod marked this pull request as ready for review April 6, 2026 05:42
Contributor Author

Mygod commented Apr 6, 2026

The ReactMap PR is done. From my testing, everything seems OK. Feel free to complain now. :)

I'm also planning to do another review pass in two days.

