refactor: modernize balance sync worker - remove legacy code, batch UPDATE, structured logging, MT terminology (#2034)
Conversation
… Redis consumer 🧪
…mproved data handling and atomic operations ✨
…s, and void handleBusyMode 🔨
…document flush pattern 🔨
…ggregation and sync pipeline 🔨
…name to convention 🧪
Walkthrough

Refactors ledger balance sync from a max-concurrency worker to a dual-trigger batching collector with distributed Redis locking. Replaces per-balance …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Worker as BalanceSyncWorker
    participant Collector as BalanceSyncCollector
    participant Redis as Redis ZSET
    participant Lua as claim_balance_sync_keys.lua
    participant Aggregator as SyncAggregator
    participant DB as PostgreSQL
    Worker->>Collector: Run(ctx, flushFn, fetchKeys, waitForNext)
    activate Collector
    loop running
        Collector->>Redis: EVALSHA claim_balance_sync_keys.lua (limit)
        Redis->>Lua: execute script
        Lua-->>Redis: claimed members (member, score) with lock TTL
        Redis-->>Collector: []SyncKey
        Collector->>Collector: buffer keys
        alt buffer size ≥ batchSize or timer elapsed
            Collector->>Aggregator: Aggregate(buffered keys)
            Aggregator-->>Collector: []*AggregatedBalance
            Collector->>Worker: flushFn(grouped by orgID/ledgerID)
            Worker->>DB: UpdateMany(orgID, ledgerID, balances...)
            DB-->>Worker: RowsAffected
            Worker->>Redis: RemoveBalanceSyncKeysBatch(claimed keys)
            Redis-->>Worker: removed count
            Worker-->>Collector: flush result
            Collector->>Collector: reset buffer/timer
        else
            Collector->>Collector: wait (poll interval) or continue
        end
    end
    deactivate Collector
```
This PR is very large (35 files, 6683 lines changed). Consider breaking it into smaller PRs for easier review.
🔒 Security Scan Results

| Policy | Status |
|---|---|
| Default non-root user | ✅ Passed |
| No fixable critical/high CVEs | ✅ Passed |
| No high-profile vulnerabilities | ✅ Passed |
| No AGPL v3 licenses | ✅ Passed |
📊 Unit Test Coverage Report

| Metric | Value |
|---|---|
| Overall Coverage | 86.9% ✅ PASS |
| Threshold | 85% |

Coverage by Package

| Package | Coverage |
|---|---|
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/http/in | 86.5% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/mongodb/onboarding | 66.7% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/mongodb/transaction | 66.7% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/account | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/accounttype | 66.7% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/asset | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/assetrate | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/balance | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/ledger | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/operation | 90.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/operationroute | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/organization | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/portfolio | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/segment | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/transaction | 97.4% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/postgres/transactionroute | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/rabbitmq | 91.5% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/adapters/redis/transaction/balance | 100.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/services/command | 88.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/services/query | 93.0% |
| github.com/LerianStudio/midaz/v3/components/ledger/internal/services | 0.0% |
Generated by Go PR Analysis workflow
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/ledger/internal/bootstrap/redis_consumer.worker_mt_fuzz_test.go (1)
117-133: 🧹 Nitpick | 🔵 Trivial

This worker fuzz branch never exercises `isMTReady() == true`.
`tenantCache` is never set here, so `expectWorkerReady` is effectively hard-wired to `false` and the positive predicate path cannot regress under fuzz coverage. Add a `hasTenantCache` dimension, or seed at least one non-nil cache case. The consumer half below has the same blind spot.

Also applies to: 138-139
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/ledger/internal/bootstrap/redis_consumer.worker_mt_fuzz_test.go` around lines 117 - 133, the fuzz test never sets worker.tenantCache so isMTReady() can never be true; update the fuzz inputs to include a hasTenantCache boolean (or seed a non-nil cache case) and when true assign a valid tenant cache to worker.tenantCache before calling isMTReady(); do the same change for the consumer branch later in the file to ensure both positive predicate paths for NewBalanceSyncWorker.isMTReady() are exercised (reference NewBalanceSyncWorker, worker.mtEnabled, worker.pgManager, worker.tenantCache, and isMTReady()).

components/ledger/internal/bootstrap/config.go (1)
837-856: ⚠️ Potential issue | 🟡 Minor

Log the effective sync config, not the raw env values.
`NewBalanceSyncWorker`/`NewBalanceSyncWorkerMT` clamp `<= 0` values to safe defaults, but this log still emits the pre-normalized `syncCfg`. With an invalid env value you'll report one configuration and run another, which makes startup diagnostics misleading.

📝 Proposed fix

```diff
 func initBalanceSyncWorker(opts *Options, cfg *Config, logger libLog.Logger, commandUC *command.UseCase, pgManager *tmpostgres.Manager, tenantServiceName string) *BalanceSyncWorker {
 	syncCfg := BalanceSyncConfig{
 		BatchSize:      cfg.BalanceSyncBatchSize,
 		FlushTimeoutMs: cfg.BalanceSyncFlushTimeoutMs,
 		PollIntervalMs: cfg.BalanceSyncPollIntervalMs,
 	}
@@
 	if opts != nil && opts.MultiTenantEnabled && opts.TenantCache != nil {
 		balanceSyncWorker = NewBalanceSyncWorkerMT(logger, commandUC, syncCfg, true, opts.TenantCache, pgManager, tenantServiceName)
 	} else {
 		balanceSyncWorker = NewBalanceSyncWorker(logger, commandUC, syncCfg)
 	}
+	effectiveCfg := balanceSyncWorker.syncConfig
 	logger.Log(context.Background(), libLog.LevelInfo, "BalanceSyncWorker enabled",
-		libLog.Int("batch_size", syncCfg.BatchSize),
-		libLog.Int("flush_timeout_ms", syncCfg.FlushTimeoutMs),
-		libLog.Int("poll_interval_ms", syncCfg.PollIntervalMs),
+		libLog.Int("batch_size", effectiveCfg.BatchSize),
+		libLog.Int("flush_timeout_ms", effectiveCfg.FlushTimeoutMs),
+		libLog.Int("poll_interval_ms", effectiveCfg.PollIntervalMs),
 	)
 	return balanceSyncWorker
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/ledger/internal/bootstrap/config.go` around lines 837 - 856, the current startup log prints the pre-normalized syncCfg but NewBalanceSyncWorker/NewBalanceSyncWorkerMT clamp invalid values, so replace the log of syncCfg with the worker's effective config: after creating balanceSyncWorker (via NewBalanceSyncWorker or NewBalanceSyncWorkerMT) obtain the runtime-normalized values from the worker (e.g., add/consume an accessor like BalanceSyncWorker.Config() or getters for BatchSize/FlushTimeoutMs/PollIntervalMs) and log those values instead; remove or update the existing logger.Log call that references the original syncCfg to use the values returned by balanceSyncWorker so startup diagnostics reflect what actually runs.

components/ledger/internal/bootstrap/balance_sync.collector.go (1)
108-123: 🧹 Nitpick | 🔵 Trivial

Keep the fetch-error backoff logs collector-scoped.
`waitOrDone` in `components/ledger/internal/bootstrap/balance_sync.worker.go:552-560` emits `BalanceSyncWorker` log messages. Reusing it here means collector fetch retries will be mislabeled in production, which undercuts the new structured logging. Either make that helper component-agnostic or keep the wait local to the collector.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/ledger/internal/bootstrap/balance_sync.collector.go` around lines 108 - 123, The collector currently calls waitOrDone(ctx, c.pollInterval, c.logger) which emits BalanceSyncWorker logs, mislabeling collector retry/backoff messages; replace that call with a collector-scoped wait that uses c.logger and the collector identity (BalanceSyncCollector) or make waitOrDone accept an explicit component label. Concretely, in the BalanceSyncCollector code path (method on c) remove the waitOrDone call and implement a local context-aware wait using c.pollInterval (select on ctx.Done() and time.After(c.pollInterval)) so the log and any timeout handling are emitted via c.logger and attributed to BalanceSyncCollector. Ensure you reference the existing c.logger, c.pollInterval, and BalanceSyncCollector context when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/ledger/internal/adapters/postgres/balance/balance.postgresql.go`:
- Around line 1451-1486: The UpdateMany implementation builds a VALUES list with
4 params per balance plus 3 shared params and must guard against PostgreSQL's
65,535 bind-parameter limit; add a constant like maxUpdateManyRows = (65535-3)/4
and check len(deduped) against it before constructing values (in the UpdateMany
function), returning an error if exceeded or alternatively implement chunking by
iterating over deduped in slices of maxUpdateManyRows and executing the same
query per chunk while accumulating affected rows; reference the deduped slice,
the VALUES construction loop (valuesClauses/args/paramIdx), and the
db.ExecContext call when adding the guard or chunking logic.
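The guard this comment asks for can be sketched as follows. `maxUpdateManyRows` and `chunkBalances` are illustrative names for the chunking alternative, not existing Midaz symbols; the constant derives from PostgreSQL's 65,535 bind-parameter cap with 4 params per row plus 3 shared params.

```go
package main

import "fmt"

// PostgreSQL caps a single statement at 65,535 bind parameters; with 4
// params per balance row plus 3 shared params, the safe per-statement
// row cap is (65535-3)/4 = 16383.
const maxUpdateManyRows = (65535 - 3) / 4

// chunkBalances splits the deduped slice into statement-sized chunks so
// each chunk stays under the bind-parameter limit.
func chunkBalances[T any](deduped []T) [][]T {
	var chunks [][]T
	for start := 0; start < len(deduped); start += maxUpdateManyRows {
		end := start + maxUpdateManyRows
		if end > len(deduped) {
			end = len(deduped)
		}
		chunks = append(chunks, deduped[start:end])
	}
	return chunks
}

func main() {
	rows := make([]int, 40000) // would overflow the parameter limit in one statement
	chunks := chunkBalances(rows)
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[len(chunks)-1])) // → 3 16383 7234
}
```

Each chunk would then run through the same VALUES-building loop and `db.ExecContext` call, accumulating affected rows across chunks.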
In
`@components/ledger/internal/bootstrap/balance_sync.worker_integration_test.go`:
- Around line 106-109: The test currently checks removal indirectly via
GetBalanceSyncKeys (which filters locked members), so replace or complement that
check with a direct Redis sorted-set assertion: call the Redis ZSCORE or ZRANK
on the schedule key for the balanceKey used in the test and assert the result is
nil/not found (or returns redis.Nil) after RemoveBalanceSyncKeysBatch() runs;
locate the test in balance_sync.worker_integration_test.go around the loop that
iterates keysAfter and add a direct ZSET query against the same schedule Redis
key to ensure RemoveBalanceSyncKeysBatch actually deleted the member rather than
just leaving a claim lock.
In `@components/ledger/internal/bootstrap/balance_sync.worker_mt_fuzz_test.go`:
- Around line 16-25: The fuzz test function name
FuzzNewBalanceSyncWorkerMT_MaxWorkers is outdated because NewBalanceSyncWorkerMT
no longer takes a MaxWorkers parameter; rename the test to match the current
constructor API (e.g., FuzzNewBalanceSyncWorkerMT) and update the comment text
to remove any reference to MaxWorkers so the corpus and go test -fuzz output are
not misleading; ensure you change the function identifier
(FuzzNewBalanceSyncWorkerMT_MaxWorkers) wherever referenced and keep the body
and verified properties unchanged except for the name/comment edits.
In `@components/ledger/internal/bootstrap/balance_sync.worker_test.go`:
- Around line 209-212: The loop in the test iterating over testCases contains
redundant per-iteration rebindings (tc := tc, orgID := orgID, ledgerID :=
ledgerID); remove these unnecessary assignments and use the loop variables
directly in the body (the for loop over testCases and any uses of tc, orgID,
ledgerID) since Go 1.22+ scopes loop variables per iteration.
In `@components/ledger/internal/bootstrap/balance_sync.worker.go`:
- Around line 416-431: The flush callback is incorrectly reporting "no work"
when keys are cleaned up; update the logic so any successful removal/consumption
of keys counts as progress. Specifically, when iterating groups from
groupKeysByOrgLedger and calling processSyncBatch (or when processSyncBatch
invokes RemoveBalanceSyncKeysBatch), ensure processSyncBatch returns true if it
performed any cleanup/consumption of keys (not only when new balances were
processed), and set processed = true in the caller whenever processSyncBatch
indicates cleanup; alternatively, detect RemoveBalanceSyncKeysBatch success in
the caller and set processed = true so the FlushFunc contract is honored.
- Around line 538-541: The log call in BalanceSyncWorker is using redundant
int(...) conversions on fields that are already int (result.KeysProcessed,
result.BalancesAggregated, result.BalancesSynced); remove the unnecessary casts
in the w.logger.Log invocation (keep libLog.Int("processed",
result.KeysProcessed), libLog.Int("aggregated", result.BalancesAggregated),
libLog.Int("synced", result.BalancesSynced)) so the types are correct and
golangci-lint/unconvert warnings are resolved.
In `@components/ledger/internal/bootstrap/redis_consumer.worker_mt_test.go`:
- Around line 307-360: The test
TestRedisQueueConsumer_RunDispatchesBasedOnMultiTenantReady currently only calls
isMultiTenantReady and doesn't verify that Run() actually dispatches to the
correct branch; update the test to call consumer.Run(...) and make the selected
path observable by injecting a spy or stubbing the consumer's runSingleTenant
and runMultiTenant methods (or wrapping NewRedisQueueConsumer to set flags) so
you can assert which branch was executed; use the unique symbols
NewRedisQueueConsumer, Run, isMultiTenantReady, runSingleTenant and
runMultiTenant to locate and modify the test so it verifies Run() dispatch
rather than just the predicate.
---
Outside diff comments:
In `@components/ledger/internal/bootstrap/balance_sync.collector.go`:
- Around line 108-123: The collector currently calls waitOrDone(ctx,
c.pollInterval, c.logger) which emits BalanceSyncWorker logs, mislabeling
collector retry/backoff messages; replace that call with a collector-scoped wait
that uses c.logger and the collector identity (BalanceSyncCollector) or make
waitOrDone accept an explicit component label. Concretely, in the
BalanceSyncCollector code path (method on c) remove the waitOrDone call and
implement a local context-aware wait using c.pollInterval (select on ctx.Done()
and time.After(c.pollInterval)) so the log and any timeout handling are emitted
via c.logger and attributed to BalanceSyncCollector. Ensure you reference the
existing c.logger, c.pollInterval, and BalanceSyncCollector context when making
the change.
In `@components/ledger/internal/bootstrap/config.go`:
- Around line 837-856: The current startup log prints the pre-normalized syncCfg
but NewBalanceSyncWorker/NewBalanceSyncWorkerMT clamp invalid values, so replace
the log of syncCfg with the worker's effective config: after creating
balanceSyncWorker (via NewBalanceSyncWorker or NewBalanceSyncWorkerMT) obtain
the runtime-normalized values from the worker (e.g., add/consume an accessor
like BalanceSyncWorker.Config() or getters for
BatchSize/FlushTimeoutMs/PollIntervalMs) and log those values instead; remove or
update the existing logger.Log call that references the original syncCfg to use
the values returned by balanceSyncWorker so startup diagnostics reflect what
actually runs.
In `@components/ledger/internal/bootstrap/redis_consumer.worker_mt_fuzz_test.go`:
- Around line 117-133: The fuzz test never sets worker.tenantCache so
isMTReady() can never be true; update the fuzz inputs to include a
hasTenantCache boolean (or seed a non-nil cache case) and when true assign a
valid tenant cache to worker.tenantCache before calling isMTReady(); do the same
change for the consumer branch later in the file to ensure both positive
predicate paths for NewBalanceSyncWorker.isMTReady() are exercised (reference
NewBalanceSyncWorker, worker.mtEnabled, worker.pgManager, worker.tenantCache,
and isMTReady()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0cef962c-d299-4f93-8f68-390aa3201dc5
📒 Files selected for processing (35)
- CLAUDE.md
- components/ledger/.env.example
- components/ledger/internal/adapters/postgres/balance/balance.postgresql.go
- components/ledger/internal/adapters/postgres/balance/balance.postgresql_integration_test.go
- components/ledger/internal/adapters/postgres/balance/balance.postgresql_mock.go
- components/ledger/internal/adapters/redis/transaction/balance/aggregation.go
- components/ledger/internal/adapters/redis/transaction/balance/aggregation_test.go
- components/ledger/internal/adapters/redis/transaction/consumer.redis.go
- components/ledger/internal/adapters/redis/transaction/consumer.redis_atomic_state_test.go
- components/ledger/internal/adapters/redis/transaction/consumer.redis_mock.go
- components/ledger/internal/adapters/redis/transaction/consumer.redis_test.go
- components/ledger/internal/adapters/redis/transaction/scripts/claim_balance_sync_keys.lua
- components/ledger/internal/adapters/redis/transaction/scripts/unschedule_synced_balance.lua
- components/ledger/internal/bootstrap/balance.worker.go
- components/ledger/internal/bootstrap/balance.worker_test.go
- components/ledger/internal/bootstrap/balance_sync.collector.go
- components/ledger/internal/bootstrap/balance_sync.collector_test.go
- components/ledger/internal/bootstrap/balance_sync.mt_lifecycle_test.go
- components/ledger/internal/bootstrap/balance_sync.worker.go
- components/ledger/internal/bootstrap/balance_sync.worker_integration_test.go
- components/ledger/internal/bootstrap/balance_sync.worker_mt_fuzz_test.go
- components/ledger/internal/bootstrap/balance_sync.worker_mt_property_test.go
- components/ledger/internal/bootstrap/balance_sync.worker_mt_test.go
- components/ledger/internal/bootstrap/balance_sync.worker_test.go
- components/ledger/internal/bootstrap/config.go
- components/ledger/internal/bootstrap/redis_consumer.worker_mt_fuzz_test.go
- components/ledger/internal/bootstrap/redis_consumer.worker_mt_property_test.go
- components/ledger/internal/bootstrap/redis_consumer.worker_mt_test.go
- components/ledger/internal/bootstrap/workers_multitenant_property_test.go
- components/ledger/internal/bootstrap/workers_multitenant_test.go
- components/ledger/internal/services/command/sync-balance.go
- components/ledger/internal/services/command/sync-balance_test.go
- components/ledger/internal/services/command/sync_balances_batch.go
- components/ledger/internal/services/command/sync_balances_batch_test.go
- docs/PROJECT_RULES.md
💤 Files with no reviewable changes (10)
- components/ledger/.env.example
- components/ledger/internal/adapters/redis/transaction/consumer.redis_atomic_state_test.go
- components/ledger/internal/adapters/redis/transaction/scripts/unschedule_synced_balance.lua
- components/ledger/internal/adapters/redis/transaction/consumer.redis_mock.go
- components/ledger/internal/services/command/sync-balance.go
- components/ledger/internal/bootstrap/workers_multitenant_test.go
- components/ledger/internal/bootstrap/balance.worker_test.go
- components/ledger/internal/bootstrap/workers_multitenant_property_test.go
- components/ledger/internal/services/command/sync-balance_test.go
- components/ledger/internal/bootstrap/balance.worker.go
Resolved review threads:
- components/ledger/internal/adapters/postgres/balance/balance.postgresql.go
- components/ledger/internal/bootstrap/balance_sync.worker_integration_test.go
- components/ledger/internal/bootstrap/balance_sync.worker_mt_fuzz_test.go (outdated)
- components/ledger/internal/bootstrap/balance_sync.worker_test.go (outdated)
This PR is very large (35 files, 6698 lines changed). Consider breaking it into smaller PRs for easier review.

This PR is very large (35 files, 6700 lines changed). Consider breaking it into smaller PRs for easier review.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/ledger/internal/bootstrap/balance_sync.collector_test.go (1)
753-759: ⚠️ Potential issue | 🟡 Minor

Replace sleep-based synchronization in idle-resume test.
`time.Sleep` at Line 758 can make this test timing-sensitive in CI. Prefer waiting on a deterministic signal (`fetchCallCount` or a channel).

Proposed deterministic wait

```diff
 // Wait for collector to reach idle and call waitForNext
-time.Sleep(200 * time.Millisecond)
+require.Eventually(t, func() bool {
+	return fetchCallCount.Load() >= 1
+}, 3*time.Second, 10*time.Millisecond, "collector should perform initial fetch before resume")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/ledger/internal/bootstrap/balance_sync.collector_test.go` around lines 753 - 759, The test uses time.Sleep to wait for the collector to become idle; replace this flaky sleep with a deterministic wait by blocking on a signal or counter used in the test (e.g., wait for fetchCallCount to reach the expected value or receive from waitCh) after starting c.Run(ctx, rec.record, fetchFn, waitForNextResume(waitCh)). Specifically, remove the time.Sleep(200 * time.Millisecond) and instead wait until the fetchCallCount (or the channel waitCh) indicates the fetch has been invoked the expected number of times (with a small context/timeout to avoid hanging); keep the existing c.Run/close(done) semantics and use the same test-level timeout to fail fast in CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@components/ledger/internal/bootstrap/balance_sync.collector_test.go`:
- Around line 753-759: The test uses time.Sleep to wait for the collector to
become idle; replace this flaky sleep with a deterministic wait by blocking on a
signal or counter used in the test (e.g., wait for fetchCallCount to reach the
expected value or receive from waitCh) after starting c.Run(ctx, rec.record,
fetchFn, waitForNextResume(waitCh)). Specifically, remove the time.Sleep(200 *
time.Millisecond) and instead wait until the fetchCallCount (or the channel
waitCh) indicates the fetch has been invoked the expected number of times (with
a small context/timeout to avoid hanging); keep the existing c.Run/close(done)
semantics and use the same test-level timeout to fail fast in CI.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ea46e4cd-e2ec-42d6-8ab7-953c34e50bd9
📒 Files selected for processing (1)
components/ledger/internal/bootstrap/balance_sync.collector_test.go
gandalf-at-lerian
left a comment
Solid refactor. +2501/−4199 across 35 files and it reads cleaner than the original — that's the mark of good work.
What's good:
- The `UPDATE ... FROM (VALUES ...)` in `UpdateMany` is a real performance win: N round-trips → 1. The client-side dedup by balance ID before the query is correctly motivated; without it, PG would join one target row to multiple source rows non-deterministically.
- Removing `Sync` (individual), `RemoveBalanceSyncKey`, and `unschedule_synced_balance.lua` in favor of batch-only paths simplifies the mental model. One path to understand instead of two.
- The collector API change (flush callback as a `Run()` parameter instead of `SetFlushCallback`) eliminates the footgun of forgetting to set it. Good call.
- Structured logging migration from `fmt.Sprintf` → typed fields is the right pattern for production. The demotion of idle/polling logs to DEBUG will reduce noise significantly.
- Poison record cleanup is belt-and-suspenders: the worker catches unparseable keys in `groupKeysByOrgLedger`, the use case catches expired/unparseable keys in `orphanedKeys`. Both paths ZREM.
- MT reconciliation with 4 phases (reap dead → stop removed → start new → wait for removed) is clean. Non-blocking cancel + batched wait is the right pattern.
- `extractIDsFromMember` is an allocation-free, position-independent UUID scanner. Neat.
- Property tests and fuzz tests for the new paths are a welcome addition.
Minor nits (non-blocking):
- `parseSyncKeysFromLuaResult(res any, logger libLog.Logger, ctx context.Context)`: `ctx` should be the first parameter per Go convention and your own PROJECT_RULES.md ("Context: Always first param"). Swap to `(ctx context.Context, res any, logger libLog.Logger)`.
- `claim_balance_sync_keys.lua` is missing a trailing newline (`return claimed` with no \n). Most editors and linters prefer a POSIX-compliant final newline.

Both are trivial; not worth holding the PR for.
Summary
Comprehensive review and modernization of the balance sync pipeline (worker, collector, Redis adapter, PostgreSQL repository, and use case).
Changes
Legacy code removal (-1000+ lines)
- Removed dead worker methods (`waitForNextOrBackoff`, `waitUntilDue`, `processBalancesToExpire`, `processBalanceToExpire`, `shouldShutdown`)
- Removed per-balance `Sync`/`SyncBalance` methods (replaced by `UpdateMany`/`SyncBalancesBatch`)
- Removed `RemoveBalanceSyncKey` and `unschedule_synced_balance.lua` in favor of batch-only conditional removal
- Removed unused constructor params (`redisConn`, `batchSize`, `maxWorkers`) and the deprecated `BALANCE_SYNC_MAX_WORKERS` env var
- Removed the unused `Size()` method from the collector

Performance
- Single `UPDATE ... FROM (VALUES ...)` statement in `UpdateMany` (1 round-trip vs N)
- Client-side dedup by balance ID in `UpdateMany` to prevent version mismatch when the same ID appears multiple times

Observability
- Migrated `fmt.Sprintf` log calls to structured fields (`libLog.Err`, `libLog.String`, `libLog.Int`) across worker, collector, Redis adapter, and use case

Naming & conventions
- MT terminology (`runWorkerMT`, `NewBalanceSyncWorkerMT`, `isMTReady`, `mtEnabled`)
- File naming (`balance_sync.worker.go`, `balance_sync.collector.go`)
- Renamed `get_balances_near_expiration.lua` → `claim_balance_sync_keys.lua`
- Renamed `processBalancesToExpireBatch` → `processSyncBatch` and `SyncBatch` → `UpdateMany`
- Aggregator naming (`InMemorySyncAggregator`, `SyncAggregator`)

API cleanup
- Moved `SetFlushCallback` into `Run()` as a required parameter
- Removed the dead `idleBackoff` field from the collector constructor
- Slimmed constructors (removed unused `conn` and `maxWorkers` params)
- Added `FlushTimeout()`/`PollInterval()` duration methods to `BalanceSyncConfig`

Documentation
- Updated `CLAUDE.md` and `PROJECT_RULES.md` with current conventions (file naming, structured logging, MT terminology, balance sync worker description)
- Documented non-obvious patterns (`stopAndDrain` timer pattern, `context.WithoutCancel` in shutdown flush, etc.)
- Documented the `RedisRepository` interface with doc comments on all 16 methods

Tests
- Added coverage for `parseSyncKeysFromLuaResult` resilience (bad score, odd elements, unexpected types)
- Replaced `time.Sleep` with channel sync, added iteration index to property test subtests
- Renamed test files to the `balance_sync.{type}_test.go` convention