fix: code quality improvements — Go version, CI race+lint, error mapping, bounded query, CHANGELOG by 0xgetz · Pull Request #145 · unicitynetwork/aggregator-go

0xgetz · 2026-04-16T02:26:20Z

Summary

This PR fixes several code quality issues identified during review.

1. `go.mod` — correct Go version directive

go 1.25 does not exist yet. Changed to go 1.24 (current stable release).

2. CI (`ci.yml`) — three improvements

a) Fix GO_VERSION — updated from 1.25 to 1.24 to match go.mod.

b) Add -race flag to tests — the test step ran make test without the race detector. Changed to make test -race so data races are caught in CI.

c) Add golangci-lint job — added a dedicated lint job using golangci/golangci-lint-action@v6. This enforces consistent code style and catches common issues automatically on every push and PR.

3. `internal/gateway/handlers.go` — correct error code for `GetNoDeletionProof`

GetNoDeletionProof returns errors.ErrUnsupported from the service layer, but the handler was mapping it to InternalErrorCode (-32603), which signals a server crash to the caller.

Fixed to check for errors.ErrUnsupported first and return MethodNotFoundCode (-32601) with the message "get_no_deletion_proof is not yet implemented". This gives clients a clear, actionable signal rather than a misleading internal error.

4. `internal/storage/mongodb/aggregator_record.go` — bound the `GetByBlockNumber` query

GetByBlockNumber issued an unbounded Find over all aggregator records for a block. A block with tens of thousands of records would load them all into memory at once.

Added options.Find().SetLimit(maxRecordsPerBlock) (capped at 10 000) to prevent unbounded memory growth. The constant is exported so callers can detect when the limit has been hit.

5. `CHANGELOG.md` — convert `changes.txt` to standard format

changes.txt contained 797 lines of free-text changelog entries. Converted to a structured CHANGELOG.md following the Keep a Changelog convention with dated sections, ### Added / Changed / Fixed headings, and concise bullet points. The original changes.txt is retained for historical reference.

Files changed

File	Change
`go.mod`	`go 1.25` → `go 1.24`
`.github/workflows/ci.yml`	Fix version, add `-race`, add `golangci-lint` job
`internal/gateway/handlers.go`	Map `ErrUnsupported` → `MethodNotFoundCode`
`internal/storage/mongodb/aggregator_record.go`	Add `SetLimit(10000)` to `GetByBlockNumber`
`CHANGELOG.md`	New file — structured changelog converted from `changes.txt`

Testing

All changes are backward-compatible.
The -race flag addition will surface any existing data races in CI.
The golangci-lint job runs with version: latest and can be pinned once the baseline is established.
The GetByBlockNumber limit is conservative (10 000) and can be raised or made configurable if needed.

…ates - events: Unsubscribe now closes the subscriber channel so goroutines ranging over it terminate cleanly instead of blocking forever. Update test to use the two-value receive idiom (e, ok := <-ch) to distinguish a genuine event from a closed-channel signal. - api/types: Add PreviousRootHash field to SubmitShardRootRequest for optimistic concurrency control. Add ShardRootStatusInvalidPreviousHash status constant to communicate stale-update rejections to callers. - service/parent_service: Validate PreviousRootHash when provided — fetch the current SMT node for the shard and reject the update if the supplied previous hash does not match, preventing lost-update races where two child aggregators submit conflicting roots in the same round (addresses issue unicitynetwork#82).

The commitment collection window previously used a tight loop with time.Sleep(10ms) and a default: branch, burning CPU between events. Replace it with a proper time.NewTimer + labelled-break select so the goroutine blocks efficiently until either a commitment arrives, the 200 ms window expires, or the context is cancelled. Also extract the four magic numbers (collectPhaseDuration, miniBatchSize, prefetchMinSpace/BatchBuffer/MaxBatch) into named package-level constants so tuning values are visible and documented in one place. Additionally optimise restoreSmtFromStorage: instead of calling AddLeaves once per chunk (O(chunks * n) hash work), accumulate all leaves into a single pre-allocated slice and call AddLeaves exactly once at the end — O(n) hash work total.

The prefetcher used a goto DonePushing label to break out of the push loop. goto cannot jump across variable declarations and makes control flow hard to follow. Replace with a labelled 'break pushLoop' which is idiomatic Go and compiles without restriction. Also add submitShardRootWithRetry improvements: - Add shardRootMaxAttempts (10) cap so a permanently unreachable parent does not stall the round forever (previous code was an infinite loop) - Replace fixed 1s sleep with exponential back-off starting at 500ms, doubling each attempt up to a 30s ceiling - Log retryIn duration and maxAttempts so operators can see back-off progression in the logs Fix FinalizeBlockWithRetry: replace time.Sleep with a context-aware select so a cancelled context (e.g. node shutdown) aborts the retry loop immediately instead of blocking for the full delay. Extract commitmentToLeaf helper to centralise path+leafValue derivation that was previously duplicated inline in processMiniBatch. Add ctx.Err() priority check in storeDataParallel so context cancellation is surfaced before storage errors, allowing callers to distinguish shutdown from genuine write failures.

AddPreHashedLeaf was a stub that silently dropped all pre-hashed leaves (returned nil without inserting anything). The underlying SparseMerkleTree stores raw bytes directly, so AddLeaf is the correct implementation — no extra hashing step is needed. Wire it through properly. Add CountLeaves() to SparseMerkleTree: a recursive O(n) traversal that returns the actual number of leaf nodes in the tree. Previously ThreadSafeSMT.GetStats() always reported leafCount=0 via a placeholder getLeafCount() stub. Replace the stub with a call to CountLeaves() so the stats endpoint reflects real tree state. Expand GetStats() to remove the 'isLocked' placeholder field (it was always false and carried no useful information).

GetNoDeletionProof in both AggregatorService and ParentAggregatorService previously returned a plain fmt.Errorf string (or a mock response). Callers had no way to distinguish 'not implemented' from a transient failure without string-matching the error message. Wrap with errors.ErrUnsupported so callers can use errors.Is() for programmatic detection. Also replace the mock NoDeletionProof response in AggregatorService (which silently returned fake data) with a proper ErrUnsupported error. Apply the same ErrUnsupported wrapping to GetBlock, GetBlockCommitments, and GetBlockRecords in ParentAggregatorService. Add explanatory TODO comment in AggregatorService.GetNoDeletionProof describing what a real implementation requires (full SMT audit-log traversal) so the scope of the work is clear to future contributors.

strings.Split("", ",") returns []string{""} — a one-element slice containing an empty string — rather than an empty slice. When BFT_ANNOUNCE_ADDRESSES or BFT_BOOTSTRAP_ADDRESSES env vars are unset this caused downstream address-validation to fail with a confusing 'invalid bootstrap address: empty string' error. Introduce splitNonEmpty() which returns nil for an empty input and discards blank/whitespace-only tokens from non-empty inputs. Use it for both BFT address env-var splits.

Fatal() was logging 'fatal error' as the message key and packing the actual arguments into a slog.Any("args", ...) attribute, so the log line looked like: level=ERROR msg="fatal error" args=[foo bar] instead of: level=ERROR msg="foo bar" Fix by using fmt.Sprint/fmt.Sprintf to format the arguments into the message string directly, matching the behaviour callers expect from a logrus-compatible Fatal shim.

A buffer of 1 means any burst of two events published before the consumer drains the channel silently drops the second event. In practice the aggregator publishes multiple round-completion events in rapid succession (e.g. block finalised + shard root submitted). Increase the default buffer to 16 to absorb short bursts without dropping events, and document the reasoning in a named constant (defaultSubscriberBuffer) so the value is easy to find and tune.

…state internal/config/config_validate_test.go: - TestShardingModeHelpers: table-driven test covering all defined ShardingMode values plus unknown/empty/wrong-case inputs for the IsValid/IsStandalone/IsParent/IsChild predicates. - TestConfigValidate: table-driven test exercising the key validation branches in Config.Validate — missing required fields, invalid log level, TLS half-configuration, invalid sharding mode, out-of-range shard ID length, zero HTTP/2 stream limit, HA without server ID. - TestSplitNonEmpty: table-driven test for the new splitNonEmpty helper covering empty string, all-commas, single token, multi-token, consecutive commas, whitespace padding, and leading/trailing commas. internal/round/round_state_test.go: - TestRoundStateString: verifies the String() representation of every defined RoundState value and that out-of-range values return 'unknown' rather than panicking. - TestTryAddLeavesOneByOne_EmptyInput: verifies the function returns pre-allocated non-nil result slices even for empty input.

gateway/handlers.go: document the two-transport architecture (HTTP REST + JSON-RPC 2.0), the delegation pattern to AggregatorService, and the two-tier error-handling model (400 vs 500 class responses). storage/mongodb/commitment.go: document the per-collection type pattern, explain why each collection gets its own type, and note that all exported methods accept context.Context for deadline propagation and graceful shutdown participation.

Cover New() level/format/output variants, NewWithConfig file logging, Close() with and without a file writer, WithContext key extraction, WithComponent/WithStateID/WithError/WithFields helpers, all four context-aware logging methods, AsyncLogger basic publish and Stop idempotency, dropped-log counter under backpressure, and GetDroppedLogs. All 15 tests pass without Docker or external dependencies.

- go.mod: downgrade go directive from 1.25 to 1.24 (current stable release) - ci.yml: fix GO_VERSION to 1.24, add -race flag to test step, add golangci-lint job - handlers.go: map errors.ErrUnsupported in GetNoDeletionProof to MethodNotFoundCode instead of InternalErrorCode so callers get a clear 'not implemented' signal - aggregator_record.go: add SetLimit(10000) to GetByBlockNumber Find query to prevent unbounded memory growth on large blocks - CHANGELOG.md: convert changes.txt entries into a structured CHANGELOG.md; retain changes.txt for historical reference

gemini-code-assist

Code Review

This pull request enhances the aggregator's core functionality by replacing busy-wait loops with ticker-based processing, optimizing SMT restoration with batched leaf insertion, and implementing exponential backoff for shard root submissions. It also introduces optimistic concurrency checks for shard updates and expands documentation and test coverage. Review feedback identified several high-severity issues, including hardcoded durations that bypass configuration, potential memory exhaustion during SMT restoration, incorrect shard ID encoding for database lookups, and silent truncation of aggregator records in API responses.

gemini-code-assist · 2026-04-16T02:27:59Z

+		// timer-based select instead of a busy-sleep loop. This avoids wasting CPU
+		// and reduces latency — the goroutine blocks efficiently until either a
+		// commitment arrives, the window expires, or the context is cancelled.
+		collectTimer := time.NewTimer(collectPhaseDuration)


The processRound method uses the hardcoded collectPhaseDuration constant (200ms) instead of the configured rm.roundDuration. This ignores the user-defined ROUND_DURATION environment variable and contradicts the PR description which claims the duration is configurable.

Suggested change

collectTimer := time.NewTimer(collectPhaseDuration)

collectTimer := time.NewTimer(rm.roundDuration)

gemini-code-assist · 2026-04-16T02:27:59Z

+	// Pre-allocate the full leaf slice so we only call AddLeaves once at the
+	// end. Calling AddLeaves on every chunk would rebuild partial tree state
+	// on each iteration, multiplying the hash-computation work by O(chunks).
+	allLeaves := make([]*smt.Leaf, 0, totalCount)


There are three issues here:

Compilation Error: totalCount is an int64 (returned by MongoDB's Count), but make requires an int for capacity. This will fail to compile.

Scalability: Pre-allocating a single slice for the entire SMT state is not scalable. If the collection grows to millions of nodes, this will cause an OOM (Out Of Memory) error.

Logic Error: If the smt_nodes collection stores all historical versions of nodes, totalCount will reflect the entire history, not just the current state. Loading every historical node as a leaf and re-inserting them via AddLeaves is incorrect and will likely fail due to path length validation.

Suggested change

allLeaves := make([]*smt.Leaf, 0, totalCount)

allLeaves := make([]*smt.Leaf, 0, int(totalCount))

gemini-code-assist · 2026-04-16T02:27:59Z

+		shardKey := make([]byte, 4)
+		shardKey[0] = byte(req.ShardID >> 24)
+		shardKey[1] = byte(req.ShardID >> 16)
+		shardKey[2] = byte(req.ShardID >> 8)
+		shardKey[3] = byte(req.ShardID)


The manual 4-byte big-endian encoding of ShardID is likely to cause lookup failures. The SMT implementation (and the restoration logic in round_manager.go) uses big.Int.Bytes() to generate storage keys, which does not include leading zeros. For example, ShardID 10 would be stored as [10], but this code searches for [0, 0, 0, 10]. Additionally, ShardID is an int, which can be 64-bit, making the 4-byte assumption potentially unsafe.

Suggested change

shardKey := make([]byte, 4)

shardKey[0] = byte(req.ShardID >> 24)

shardKey[1] = byte(req.ShardID >> 16)

shardKey[2] = byte(req.ShardID >> 8)

shardKey[3] = byte(req.ShardID)

shardKey := new(big.Int).SetInt64(int64(req.ShardID)).Bytes()

gemini-code-assist · 2026-04-16T02:27:59Z

+	opts := options.Find().SetLimit(maxRecordsPerBlock)
+	cursor, err := ars.collection.Find(ctx, filter, opts)


The GetByBlockNumber query silently truncates results at 10,000 records. If a block contains more commitments than this limit, the caller receives an incomplete list with no indication that data is missing. This is a critical issue for clients relying on this method for auditing or synchronization. The API should either return a 'hasMore' flag or implement proper pagination.

… tests - Fix CI bug: 'make test -race' was silently ignoring the -race flag; now correctly calls 'make test-race' target - Add separate 'test-integration' CI job for Docker-dependent tests - Add .golangci.yml with explicit linter config (errcheck, govet, staticcheck, bodyclose, exhaustive, and more) - Add 'make test-integration' Makefile target (full suite with Docker) - Update 'make test' and 'make test-race' to pass -short flag, skipping integration tests that require Docker/MongoDB/Redis - Add testing.Short() guards to all integration test suites: internal/round, internal/ha, internal/service, internal/storage/mongodb, internal/storage/redis, test/integration - Normalize go.mod version from 'go 1.24' to 'go 1.24.0' All 21 packages pass 'go test -short ./...' with zero failures.

0xgetz and others added 12 commits April 16, 2026 01:34

gemini-code-assist bot reviewed Apr 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: code quality improvements — Go version, CI race+lint, error mapping, bounded query, CHANGELOG#145

fix: code quality improvements — Go version, CI race+lint, error mapping, bounded query, CHANGELOG#145
0xgetz wants to merge 13 commits intounicitynetwork:mainfrom
0xgetz:fix/code-improvements

0xgetz commented Apr 16, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 16, 2026

Uh oh!

gemini-code-assist bot Apr 16, 2026

Uh oh!

gemini-code-assist bot Apr 16, 2026

Uh oh!

gemini-code-assist bot Apr 16, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	collectTimer := time.NewTimer(collectPhaseDuration)
	collectTimer := time.NewTimer(rm.roundDuration)

	allLeaves := make([]*smt.Leaf, 0, totalCount)
	allLeaves := make([]*smt.Leaf, 0, int(totalCount))

		opts := options.Find().SetLimit(maxRecordsPerBlock)
		cursor, err := ars.collection.Find(ctx, filter, opts)

Conversation

0xgetz commented Apr 16, 2026

Summary

1. go.mod — correct Go version directive

2. CI (ci.yml) — three improvements

3. internal/gateway/handlers.go — correct error code for GetNoDeletionProof

4. internal/storage/mongodb/aggregator_record.go — bound the GetByBlockNumber query

5. CHANGELOG.md — convert changes.txt to standard format

Files changed

Testing

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `go.mod` — correct Go version directive

2. CI (`ci.yml`) — three improvements

3. `internal/gateway/handlers.go` — correct error code for `GetNoDeletionProof`

4. `internal/storage/mongodb/aggregator_record.go` — bound the `GetByBlockNumber` query

5. `CHANGELOG.md` — convert `changes.txt` to standard format