
Review fixes #13

Open
lploom wants to merge 8 commits into ipc-client from review-fixes

Conversation

@lploom (Collaborator) commented Apr 16, 2026

Fixes #1 #2 #3 #4 #5 #7 #8 #10 #12

Requires unicity-node branch "bft-snapshots".

lploom added 8 commits April 15, 2026 15:59
Handle replication requests on a dedicated replication thread; this prevents a possible goroutine-exhaustion attack vector and cleanly handles shutdown.
@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces several improvements to resource management, security, and the replication protocol. Key changes include adding defer Close() calls for database handles and network streams, implementing a 16MB maximum message size limit in CBOR deserialization to prevent potential DoS, and refactoring ledger replication to use a single-worker queue instead of unbounded goroutines. Additionally, the FGPTxSystem was updated with stricter configuration validation and sanity checks for chain tips. Feedback suggests that the 16MB message limit might be too restrictive for the default replication block count, the replication worker queue capacity could be increased to avoid frequent busy responses, and the block slice allocation in the recovery logic should be pre-allocated for better performance.

Comment thread network/cbor.go
return append(lengthBytes[:bytesWritten], data...), nil
}

const maxMsgSize = 16 * 1024 * 1024 // 16 MB

security-high high

The maxMsgSize of 16MB might be too restrictive when combined with the default maxReturnBlocks of 1000 in ledger replication. If blocks are larger than ~16KB on average, the resulting LedgerReplicationResponse will exceed this limit and be rejected by the receiver. Consider either increasing this limit or ensuring that the replication logic can handle splitting large responses.

Comment thread partition/recovery.go

func (n *Node) processReplicationRequest(ctx context.Context, lr *replication.LedgerReplicationRequest) {
startBlock := lr.BeginBlockNumber
blocks := make([]*types.Block, 0)

medium

It is more efficient to pre-allocate the blocks slice with the maximum expected capacity to avoid multiple re-allocations during the loop.

Suggested change
blocks := make([]*types.Block, 0)
blocks := make([]*types.Block, 0, n.conf.replicationConfig.maxReturnBlocks)
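As a standalone illustration of why the capacity hint matters: a slice that grows from zero capacity reallocates and copies its backing array repeatedly as it fills, while a pre-allocated slice is sized once. The sketch below counts capacity changes; `maxReturnBlocks = 1000` is taken from the default mentioned in the earlier review comment.

```go
package main

import "fmt"

// countReallocs appends n elements to a zero-capacity slice and counts
// how many times the runtime had to grow the backing array.
func countReallocs(n int) (reallocs int) {
	s := make([]int, 0)
	lastCap := cap(s)
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != lastCap {
			reallocs++
			lastCap = cap(s)
		}
	}
	return reallocs
}

func main() {
	const maxReturnBlocks = 1000 // default from the replication config

	// Pre-allocated: one allocation, no copying during the loop.
	pre := make([]int, 0, maxReturnBlocks)

	fmt.Println(countReallocs(maxReturnBlocks), cap(pre)) // many reallocations vs. capacity 1000
}
```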

Comment thread partition/recovery.go
Comment on lines +131 to +141
select {
case n.replicationCh <- replicationRequest{ctx: ctx, req: lr}:
	// worker will process
default:
	resp := &replication.LedgerReplicationResponse{
		UUID:    lr.UUID,
		Status:  replication.Busy,
		Message: "Too many concurrent replication requests",
	}
	return n.sendLedgerReplicationResponse(ctx, resp, lr.NodeID)
}

medium

The replicationCh has a capacity of 1, meaning only one replication request can be queued while another is being processed. In a network with many nodes, this might lead to frequent Busy responses during recovery. Consider increasing the channel capacity or implementing a more flexible worker pool if high concurrency is expected.
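A small self-contained sketch of the trade-off (the `admit` helper and numbers are illustrative, not the PR's code): with a buffer of 1 and a busy worker, every request beyond the first falls into the `default` branch and gets a Busy response; a larger buffer absorbs bursts before any peer is turned away.

```go
package main

import "fmt"

// admit tries to enqueue n requests without blocking, mimicking the
// select/default pattern from partition/recovery.go. Requests that do
// not fit the channel buffer are counted as Busy responses.
func admit(ch chan int, n int) (queued, busy int) {
	for i := 0; i < n; i++ {
		select {
		case ch <- i:
			queued++ // worker will process
		default:
			busy++ // peer would receive replication.Busy
		}
	}
	return queued, busy
}

func main() {
	// Capacity 1, no worker draining: only one of three requests fits.
	_, busySmall := admit(make(chan int, 1), 3)
	// Capacity 8: the same burst is fully absorbed.
	_, busyLarge := admit(make(chan int, 8), 3)
	fmt.Println(busySmall, busyLarge) // prints: 2 0
}
```

Raising the capacity only delays the Busy path; if sustained concurrency is expected, a worker pool (several goroutines draining the same channel) bounds concurrency without serializing all requests behind one worker.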

@MastaP (Member) left a comment

Most referenced issues look properly fixed. Flagging two sub-items from #2 and #5 that weren't addressed — leaving inline comments so they can either be taken in this PR or split out.

Comment thread network/libp2p_network.go
if stream != nil {
stream.Close()
}
}()

Issue #2 mentioned a second bug alongside the stream leak: SendMsgs still returns nil at the end of the function (line 140), discarding any resErr accumulated when serialization of an individual message fails. It should return resErr so the caller sees partial-failure errors.

Comment thread txsystem/fgp_txsystem.go
if tip.Height < block.Height {
return nil, fmt.Errorf("proposed block height %d is ahead of tip height %d", block.Height, tip.Height)
}
if tip.Height-block.Height < s.dFG-1 {

Issue #5's Related note points out that FollowerVerify is missing two checks that LeaderPropose performs:

  1. No check that block.Height > s.committedStateHeight, so a follower can accept a proposal at or below the last finalized height.
  2. No fork-safety check via GetChainTips: LeaderPropose verifies that no competing fork is visible at the candidate height, but a follower does not repeat that verification against its own PoW view.

Both weaken safety guarantees vs. what the leader enforces. Worth either adding here or tracking as a follow-up.
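A rough sketch of the two missing checks, using hypothetical types and names (the real FGPTxSystem API differs; the fork check here is simplified to "a different tip already exists at the candidate height"):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the FGPTxSystem types.
type block struct {
	Height uint64
	Hash   string
}

type system struct {
	committedStateHeight uint64
	chainTips            []block // the follower's own view of chain tips
}

// followerVerify sketches the two checks issue #5 says are missing:
// (1) reject proposals at or below the last finalized height, and
// (2) repeat the fork-safety check against the follower's own tips.
func (s *system) followerVerify(b block) error {
	if b.Height <= s.committedStateHeight {
		return fmt.Errorf("proposal height %d not above committed height %d",
			b.Height, s.committedStateHeight)
	}
	for _, tip := range s.chainTips {
		if tip.Height == b.Height && tip.Hash != b.Hash {
			return errors.New("competing fork visible at candidate height")
		}
	}
	return nil
}

func main() {
	s := &system{committedStateHeight: 10, chainTips: []block{{Height: 12, Hash: "aa"}}}
	fmt.Println(s.followerVerify(block{Height: 9, Hash: "bb"}))  // rejected: below finalized height
	fmt.Println(s.followerVerify(block{Height: 12, Hash: "bb"})) // rejected: competing fork
	fmt.Println(s.followerVerify(block{Height: 13, Hash: "cc"})) // accepted
}
```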
