Skip to content

Conversation

@TheMarstonConnell
Copy link
Member

@TheMarstonConnell TheMarstonConnell commented Jan 2, 2026

Implement automatic failover capability for multiple RPC/GRPC nodes:

  • Add RPCAddrs/GRPCAddrs fields to ChainConfig with backwards compatibility
  • Create new rpc/failover.go package with FailoverClient wrapper
  • FailoverClient automatically switches to next available node on connection errors
  • Add IsConnectionError() helper to detect connection failures
  • Update all components to use FailoverClient instead of direct wallet

The failover state is kept in memory only. When a connection error is detected, the system automatically attempts to connect to the next configured node in the list.

Configuration example:
chain:
rpc_addrs:
- http://node1:26657
- http://node2:26657 grpc_addrs: - node1:9090 - node2:9090

Summary by CodeRabbit

  • New Features

    • Added a failover RPC client with multi-node RPC/GRPC support, health checks and automatic failover.
  • Improvements

    • Core services, API endpoints, CLI, monitoring, proof generation, queueing and stray management now use the failover client for more reliable connectivity.
    • Automatic retries on connection errors, improved mempool/broadcast handling, and multi-address configuration for RPC/GRPC redundancy.

✏️ Tip: You can customize this high-level summary in your review settings.

Implement automatic failover capability for multiple RPC/GRPC nodes:

- Add RPCAddrs/GRPCAddrs fields to ChainConfig with backwards compatibility
- Create new rpc/failover.go package with FailoverClient wrapper
- FailoverClient automatically switches to next available node on connection errors
- Add IsConnectionError() helper to detect connection failures
- Update all components to use FailoverClient instead of direct wallet

The failover state is kept in memory only. When a connection error is
detected, the system automatically attempts to connect to the next
configured node in the list.

Configuration example:
  chain:
    rpc_addrs:
      - http://node1:26657
      - http://node2:26657
    grpc_addrs:
      - node1:9090
      - node2:9090
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 2, 2026

📝 Walkthrough

Walkthrough

The diff replaces direct usage of the local wallet type with a new rpc.FailoverClient across the codebase. Function signatures, struct fields, and handler registrations were updated to accept or store *rpc.FailoverClient and to call its accessors (Wallet(), GRPCConn(), RPCClient(), AccAddress()). A new rpc/failover.go implements NodeConfig and FailoverClient with health checks, connection/error detection, and automatic failover across RPC/GRPC nodes, plus generic QueryWithFailover/ExecuteWithFailover helpers. Config types gained RPCAddrs/GRPCAddrs and accessor methods. Changes propagate failover-aware behavior through api, core, proofs, queue, monitoring, strays, cmd, and related packages, including added connection-error detection and retry paths around chain queries and broadcasts.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.62% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add failover support for multiple RPC nodes' accurately and concisely describes the primary change in this PR, which introduces a FailoverClient mechanism enabling automatic failover across multiple configured RPC/gRPC nodes.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d28ec4 and eb285bb.

📒 Files selected for processing (2)
  • api/client.go
  • rpc/failover.go
🧰 Additional context used
🧬 Code graph analysis (2)
api/client.go (2)
rpc/failover.go (1)
  • IsConnectionError (274-305)
api/types/responses.go (1)
  • ErrorResponse (27-29)
rpc/failover.go (2)
config/types.go (1)
  • ChainConfig (17-25)
wallet/wallet.go (2)
  • CreateWalletPrivKey (39-61)
  • CreateWallet (14-37)
🔇 Additional comments (9)
api/client.go (2)

22-42: LGTM! Failover handling properly implemented.

The Provider query now correctly detects connection errors, attempts failover, and retries with a new connection. The defensive error encoding prevents cascading failures.


49-69: LGTM! FreeSpace query failover handling is consistent.

The failover logic matches the Provider query implementation, ensuring consistent error handling across both queries.

rpc/failover.go (7)

1-57: LGTM! Well-structured types and error definitions.

The package structure is clean with clear error types and proper separation of concerns between NodeConfig (configuration) and FailoverClient (state + behavior). The use of sync.RWMutex ensures thread-safe access.


59-105: LGTM! Constructors include proper validation.

Both constructors now validate that RPCAddrs and GRPCAddrs have equal lengths (lines 65-67, 89-91), preventing potential out-of-bounds panics during failover operations. The validation properly returns ErrMismatchedNodeCounts when lengths differ.


107-149: LGTM! Connection logic is robust.

The createWalletAtIndex method includes defensive boundary checks (lines 132-134) that validate the index against both RPCAddrs and GRPCAddrs arrays, providing defense-in-depth. The connectToFirstAvailable method properly iterates through nodes and suppresses log spam by only warning after the first failure.


151-202: LGTM! Failover logic is thread-safe and comprehensive.

The Failover method properly acquires a write lock and attempts all nodes in sequence with wrap-around logic. The logging provides good observability without spam. The Wallet accessor correctly uses a read lock.


204-270: LGTM! Accessors include defensive nil checks.

All accessor methods now include appropriate nil checks (lines 209-211, 220-222, 231-233) and panic with ErrNilWallet when preconditions aren't met. This is documented in comments and appropriate for programming errors. The NodeCount method now properly uses a read lock (lines 267-268).


272-334: LGTM! Helper functions use standard library correctly.

The IsConnectionError function now uses strings.ToLower and strings.Contains from the standard library (lines 297-300), which is more efficient and handles Unicode correctly. The ExecuteWithFailover and QueryWithFailover functions provide clean wrappers for automatic retry logic.


346-363: LGTM! EnsureHealthy now re-verifies health after failover.

The method now re-checks health after a successful failover (lines 357-360) and logs appropriately if the new node is also unhealthy. This ensures the system is genuinely healthy after failover.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
config/types.go (1)

171-179: Consider updating DefaultChainConfig to also populate the array fields.

The default config only sets RPCAddr and GRPCAddr. While the accessors handle this correctly via fallback logic, setting RPCAddrs and GRPCAddrs explicitly would make the defaults more self-documenting and reduce reliance on the fallback path.

🔎 Proposed change
 func DefaultChainConfig() ChainConfig {
 	return ChainConfig{
 		RPCAddr:       "http://localhost:26657",
 		GRPCAddr:      "localhost:9090",
+		RPCAddrs:      []string{"http://localhost:26657"},
+		GRPCAddrs:     []string{"localhost:9090"},
 		GasPrice:      "0.02ujkl",
 		GasAdjustment: 1.5,
 		Bech32Prefix:  "jkl",
 	}
 }
cmd/wallet/wallet.go (1)

108-128: Inconsistent variable naming and missing failover handling.

  1. The variable is named wallet but it's a *rpc.FailoverClient. Consider renaming to fc for consistency with balanceCMD.
  2. The broadcast operation doesn't use ExecuteWithFailover, so connection errors during withdrawal won't trigger automatic failover.
🔎 Proposed fix
-		wallet, err := config.InitWallet(home)
+		fc, err := config.InitWallet(home)
 		if err != nil {
 			return err
 		}
 		// ...
 		m := bankTypes.MsgSend{
-			FromAddress: wallet.AccAddress(),
+			FromAddress: fc.AccAddress(),
 			ToAddress:   args[0],
 			Amount:      sdk.NewCoins(c),
 		}
 		// ...
-		res, err := wallet.Wallet().BroadcastTxCommit(data)
+		var res *sdk.TxResponse
+		err = fc.ExecuteWithFailover(func() error {
+			var txErr error
+			res, txErr = fc.Wallet().BroadcastTxCommit(data)
+			return txErr
+		})
monitoring/monitor.go (1)

29-37: Consider adding failover handling in monitoring methods.

The monitoring methods (updateBurns, updateHeight, updateBalance) silently return on errors without triggering failover. Since these run continuously, connection errors could cause prolonged monitoring gaps without attempting recovery.

🔎 Example for updateHeight
 func (m *Monitor) updateHeight() {
 	abciInfo, err := m.wallet.RPCClient().ABCIInfo(context.Background())
 	if err != nil {
+		if rpc.IsConnectionError(err) {
+			m.wallet.Failover()
+		}
 		return
 	}
 	height := abciInfo.Response.LastBlockHeight

Note: This requires importing "github.com/JackalLabs/sequoia/rpc".

proofs/proofs.go (1)

111-116: Consider adding failover handling for gRPC query errors.

GenerateProof queries the file via gRPC but doesn't trigger failover on connection errors. This could leave the prover stuck on a failed node during proof generation.

🔎 Proposed fix
 	res, err := cl.File(context.Background(), queryParams)
 	if err != nil {
+		if rpc.IsConnectionError(err) {
+			p.wallet.Failover()
+		}
 		return nil, nil, 0, err
 	}
api/index.go (2)

29-49: Consider adding failover handling for connection errors.

The VersionHandler doesn't attempt failover when GetChainID() fails. While this is a read-only status endpoint, adding failover would improve resilience and consistency with other parts of the codebase.

🔎 Proposed enhancement
 func VersionHandler(fc *rpc.FailoverClient) func(http.ResponseWriter, *http.Request) {
 	return func(w http.ResponseWriter, req *http.Request) {
 		chainId, err := fc.Wallet().Client.GetChainID()
 		if err != nil {
+			if rpc.IsConnectionError(err) {
+				if fc.Failover() {
+					chainId, err = fc.Wallet().Client.GetChainID()
+				}
+			}
+		}
+		if err != nil {
 			w.WriteHeader(500)
 			return
 		}

51-72: Consider adding failover handling for the RPC status check.

Similar to VersionHandler, the NetworkHandler doesn't attempt failover when fc.RPCClient().Status() fails. Adding failover would be consistent with the rest of the codebase.

core/app.go (3)

121-147: Missing failover handling for BuildTx errors.

BuildTx (line 129) may fail due to connection issues when querying account information, but only BroadcastTxCommit errors trigger failover. Consider adding failover handling for BuildTx failures as well.

🔎 Proposed fix
 	builder, err := w.BuildTx(data)
 	if err != nil {
+		if rpc.IsConnectionError(err) {
+			if fc.Failover() {
+				return initProviderOnChain(fc, ip, totalSpace)
+			}
+		}
 		return err
 	}

149-203: Same BuildTx failover gap in updateSpace and updateIp.

Both functions have the same pattern where BuildTx errors don't trigger failover. Consider applying the same fix as suggested for initProviderOnChain.


327-336: Optional: Consider failover for IPFS peer discovery.

ConnectPeers doesn't attempt failover when ActiveProviders query fails. Since this is a background task, the current behavior (log and return) is acceptable, but adding failover could improve resilience.

api/file_handler.go (1)

88-92: Consider adding failover for chain query failures in file upload.

When cl.File() fails at line 88, there's no failover attempt. Adding failover handling could improve upload reliability when a node becomes unavailable mid-request.

🔎 Proposed fix
 		res, err := cl.File(context.Background(), &queryParams)
 		if err != nil {
+			if rpc.IsConnectionError(err) {
+				if fc.Failover() {
+					cl = storageTypes.NewQueryClient(fc.GRPCConn())
+					res, err = cl.File(context.Background(), &queryParams)
+				}
+			}
+		}
+		if err != nil {
 			handleErr(fmt.Errorf("failed to find file on chain with merkle: %x, owner: %s, start: %d | %w", merkle, sender, startBlock, err), w, http.StatusInternalServerError)
 			return
 		}
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 965e84f and 8d950eb.

📒 Files selected for processing (18)
  • api/client.go
  • api/file_handler.go
  • api/index.go
  • api/server.go
  • cmd/wallet/wallet.go
  • config/types.go
  • config/wallet.go
  • core/app.go
  • monitoring/monitor.go
  • monitoring/types.go
  • proofs/proofs.go
  • proofs/types.go
  • queue/queue.go
  • queue/types.go
  • rpc/failover.go
  • strays/hands.go
  • strays/manager.go
  • strays/types.go
🧰 Additional context used
🧬 Code graph analysis (14)
proofs/types.go (1)
rpc/failover.go (1)
  • FailoverClient (29-47)
monitoring/types.go (1)
rpc/failover.go (1)
  • FailoverClient (29-47)
api/index.go (1)
rpc/failover.go (1)
  • FailoverClient (29-47)
api/client.go (1)
rpc/failover.go (1)
  • FailoverClient (29-47)
cmd/wallet/wallet.go (1)
config/wallet.go (1)
  • InitWallet (80-123)
api/server.go (4)
proofs/types.go (2)
  • FileSystem (25-29)
  • Prover (12-23)
rpc/failover.go (1)
  • FailoverClient (29-47)
api/index.go (1)
  • VersionHandler (29-49)
api/client.go (1)
  • SpaceHandler (14-71)
strays/types.go (1)
rpc/failover.go (1)
  • FailoverClient (29-47)
queue/types.go (1)
rpc/failover.go (1)
  • FailoverClient (29-47)
config/wallet.go (1)
rpc/failover.go (4)
  • FailoverClient (29-47)
  • NodeConfig (18-24)
  • NewFailoverClientWithPrivKey (90-125)
  • NewFailoverClient (51-87)
strays/manager.go (3)
rpc/failover.go (2)
  • FailoverClient (29-47)
  • IsConnectionError (258-288)
queue/types.go (1)
  • Queue (12-22)
strays/types.go (1)
  • StrayManager (19-30)
proofs/proofs.go (3)
rpc/failover.go (2)
  • IsConnectionError (258-288)
  • FailoverClient (29-47)
queue/types.go (1)
  • Queue (12-22)
proofs/types.go (2)
  • FileSystem (25-29)
  • Prover (12-23)
api/file_handler.go (2)
proofs/types.go (2)
  • FileSystem (25-29)
  • Prover (12-23)
rpc/failover.go (1)
  • FailoverClient (29-47)
core/app.go (1)
rpc/failover.go (2)
  • FailoverClient (29-47)
  • IsConnectionError (258-288)
rpc/failover.go (2)
config/types.go (1)
  • ChainConfig (17-25)
wallet/wallet.go (2)
  • CreateWalletPrivKey (39-61)
  • CreateWallet (14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: lint
  • GitHub Check: Build
  • GitHub Check: Test
  • GitHub Check: cov
🔇 Additional comments (25)
config/types.go (2)

27-38: LGTM! Backward-compatible accessor with sensible defaults.

The GetRPCAddrs() accessor correctly prioritizes the multi-address array over the single address field, maintaining backward compatibility for existing configurations.


40-51: LGTM! Consistent implementation with GetRPCAddrs().

The GetGRPCAddrs() accessor follows the same pattern as GetRPCAddrs(), ensuring consistency across RPC and gRPC address handling.

rpc/failover.go (2)

318-345: Failover helpers retry only once after failover.

ExecuteWithFailover and QueryWithFailover attempt failover once and then retry the operation once. If the new node also fails immediately, the error is returned without trying additional nodes. Consider whether a loop through all nodes would be more resilient.

Additionally, after a failed failover (when Failover() returns false), the original error is still returned, which is correct. However, it may be worth logging that all nodes were exhausted.


347-368: LGTM! Health check implementation is straightforward.

HealthCheck and EnsureHealthy provide a clean mechanism to proactively verify node health and trigger failover when needed.

cmd/wallet/wallet.go (1)

68-77: LGTM! Correctly uses FailoverClient accessors.

The balance command properly obtains the gRPC connection and account address through the FailoverClient interface.

proofs/types.go (1)

12-23: LGTM! Type change aligns with FailoverClient migration.

The Prover struct correctly updated to use *rpc.FailoverClient. The field name wallet still communicates the purpose, though fc or client would be more accurate given the type change.

strays/types.go (1)

12-17: LGTM! Appropriate separation of concerns.

StrayManager uses *rpc.FailoverClient for failover-aware operations, while Hand instances retain *wallet.Wallet since they receive cloned wallet instances. This design correctly separates failover management from per-hand wallet operations.

Also applies to: 19-30

strays/hands.go (1)

115-131: LGTM! Correct accessor chain for wallet cloning.

The change to s.wallet.Wallet().CloneWalletOffset(offset) correctly adapts to the FailoverClient abstraction by first retrieving the underlying wallet before invoking the clone method.

queue/types.go (1)

12-22: Connection error handling properly implemented in queue processing logic.

The Queue struct correctly updated to use *rpc.FailoverClient. Queue operations in queue/queue.go properly handle connection errors: RPCClient().UnconfirmedTxs() (line 177) and Wallet().BroadcastTxSync() (line 238) both check rpc.IsConnectionError() and trigger Failover() on connection failures, enabling automatic recovery from RPC node failures.

monitoring/types.go (1)

31-41: LGTM!

The type migration from *wallet.Wallet to *rpc.FailoverClient is clean and consistent with the broader refactoring. The constructor correctly initializes the new field type.

api/server.go (1)

50-84: LGTM!

The Serve method signature and all route registrations are correctly updated to use *rpc.FailoverClient. The handlers consistently receive the failover client for RPC/gRPC access.

proofs/proofs.go (1)

441-453: LGTM!

The NewProver constructor is correctly updated to accept *rpc.FailoverClient and properly initializes the prover with the failover-enabled wallet.

strays/manager.go (2)

186-190: LGTM on failover handling!

The connection error detection and failover triggering in RefreshList is well implemented, consistent with the pattern used in other files.


29-98: LGTM!

The NewStrayManager constructor is correctly updated to accept *rpc.FailoverClient and properly uses w.AccAddress() for the claimer message construction.

config/wallet.go (2)

93-104: LGTM!

The NodeConfig construction properly extracts configuration values including the new multi-address fields via GetRPCAddrs() and GetGRPCAddrs(). Both methods correctly handle backwards compatibility by checking the new multi-address fields first, then falling back to converting single-address fields into slice format ([]string{c.RPCAddr} and []string{c.GRPCAddr}), with sensible defaults for each protocol.


80-122: Signature change is a breaking API change.

The return type changed from *wallet.Wallet to *rpc.FailoverClient. All callers in the codebase have been updated appropriately to use the new return type and call the correct methods (AccAddress(), GRPCConn()).

queue/queue.go (3)

15-15: LGTM!

Import addition for the new rpc package to support FailoverClient integration.


74-95: LGTM!

Clean signature change to accept *rpc.FailoverClient. The queue initialization logic remains intact.


236-270: LGTM!

The broadcast failover handling is well-implemented:

  • Gets fresh wallet reference inside the loop via q.wallet.Wallet(), ensuring retries use the new connection after failover
  • Properly continues the retry loop when Failover() succeeds
  • Maintains existing error handling for cache duplicates, full mempool, and sequence mismatches
core/app.go (1)

51-51: LGTM!

Field type change from *wallet.Wallet to *rpc.FailoverClient aligns with the failover architecture.

api/file_handler.go (5)

47-47: LGTM!

Clean migration to FailoverClient for PostFileHandler. The function correctly uses fc.GRPCConn() for the query client.

Also applies to: 82-82


135-135: LGTM!

PostFileHandlerV2 properly uses fc.GRPCConn() for the query client. Same clean migration pattern as PostFileHandler.

Also applies to: 189-189


415-467: LGTM!

getMerkleData properly propagates fc for gRPC queries and network file fetching. The fallback to other providers when local data isn't available is a good resilience pattern.


472-519: LGTM!

GetMerklePathData correctly passes fc through recursive calls, ensuring consistent access to the FailoverClient throughout the path traversal.


524-596: LGTM!

FindFileHandler properly threads the fc parameter through to GetMerklePathData and getMerkleData calls.

Comment on lines +277 to 280
params, err := a.GetStorageParams(a.wallet.GRPCConn())
if err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Missing failover handling for GetStorageParams.

If GetStorageParams fails due to a connection error, there's no failover attempt. Consider adding connection error handling here.

🔎 Proposed fix
 	params, err := a.GetStorageParams(a.wallet.GRPCConn())
 	if err != nil {
+		if rpc.IsConnectionError(err) {
+			if a.wallet.Failover() {
+				params, err = a.GetStorageParams(a.wallet.GRPCConn())
+			}
+		}
+	}
+	if err != nil {
 		return err
 	}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In core/app.go around lines 277 to 280, calling
a.GetStorageParams(a.wallet.GRPCConn()) has no failover if the gRPC call fails;
add connection-error handling by detecting transient/connection errors,
attempting to re-establish the wallet gRPC connection (e.g. call the wallet
reconnect/init routine or recreate the GRPCConn), and retrying GetStorageParams
a few times with exponential backoff and logging each attempt; if all retries
fail, return the final error. Ensure retry count and backoff are configurable
and only retry on connection/transient errors, not on permanent parameter
errors.

Comment on lines +105 to 106
go hand.Start(f, s.wallet.Wallet(), q, myUrl, chunkSize)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find Hand.Start implementation to see how the wallet parameter is used
ast-grep --pattern $'func ($_ *Hand) Start($$$) {
  $$$
}'

Repository: JackalLabs/sequoia

Length of output: 3086


🏁 Script executed:

# Find network.DownloadFile signature and implementation
ast-grep --pattern $'func DownloadFile($$$) {
  $$$
}'

Repository: JackalLabs/sequoia

Length of output: 44


🏁 Script executed:

# Search for DownloadFile definition
rg "func DownloadFile" --type go -A 10

Repository: JackalLabs/sequoia

Length of output: 2096


🏁 Script executed:

# Check the manager.go file to see how FailoverClient is used
cat -n strays/manager.go | head -50

Repository: JackalLabs/sequoia

Length of output: 2083


🏁 Script executed:

# Check the area around line 105 in manager.go to see the hand.Start call
sed -n '100,120p' strays/manager.go

Repository: JackalLabs/sequoia

Length of output: 601


🏁 Script executed:

# Check the FailoverClient structure in rpc package
find . -name "*.go" -path "*/rpc/*" | head -5

Repository: JackalLabs/sequoia

Length of output: 79


🏁 Script executed:

# Check the FailoverClient structure and Wallet() method
cat -n rpc/failover.go | head -80

Repository: JackalLabs/sequoia

Length of output: 2661


🏁 Script executed:

# Check more of the FailoverClient, particularly the Wallet() method
cat -n rpc/failover.go | sed -n '80,150p'

Repository: JackalLabs/sequoia

Length of output: 2670


🏁 Script executed:

# Find the Wallet() method in FailoverClient
rg "func.*Wallet\(\)" rpc/failover.go -A 5

Repository: JackalLabs/sequoia

Length of output: 172


🏁 Script executed:

# Check all public methods of FailoverClient
rg "^func \(fc \*FailoverClient\)" rpc/failover.go | head -20

Repository: JackalLabs/sequoia

Length of output: 852


🏁 Script executed:

# Check the GRPCConn() method of FailoverClient
rg "func.*GRPCConn" rpc/failover.go -A 5

Repository: JackalLabs/sequoia

Length of output: 191


🏁 Script executed:

# Check the RefreshList method to see how it uses FailoverClient
rg "func.*RefreshList" strays/manager.go -A 30

Repository: JackalLabs/sequoia

Length of output: 863


Hands lack failover capability for network operations.

hand.Start() receives s.wallet.Wallet() (the underlying *wallet.Wallet), not the FailoverClient. This means when network.DownloadFile() executes a query and encounters a connection error, there is no failover mechanism to retry with an alternative node—unlike RefreshList() which explicitly checks for connection errors and calls s.wallet.Failover().

This creates an inconsistency where the manager handles failover for its own queries but hands do not. Consider passing the FailoverClient to hands instead, or implement explicit failover handling in DownloadFile().

- Replace custom contains/toLowerCase functions with standard library
  strings.Contains and strings.ToLower for better reliability
- Simplify IsConnectionError logic
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
rpc/failover.go (4)

50-88: Add validation for address slice lengths and non-empty requirement.

The constructor doesn't validate that RPCAddrs and GRPCAddrs have the same length or that at least one address pair is provided. This can lead to runtime panics in createWalletAtIndex or silent failures when lengths mismatch.

🔎 Proposed fix
 func NewFailoverClient(nodeCfg NodeConfig, seed, derivation string) (*FailoverClient, error) {
+	if len(nodeCfg.RPCAddrs) != len(nodeCfg.GRPCAddrs) {
+		return nil, fmt.Errorf("RPCAddrs and GRPCAddrs must have the same length: got %d and %d", 
+			len(nodeCfg.RPCAddrs), len(nodeCfg.GRPCAddrs))
+	}
+	if len(nodeCfg.RPCAddrs) == 0 {
+		return nil, fmt.Errorf("at least one RPC/GRPC address pair is required")
+	}
+
 	fc := &FailoverClient{

Based on past review comments.


50-126: Extract common initialization logic to reduce duplication.

NewFailoverClient and NewFailoverClientWithPrivKey share nearly identical retry logic (lines 64-78 and 102-116). Consider extracting the connection-retry loop into a private helper method to improve maintainability.

🔎 Proposed refactor
+// initializeConnection attempts to connect to nodes, trying each in sequence until successful.
+func (fc *FailoverClient) initializeConnection() error {
+	w, err := fc.createWalletAtIndex(0)
+	if err != nil {
+		for i := 1; i < len(fc.rpcAddrs); i++ {
+			w, err = fc.createWalletAtIndex(i)
+			if err == nil {
+				fc.currentIndex = i
+				break
+			}
+			log.Warn().Err(err).Int("index", i).Msg("Failed to connect to node, trying next")
+		}
+		if err != nil {
+			return err
+		}
+	}
+	fc.wallet = w
+	log.Info().
+		Int("node_index", fc.currentIndex).
+		Str("rpc", fc.rpcAddrs[fc.currentIndex]).
+		Str("grpc", fc.grpcAddrs[fc.currentIndex]).
+		Msg("Connected to blockchain node")
+	return nil
+}

Then simplify both constructors:

 func NewFailoverClient(nodeCfg NodeConfig, seed, derivation string) (*FailoverClient, error) {
 	fc := &FailoverClient{
 		nodeCfg:      nodeCfg,
 		seed:         seed,
 		derivation:   derivation,
 		useLegacyKey: false,
 		rpcAddrs:     nodeCfg.RPCAddrs,
 		grpcAddrs:    nodeCfg.GRPCAddrs,
 		currentIndex: 0,
 	}
-	// Create initial wallet connection
-	w, err := fc.createWalletAtIndex(0)
-	if err != nil {
-		// Try other nodes if the first one fails
-		for i := 1; i < len(fc.rpcAddrs); i++ {
-			w, err = fc.createWalletAtIndex(i)
-			if err == nil {
-				fc.currentIndex = i
-				break
-			}
-			log.Warn().Err(err).Int("index", i).Msg("Failed to connect to node, trying next")
-		}
-		if err != nil {
-			return nil, err
-		}
-	}
-	fc.wallet = w
-	log.Info().
-		Int("node_index", fc.currentIndex).
-		Str("rpc", fc.rpcAddrs[fc.currentIndex]).
-		Str("grpc", fc.grpcAddrs[fc.currentIndex]).
-		Msg("Connected to blockchain node")
-	return fc, nil
+	if err := fc.initializeConnection(); err != nil {
+		return nil, err
+	}
+	return fc, nil
 }

Based on past review comments.


128-147: Silent index reset may mask configuration errors.

When index >= len(fc.rpcAddrs) or index >= len(fc.grpcAddrs), the function silently resets to index 0 (lines 130-132). This could hide misconfiguration where rpcAddrs and grpcAddrs have different lengths. With the validation added in constructors, this should return an error for invalid indices instead.

🔎 Proposed fix
 func (fc *FailoverClient) createWalletAtIndex(index int) (*wallet.Wallet, error) {
-	if index >= len(fc.rpcAddrs) || index >= len(fc.grpcAddrs) {
-		index = 0
+	if index < 0 || index >= len(fc.rpcAddrs) || index >= len(fc.grpcAddrs) {
+		return nil, fmt.Errorf("invalid node index: %d", index)
 	}

Based on past review comments.


252-255: NodeCount() reads rpcAddrs without holding the lock.

While rpcAddrs is only written during construction, omitting the lock is inconsistent with other accessors and could cause issues if node addition/removal is added later. For consistency and future-proofing, acquire the read lock.

🔎 Proposed fix
 func (fc *FailoverClient) NodeCount() int {
+	fc.mu.RLock()
+	defer fc.mu.RUnlock()
 	return len(fc.rpcAddrs)
 }

Based on past review comments.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8d950eb and 574d4eb.

📒 Files selected for processing (1)
  • rpc/failover.go
🧰 Additional context used
🧬 Code graph analysis (1)
rpc/failover.go (2)
config/types.go (1)
  • ChainConfig (17-25)
wallet/wallet.go (2)
  • CreateWalletPrivKey (39-61)
  • CreateWallet (14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test
  • GitHub Check: cov
🔇 Additional comments (5)
rpc/failover.go (5)

149-193: LGTM! Failover logic is well-implemented.

The method correctly:

  • Tries all available nodes in sequence
  • Uses proper locking during state mutation
  • Updates failover tracking metrics
  • Provides clear logging for debugging

257-290: LGTM! Past review feedback has been addressed.

The function now correctly uses strings.ToLower and strings.Contains from the standard library (lines 282-284), which handles Unicode properly and simplifies the implementation compared to custom helpers.


292-304: LGTM! Single-retry behavior is appropriate.

The method correctly detects connection errors and triggers failover. The single retry after failover is reasonable for a helper function - callers can implement additional retry logic if needed for their specific use case.


306-319: LGTM! Good use of generics for type-safe failover.

The generic implementation provides type safety while maintaining consistency with ExecuteWithFailover. The single-retry pattern is appropriate for this helper utility.


321-342: LGTM! Health check implementation is correct.

Both methods are well-implemented:

  • HealthCheck correctly releases the lock before making the RPC call
  • EnsureHealthy appropriately triggers failover on health check failures
  • Both accept context for proper cancellation handling

Comment on lines 204 to 222
func (fc *FailoverClient) GRPCConn() grpc.ClientConn {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.Client.GRPCConn
}

// RPCClient returns the current RPC client.
func (fc *FailoverClient) RPCClient() client.Client {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.Client.RPCClient
}

// AccAddress returns the account address.
func (fc *FailoverClient) AccAddress() string {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.AccAddress()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider defensive nil checks for wallet access.

Methods GRPCConn(), RPCClient(), and AccAddress() dereference fc.wallet without checking for nil. While the constructors ensure wallet is initialized, adding defensive checks would improve robustness and prevent potential panics if internal state is corrupted.

🔎 Example defensive check
 func (fc *FailoverClient) GRPCConn() grpc.ClientConn {
 	fc.mu.RLock()
 	defer fc.mu.RUnlock()
+	if fc.wallet == nil {
+		return nil
+	}
 	return fc.wallet.Client.GRPCConn
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (fc *FailoverClient) GRPCConn() grpc.ClientConn {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.Client.GRPCConn
}
// RPCClient returns the current RPC client.
func (fc *FailoverClient) RPCClient() client.Client {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.Client.RPCClient
}
// AccAddress returns the account address.
func (fc *FailoverClient) AccAddress() string {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.AccAddress()
}
func (fc *FailoverClient) GRPCConn() grpc.ClientConn {
fc.mu.RLock()
defer fc.mu.RUnlock()
if fc.wallet == nil {
return nil
}
return fc.wallet.Client.GRPCConn
}
// RPCClient returns the current RPC client.
func (fc *FailoverClient) RPCClient() client.Client {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.Client.RPCClient
}
// AccAddress returns the account address.
func (fc *FailoverClient) AccAddress() string {
fc.mu.RLock()
defer fc.mu.RUnlock()
return fc.wallet.AccAddress()
}
🤖 Prompt for AI Agents
In rpc/failover.go around lines 204-222, the methods GRPCConn(), RPCClient(),
and AccAddress() access fc.wallet (and fc.wallet.Client) without nil checks; add
defensive checks inside the read lock: if fc.wallet is nil (or fc.wallet.Client
is nil for GRPCConn/RPCClient) return the appropriate zero value (nil for
GRPCConn/RPCClient, empty string for AccAddress) and optionally log a warning;
keep the RLock/RUnlock pattern and avoid panicking so callers get safe
zero-values when internal state is corrupted.

- Add ErrNoNodes error and validation for empty node lists
- Extract duplicate connection logic into connectToFirstAvailable()
- Remove redundant rpcAddrs/grpcAddrs fields (use nodeCfg directly)
- Remove unused lastFailover field
- Change "EOF" pattern to lowercase for consistency
- Simplify struct initialization
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
rpc/failover.go (4)

238-241: Add locking to NodeCount() for consistency.

NodeCount() reads nodeCfg.RPCAddrs without holding the lock, which was flagged in a previous review but remains unaddressed. While nodeCfg is only set during construction, omitting the lock is inconsistent with other accessors and could cause issues if node management features are added later.

🔎 Proposed fix
 func (fc *FailoverClient) NodeCount() int {
+	fc.mu.RLock()
+	defer fc.mu.RUnlock()
 	return len(fc.nodeCfg.RPCAddrs)
 }

116-134: Silent index reset masks invalid indices.

Lines 117-119 silently reset out-of-bounds indices to 0 instead of returning an error. This was flagged in a previous review but remains unaddressed. Silent resets can mask bugs or configuration errors where RPCAddrs and GRPCAddrs have different lengths.

🔎 Proposed fix

Return an error for invalid indices:

 func (fc *FailoverClient) createWalletAtIndex(index int) (*wallet.Wallet, error) {
-	if index >= len(fc.nodeCfg.RPCAddrs) || index >= len(fc.nodeCfg.GRPCAddrs) {
-		index = 0
+	if index < 0 || index >= len(fc.nodeCfg.RPCAddrs) || index >= len(fc.nodeCfg.GRPCAddrs) {
+		return nil, errors.New("invalid node index")
 	}

 	// Create a modified chain config with the specific node addresses

181-229: Accessor methods are correctly implemented.

All accessors use appropriate locking. While the constructors ensure wallet is initialized before returning, adding defensive nil checks in GRPCConn(), RPCClient(), and AccAddress() would improve robustness against potential future bugs.

🔎 Optional defensive checks

Example for GRPCConn():

 func (fc *FailoverClient) GRPCConn() grpc.ClientConn {
 	fc.mu.RLock()
 	defer fc.mu.RUnlock()
+	if fc.wallet == nil || fc.wallet.Client == nil {
+		return nil
+	}
 	return fc.wallet.Client.GRPCConn
 }

Apply similar patterns to RPCClient() and handle nil wallet in AccAddress().


50-90: Validate that RPCAddrs and GRPCAddrs have matching lengths.

The constructors check for empty arrays but don't verify that len(nodeCfg.RPCAddrs) == len(nodeCfg.GRPCAddrs). A length mismatch will cause silent failures when createWalletAtIndex resets invalid indices to 0, masking configuration errors. This issue was raised in a previous review but not fully addressed.

🔎 Proposed fix

Add validation in both constructors:

 func NewFailoverClient(nodeCfg NodeConfig, seed, derivation string) (*FailoverClient, error) {
 	if len(nodeCfg.RPCAddrs) == 0 || len(nodeCfg.GRPCAddrs) == 0 {
 		return nil, ErrNoNodes
 	}
+	if len(nodeCfg.RPCAddrs) != len(nodeCfg.GRPCAddrs) {
+		return nil, errors.New("RPCAddrs and GRPCAddrs must have the same length")
+	}

 	fc := &FailoverClient{

Apply the same fix to NewFailoverClientWithPrivKey:

 func NewFailoverClientWithPrivKey(nodeCfg NodeConfig, privKey string) (*FailoverClient, error) {
 	if len(nodeCfg.RPCAddrs) == 0 || len(nodeCfg.GRPCAddrs) == 0 {
 		return nil, ErrNoNodes
 	}
+	if len(nodeCfg.RPCAddrs) != len(nodeCfg.GRPCAddrs) {
+		return nil, errors.New("RPCAddrs and GRPCAddrs must have the same length")
+	}

 	fc := &FailoverClient{
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 574d4eb and 33ca05c.

📒 Files selected for processing (1)
  • rpc/failover.go
🧰 Additional context used
🧬 Code graph analysis (1)
rpc/failover.go (2)
config/types.go (1)
  • ChainConfig (17-25)
wallet/wallet.go (2)
  • CreateWalletPrivKey (39-61)
  • CreateWallet (14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint
  • GitHub Check: Test
  • GitHub Check: cov
🔇 Additional comments (8)
rpc/failover.go (8)

1-15: LGTM!

Package structure and imports are clean and appropriate for the failover implementation.


17-48: LGTM!

Type definitions are well-structured. The use of sync.RWMutex is appropriate for concurrent access, and separating NodeConfig to avoid import cycles is a good design choice.


92-113: LGTM!

The connection logic properly iterates through all configured nodes and handles failures gracefully with appropriate logging.


136-179: LGTM!

The failover logic is robust: it properly cycles through all nodes, uses appropriate locking, maintains state correctly, and provides good observability through logging.


231-236: LGTM!

FailoverCount() correctly uses read locking for thread-safe access.


243-276: LGTM!

The error detection logic now correctly uses standard library functions (strings.ToLower and strings.Contains) as suggested in the previous review. The connection error patterns are comprehensive and appropriate.


278-305: LGTM!

The failover wrapper functions provide a clean pattern for automatic failover on connection errors. The single retry after failover is a reasonable design choice that balances resilience with avoiding long retry chains.


307-315: LGTM!

Health check implementation is straightforward and appropriate, using ABCI info query as a health indicator.

- Fix missing retry after failover in queue/queue.go and proofs/proofs.go
- Return ErrInvalidNodeIndex instead of silent reset in createWalletAtIndex
- Add missing RLock in NodeCount() for consistency
- Add nil checks with panic in GRPCConn/RPCClient/AccAddress methods
- Fix redundant error logging (wrap in else) in proofs/proofs.go
- Fix EnsureHealthy to verify new node health after failover
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 33ca05c and 7d28ec4.

📒 Files selected for processing (3)
  • proofs/proofs.go
  • queue/queue.go
  • rpc/failover.go
🧰 Additional context used
🧬 Code graph analysis (3)
queue/queue.go (3)
rpc/failover.go (2)
  • FailoverClient (39-54)
  • IsConnectionError (265-296)
config/types.go (1)
  • RateLimitConfig (96-99)
queue/types.go (1)
  • Queue (12-22)
proofs/proofs.go (3)
rpc/failover.go (2)
  • IsConnectionError (265-296)
  • FailoverClient (39-54)
queue/types.go (1)
  • Queue (12-22)
proofs/types.go (2)
  • FileSystem (25-29)
  • Prover (12-23)
rpc/failover.go (2)
config/types.go (1)
  • ChainConfig (17-25)
wallet/wallet.go (2)
  • CreateWalletPrivKey (39-61)
  • CreateWallet (14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test
  • GitHub Check: lint
  • GitHub Check: cov
🔇 Additional comments (16)
queue/queue.go (3)

74-95: LGTM!

The constructor signature correctly accepts *rpc.FailoverClient and properly initializes all fields including the rate limiter.


177-190: LGTM - Past review issue addressed.

The failover-retry pattern is now correctly implemented. After a successful failover (q.wallet.Failover() returns true), the code retries the UnconfirmedTxs call. The outer if err != nil check at line 186 correctly handles both the case where failover failed and where the retry also failed.


243-251: LGTM!

The broadcast failover handling correctly detects connection errors, attempts failover, and uses continue to retry with the new connection within the retry loop. This is consistent with the mempool status handling pattern.

rpc/failover.go (9)

17-24: LGTM!

Well-defined sentinel errors for clear error handling across the codebase.


56-76: LGTM - Common initialization logic extracted.

The connectToFirstAvailable() helper method (lines 98-119) now consolidates the initialization logic, addressing the previous review suggestion. Both constructors call this shared method.


78-96: LGTM!

Constructor for legacy private key follows the same pattern as the seed-based constructor.


98-119: LGTM!

Clean implementation of the connection retry logic with proper logging. The loop tries each node sequentially and logs failures appropriately.


142-185: LGTM!

The Failover() method correctly:

  • Holds the write lock for the entire operation
  • Cycles through all nodes starting from the next one
  • Logs attempt and success/failure appropriately
  • Updates state atomically

195-226: LGTM - Nil checks added via panic.

The GRPCConn(), RPCClient(), and AccAddress() methods now include nil checks that panic with ErrNilWallet. While the comment indicates this "should not happen in normal operation," using panic ensures immediate detection of corrupted state rather than silent failures.


256-261: LGTM - RLock added.

The NodeCount() method now correctly acquires RLock before accessing nodeCfg.RPCAddrs, addressing the past review concern about consistency with other accessors.


263-296: LGTM - Standard library usage.

The IsConnectionError function now correctly uses strings.ToLower and strings.Contains from the standard library, addressing the past review suggestion.


337-354: LGTM - Health re-verification after failover added.

The EnsureHealthy method now calls HealthCheck again after a successful failover (lines 347-351), addressing the past review concern. This ensures the new node is actually healthy before returning success.

proofs/proofs.go (4)

111-111: LGTM!

Correctly uses the GRPCConn() accessor from FailoverClient to create the query client.


328-341: LGTM - Redundant logging fixed.

The previous review noted redundant error logging where both a warning and error were logged for the same connection error. This is now fixed with the nested if err != nil check at line 337, which only logs at error level if the error persists after failover attempt. The failover-retry pattern is correctly implemented.


345-358: LGTM - Same fix applied for UnconfirmedTxs.

The redundant logging issue is also fixed here. The nested if err != nil at line 354 ensures error-level logging only occurs when the error persists after a failover attempt.


451-464: LGTM!

The NewProver constructor signature correctly accepts *rpc.FailoverClient, aligning with the Prover struct definition in proofs/types.go.

Comment on lines +327 to +335
// HealthCheck performs a health check on the current node by querying ABCI info.
func (fc *FailoverClient) HealthCheck(ctx context.Context) error {
fc.mu.RLock()
rpcClient := fc.wallet.Client.RPCClient
fc.mu.RUnlock()

_, err := rpcClient.ABCIInfo(ctx)
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing nil check in HealthCheck.

Unlike GRPCConn() and RPCClient() which check for nil wallet/client, HealthCheck directly accesses fc.wallet.Client.RPCClient without validation. This inconsistency could cause a panic if internal state is corrupted.

🔎 Proposed fix
 func (fc *FailoverClient) HealthCheck(ctx context.Context) error {
 	fc.mu.RLock()
+	if fc.wallet == nil || fc.wallet.Client == nil {
+		fc.mu.RUnlock()
+		return ErrNilWallet
+	}
 	rpcClient := fc.wallet.Client.RPCClient
 	fc.mu.RUnlock()

 	_, err := rpcClient.ABCIInfo(ctx)
 	return err
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In rpc/failover.go around lines 327 to 335, HealthCheck reads
fc.wallet.Client.RPCClient without validating fc.wallet or fc.wallet.Client; add
nil checks while holding the read lock: verify fc.wallet != nil and
fc.wallet.Client != nil before extracting RPCClient, unlock, then if RPCClient
is nil return a clear error (e.g. "no RPC client available") instead of
dereferencing to avoid panics; ensure you keep the locking/unlocking pattern and
return the error from ABCIInfo only when rpcClient is non-nil.

- Add ErrMismatchedNodeCounts and validate RPCAddrs/GRPCAddrs have equal length
- Add failover handling with retry to SpaceHandler in api/client.go
- Fix error encoding to not shadow err variable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants