-
Notifications
You must be signed in to change notification settings - Fork 2
Add failover support for multiple RPC nodes #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implement automatic failover capability for multiple RPC/GRPC nodes:
- Add RPCAddrs/GRPCAddrs fields to ChainConfig with backwards compatibility
- Create new rpc/failover.go package with FailoverClient wrapper
- FailoverClient automatically switches to next available node on connection errors
- Add IsConnectionError() helper to detect connection failures
- Update all components to use FailoverClient instead of direct wallet
The failover state is kept in memory only. When a connection error is
detected, the system automatically attempts to connect to the next
configured node in the list.
Configuration example:
chain:
rpc_addrs:
- http://node1:26657
- http://node2:26657
grpc_addrs:
- node1:9090
- node2:9090
📝 WalkthroughWalkthroughThe diff replaces direct usage of the local wallet type with a new rpc.FailoverClient across the codebase. Function signatures, struct fields, and handler registrations were updated to accept or store *rpc.FailoverClient and to call its accessors (Wallet(), GRPCConn(), RPCClient(), AccAddress()). A new rpc/failover.go implements NodeConfig and FailoverClient with health checks, connection/error detection, and automatic failover across RPC/GRPC nodes, plus generic QueryWithFailover/ExecuteWithFailover helpers. Config types gained RPCAddrs/GRPCAddrs and accessor methods. Changes propagate failover-aware behavior through api, core, proofs, queue, monitoring, strays, cmd, and related packages, including added connection-error detection and retry paths around chain queries and broadcasts. Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (2)
🧰 Additional context used🧬 Code graph analysis (2)api/client.go (2)
rpc/failover.go (2)
🔇 Additional comments (9)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
config/types.go (1)
171-179: Consider updatingDefaultChainConfigto also populate the array fields.The default config only sets
RPCAddrandGRPCAddr. While the accessors handle this correctly via fallback logic, settingRPCAddrsandGRPCAddrsexplicitly would make the defaults more self-documenting and reduce reliance on the fallback path.🔎 Proposed change
func DefaultChainConfig() ChainConfig { return ChainConfig{ RPCAddr: "http://localhost:26657", GRPCAddr: "localhost:9090", + RPCAddrs: []string{"http://localhost:26657"}, + GRPCAddrs: []string{"localhost:9090"}, GasPrice: "0.02ujkl", GasAdjustment: 1.5, Bech32Prefix: "jkl", } }cmd/wallet/wallet.go (1)
108-128: Inconsistent variable naming and missing failover handling.
- The variable is named
walletbut it's a*rpc.FailoverClient. Consider renaming tofcfor consistency withbalanceCMD.- The broadcast operation doesn't use
ExecuteWithFailover, so connection errors during withdrawal won't trigger automatic failover.🔎 Proposed fix
- wallet, err := config.InitWallet(home) + fc, err := config.InitWallet(home) if err != nil { return err } // ... m := bankTypes.MsgSend{ - FromAddress: wallet.AccAddress(), + FromAddress: fc.AccAddress(), ToAddress: args[0], Amount: sdk.NewCoins(c), } // ... - res, err := wallet.Wallet().BroadcastTxCommit(data) + var res *sdk.TxResponse + err = fc.ExecuteWithFailover(func() error { + var txErr error + res, txErr = fc.Wallet().BroadcastTxCommit(data) + return txErr + })monitoring/monitor.go (1)
29-37: Consider adding failover handling in monitoring methods.The monitoring methods (
updateBurns,updateHeight,updateBalance) silently return on errors without triggering failover. Since these run continuously, connection errors could cause prolonged monitoring gaps without attempting recovery.🔎 Example for updateHeight
func (m *Monitor) updateHeight() { abciInfo, err := m.wallet.RPCClient().ABCIInfo(context.Background()) if err != nil { + if rpc.IsConnectionError(err) { + m.wallet.Failover() + } return } height := abciInfo.Response.LastBlockHeightNote: This requires importing
"github.com/JackalLabs/sequoia/rpc".proofs/proofs.go (1)
111-116: Consider adding failover handling for gRPC query errors.
GenerateProofqueries the file via gRPC but doesn't trigger failover on connection errors. This could leave the prover stuck on a failed node during proof generation.🔎 Proposed fix
res, err := cl.File(context.Background(), queryParams) if err != nil { + if rpc.IsConnectionError(err) { + p.wallet.Failover() + } return nil, nil, 0, err }api/index.go (2)
29-49: Consider adding failover handling for connection errors.The
VersionHandlerdoesn't attempt failover whenGetChainID()fails. While this is a read-only status endpoint, adding failover would improve resilience and consistency with other parts of the codebase.🔎 Proposed enhancement
func VersionHandler(fc *rpc.FailoverClient) func(http.ResponseWriter, *http.Request) { return func(w http.ResponseWriter, req *http.Request) { chainId, err := fc.Wallet().Client.GetChainID() if err != nil { + if rpc.IsConnectionError(err) { + if fc.Failover() { + chainId, err = fc.Wallet().Client.GetChainID() + } + } + } + if err != nil { w.WriteHeader(500) return }
51-72: Consider adding failover handling for the RPC status check.Similar to
VersionHandler, theNetworkHandlerdoesn't attempt failover whenfc.RPCClient().Status()fails. Adding failover would be consistent with the rest of the codebase.core/app.go (3)
121-147: Missing failover handling forBuildTxerrors.
BuildTx(line 129) may fail due to connection issues when querying account information, but onlyBroadcastTxCommiterrors trigger failover. Consider adding failover handling forBuildTxfailures as well.🔎 Proposed fix
builder, err := w.BuildTx(data) if err != nil { + if rpc.IsConnectionError(err) { + if fc.Failover() { + return initProviderOnChain(fc, ip, totalSpace) + } + } return err }
149-203: SameBuildTxfailover gap inupdateSpaceandupdateIp.Both functions have the same pattern where
BuildTxerrors don't trigger failover. Consider applying the same fix as suggested forinitProviderOnChain.
327-336: Optional: Consider failover for IPFS peer discovery.
ConnectPeersdoesn't attempt failover whenActiveProvidersquery fails. Since this is a background task, the current behavior (log and return) is acceptable, but adding failover could improve resilience.api/file_handler.go (1)
88-92: Consider adding failover for chain query failures in file upload.When
cl.File()fails at line 88, there's no failover attempt. Adding failover handling could improve upload reliability when a node becomes unavailable mid-request.🔎 Proposed fix
res, err := cl.File(context.Background(), &queryParams) if err != nil { + if rpc.IsConnectionError(err) { + if fc.Failover() { + cl = storageTypes.NewQueryClient(fc.GRPCConn()) + res, err = cl.File(context.Background(), &queryParams) + } + } + } + if err != nil { handleErr(fmt.Errorf("failed to find file on chain with merkle: %x, owner: %s, start: %d | %w", merkle, sender, startBlock, err), w, http.StatusInternalServerError) return }
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (18)
api/client.goapi/file_handler.goapi/index.goapi/server.gocmd/wallet/wallet.goconfig/types.goconfig/wallet.gocore/app.gomonitoring/monitor.gomonitoring/types.goproofs/proofs.goproofs/types.goqueue/queue.goqueue/types.gorpc/failover.gostrays/hands.gostrays/manager.gostrays/types.go
🧰 Additional context used
🧬 Code graph analysis (14)
proofs/types.go (1)
rpc/failover.go (1)
FailoverClient(29-47)
monitoring/types.go (1)
rpc/failover.go (1)
FailoverClient(29-47)
api/index.go (1)
rpc/failover.go (1)
FailoverClient(29-47)
api/client.go (1)
rpc/failover.go (1)
FailoverClient(29-47)
cmd/wallet/wallet.go (1)
config/wallet.go (1)
InitWallet(80-123)
api/server.go (4)
proofs/types.go (2)
FileSystem(25-29)Prover(12-23)rpc/failover.go (1)
FailoverClient(29-47)api/index.go (1)
VersionHandler(29-49)api/client.go (1)
SpaceHandler(14-71)
strays/types.go (1)
rpc/failover.go (1)
FailoverClient(29-47)
queue/types.go (1)
rpc/failover.go (1)
FailoverClient(29-47)
config/wallet.go (1)
rpc/failover.go (4)
FailoverClient(29-47)NodeConfig(18-24)NewFailoverClientWithPrivKey(90-125)NewFailoverClient(51-87)
strays/manager.go (3)
rpc/failover.go (2)
FailoverClient(29-47)IsConnectionError(258-288)queue/types.go (1)
Queue(12-22)strays/types.go (1)
StrayManager(19-30)
proofs/proofs.go (3)
rpc/failover.go (2)
IsConnectionError(258-288)FailoverClient(29-47)queue/types.go (1)
Queue(12-22)proofs/types.go (2)
FileSystem(25-29)Prover(12-23)
api/file_handler.go (2)
proofs/types.go (2)
FileSystem(25-29)Prover(12-23)rpc/failover.go (1)
FailoverClient(29-47)
core/app.go (1)
rpc/failover.go (2)
FailoverClient(29-47)IsConnectionError(258-288)
rpc/failover.go (2)
config/types.go (1)
ChainConfig(17-25)wallet/wallet.go (2)
CreateWalletPrivKey(39-61)CreateWallet(14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: lint
- GitHub Check: Build
- GitHub Check: Test
- GitHub Check: cov
🔇 Additional comments (25)
config/types.go (2)
27-38: LGTM! Backward-compatible accessor with sensible defaults.The
GetRPCAddrs()accessor correctly prioritizes the multi-address array over the single address field, maintaining backward compatibility for existing configurations.
40-51: LGTM! Consistent implementation withGetRPCAddrs().The
GetGRPCAddrs()accessor follows the same pattern asGetRPCAddrs(), ensuring consistency across RPC and gRPC address handling.rpc/failover.go (2)
318-345: Failover helpers retry only once after failover.
ExecuteWithFailoverandQueryWithFailoverattempt failover once and then retry the operation once. If the new node also fails immediately, the error is returned without trying additional nodes. Consider whether a loop through all nodes would be more resilient.Additionally, after a failed failover (when
Failover()returns false), the original error is still returned, which is correct. However, it may be worth logging that all nodes were exhausted.
347-368: LGTM! Health check implementation is straightforward.
HealthCheckandEnsureHealthyprovide a clean mechanism to proactively verify node health and trigger failover when needed.cmd/wallet/wallet.go (1)
68-77: LGTM! Correctly uses FailoverClient accessors.The balance command properly obtains the gRPC connection and account address through the
FailoverClientinterface.proofs/types.go (1)
12-23: LGTM! Type change aligns with FailoverClient migration.The
Proverstruct correctly updated to use*rpc.FailoverClient. The field namewalletstill communicates the purpose, thoughfcorclientwould be more accurate given the type change.strays/types.go (1)
12-17: LGTM! Appropriate separation of concerns.
StrayManageruses*rpc.FailoverClientfor failover-aware operations, whileHandinstances retain*wallet.Walletsince they receive cloned wallet instances. This design correctly separates failover management from per-hand wallet operations.Also applies to: 19-30
strays/hands.go (1)
115-131: LGTM! Correct accessor chain for wallet cloning.The change to
s.wallet.Wallet().CloneWalletOffset(offset)correctly adapts to theFailoverClientabstraction by first retrieving the underlying wallet before invoking the clone method.queue/types.go (1)
12-22: Connection error handling properly implemented in queue processing logic.The
Queuestruct correctly updated to use*rpc.FailoverClient. Queue operations inqueue/queue.goproperly handle connection errors:RPCClient().UnconfirmedTxs()(line 177) andWallet().BroadcastTxSync()(line 238) both checkrpc.IsConnectionError()and triggerFailover()on connection failures, enabling automatic recovery from RPC node failures.monitoring/types.go (1)
31-41: LGTM!The type migration from
*wallet.Walletto*rpc.FailoverClientis clean and consistent with the broader refactoring. The constructor correctly initializes the new field type.api/server.go (1)
50-84: LGTM!The
Servemethod signature and all route registrations are correctly updated to use*rpc.FailoverClient. The handlers consistently receive the failover client for RPC/gRPC access.proofs/proofs.go (1)
441-453: LGTM!The
NewProverconstructor is correctly updated to accept*rpc.FailoverClientand properly initializes the prover with the failover-enabled wallet.strays/manager.go (2)
186-190: LGTM on failover handling!The connection error detection and failover triggering in
RefreshListis well implemented, consistent with the pattern used in other files.
29-98: LGTM!The
NewStrayManagerconstructor is correctly updated to accept*rpc.FailoverClientand properly usesw.AccAddress()for the claimer message construction.config/wallet.go (2)
93-104: LGTM!The
NodeConfigconstruction properly extracts configuration values including the new multi-address fields viaGetRPCAddrs()andGetGRPCAddrs(). Both methods correctly handle backwards compatibility by checking the new multi-address fields first, then falling back to converting single-address fields into slice format ([]string{c.RPCAddr}and[]string{c.GRPCAddr}), with sensible defaults for each protocol.
80-122: Signature change is a breaking API change.The return type changed from
*wallet.Walletto*rpc.FailoverClient. All callers in the codebase have been updated appropriately to use the new return type and call the correct methods (AccAddress(),GRPCConn()).queue/queue.go (3)
15-15: LGTM!Import addition for the new
rpcpackage to support FailoverClient integration.
74-95: LGTM!Clean signature change to accept
*rpc.FailoverClient. The queue initialization logic remains intact.
236-270: LGTM!The broadcast failover handling is well-implemented:
- Gets fresh wallet reference inside the loop via
q.wallet.Wallet(), ensuring retries use the new connection after failover- Properly continues the retry loop when
Failover()succeeds- Maintains existing error handling for cache duplicates, full mempool, and sequence mismatches
core/app.go (1)
51-51: LGTM!Field type change from
*wallet.Walletto*rpc.FailoverClientaligns with the failover architecture.api/file_handler.go (5)
47-47: LGTM!Clean migration to
FailoverClientforPostFileHandler. The function correctly usesfc.GRPCConn()for the query client.Also applies to: 82-82
135-135: LGTM!
PostFileHandlerV2properly usesfc.GRPCConn()for the query client. Same clean migration pattern asPostFileHandler.Also applies to: 189-189
415-467: LGTM!
getMerkleDataproperly propagatesfcfor gRPC queries and network file fetching. The fallback to other providers when local data isn't available is a good resilience pattern.
472-519: LGTM!
GetMerklePathDatacorrectly passesfcthrough recursive calls, ensuring consistent access to the FailoverClient throughout the path traversal.
524-596: LGTM!
FindFileHandlerproperly threads thefcparameter through toGetMerklePathDataandgetMerkleDatacalls.
| params, err := a.GetStorageParams(a.wallet.GRPCConn()) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Missing failover handling for GetStorageParams.
If GetStorageParams fails due to a connection error, there's no failover attempt. Consider adding connection error handling here.
🔎 Proposed fix
params, err := a.GetStorageParams(a.wallet.GRPCConn())
if err != nil {
+ if rpc.IsConnectionError(err) {
+ if a.wallet.Failover() {
+ params, err = a.GetStorageParams(a.wallet.GRPCConn())
+ }
+ }
+ }
+ if err != nil {
return err
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In core/app.go around lines 277 to 280, calling
a.GetStorageParams(a.wallet.GRPCConn()) has no failover if the gRPC call fails;
add connection-error handling by detecting transient/connection errors,
attempting to re-establish the wallet gRPC connection (e.g. call the wallet
reconnect/init routine or recreate the GRPCConn), and retrying GetStorageParams
a few times with exponential backoff and logging each attempt; if all retries
fail, return the final error. Ensure retry count and backoff are configurable
and only retry on connection/transient errors, not on permanent parameter
errors.
| go hand.Start(f, s.wallet.Wallet(), q, myUrl, chunkSize) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find Hand.Start implementation to see how the wallet parameter is used
ast-grep --pattern $'func ($_ *Hand) Start($$$) {
$$$
}'Repository: JackalLabs/sequoia
Length of output: 3086
🏁 Script executed:
# Find network.DownloadFile signature and implementation
ast-grep --pattern $'func DownloadFile($$$) {
$$$
}'Repository: JackalLabs/sequoia
Length of output: 44
🏁 Script executed:
# Search for DownloadFile definition
rg "func DownloadFile" --type go -A 10Repository: JackalLabs/sequoia
Length of output: 2096
🏁 Script executed:
# Check the manager.go file to see how FailoverClient is used
cat -n strays/manager.go | head -50Repository: JackalLabs/sequoia
Length of output: 2083
🏁 Script executed:
# Check the area around line 105 in manager.go to see the hand.Start call
sed -n '100,120p' strays/manager.goRepository: JackalLabs/sequoia
Length of output: 601
🏁 Script executed:
# Check the FailoverClient structure in rpc package
find . -name "*.go" -path "*/rpc/*" | head -5Repository: JackalLabs/sequoia
Length of output: 79
🏁 Script executed:
# Check the FailoverClient structure and Wallet() method
cat -n rpc/failover.go | head -80Repository: JackalLabs/sequoia
Length of output: 2661
🏁 Script executed:
# Check more of the FailoverClient, particularly the Wallet() method
cat -n rpc/failover.go | sed -n '80,150p'Repository: JackalLabs/sequoia
Length of output: 2670
🏁 Script executed:
# Find the Wallet() method in FailoverClient
rg "func.*Wallet\(\)" rpc/failover.go -A 5Repository: JackalLabs/sequoia
Length of output: 172
🏁 Script executed:
# Check all public methods of FailoverClient
rg "^func \(fc \*FailoverClient\)" rpc/failover.go | head -20Repository: JackalLabs/sequoia
Length of output: 852
🏁 Script executed:
# Check the GRPCConn() method of FailoverClient
rg "func.*GRPCConn" rpc/failover.go -A 5Repository: JackalLabs/sequoia
Length of output: 191
🏁 Script executed:
# Check the RefreshList method to see how it uses FailoverClient
rg "func.*RefreshList" strays/manager.go -A 30Repository: JackalLabs/sequoia
Length of output: 863
Hands lack failover capability for network operations.
hand.Start() receives s.wallet.Wallet() (the underlying *wallet.Wallet), not the FailoverClient. This means when network.DownloadFile() executes a query and encounters a connection error, there is no failover mechanism to retry with an alternative node—unlike RefreshList() which explicitly checks for connection errors and calls s.wallet.Failover().
This creates an inconsistency where the manager handles failover for its own queries but hands do not. Consider passing the FailoverClient to hands instead, or implement explicit failover handling in DownloadFile().
- Replace custom contains/toLowerCase functions with standard library strings.Contains and strings.ToLower for better reliability - Simplify IsConnectionError logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
rpc/failover.go (4)
50-88: Add validation for address slice lengths and non-empty requirement.The constructor doesn't validate that
RPCAddrsandGRPCAddrshave the same length or that at least one address pair is provided. This can lead to runtime panics increateWalletAtIndexor silent failures when lengths mismatch.🔎 Proposed fix
func NewFailoverClient(nodeCfg NodeConfig, seed, derivation string) (*FailoverClient, error) { + if len(nodeCfg.RPCAddrs) != len(nodeCfg.GRPCAddrs) { + return nil, fmt.Errorf("RPCAddrs and GRPCAddrs must have the same length: got %d and %d", + len(nodeCfg.RPCAddrs), len(nodeCfg.GRPCAddrs)) + } + if len(nodeCfg.RPCAddrs) == 0 { + return nil, fmt.Errorf("at least one RPC/GRPC address pair is required") + } + fc := &FailoverClient{Based on past review comments.
50-126: Extract common initialization logic to reduce duplication.
NewFailoverClientandNewFailoverClientWithPrivKeyshare nearly identical retry logic (lines 64-78 and 102-116). Consider extracting the connection-retry loop into a private helper method to improve maintainability.🔎 Proposed refactor
+// initializeConnection attempts to connect to nodes, trying each in sequence until successful. +func (fc *FailoverClient) initializeConnection() error { + w, err := fc.createWalletAtIndex(0) + if err != nil { + for i := 1; i < len(fc.rpcAddrs); i++ { + w, err = fc.createWalletAtIndex(i) + if err == nil { + fc.currentIndex = i + break + } + log.Warn().Err(err).Int("index", i).Msg("Failed to connect to node, trying next") + } + if err != nil { + return err + } + } + fc.wallet = w + log.Info(). + Int("node_index", fc.currentIndex). + Str("rpc", fc.rpcAddrs[fc.currentIndex]). + Str("grpc", fc.grpcAddrs[fc.currentIndex]). + Msg("Connected to blockchain node") + return nil +}Then simplify both constructors:
func NewFailoverClient(nodeCfg NodeConfig, seed, derivation string) (*FailoverClient, error) { fc := &FailoverClient{ nodeCfg: nodeCfg, seed: seed, derivation: derivation, useLegacyKey: false, rpcAddrs: nodeCfg.RPCAddrs, grpcAddrs: nodeCfg.GRPCAddrs, currentIndex: 0, } - // Create initial wallet connection - w, err := fc.createWalletAtIndex(0) - if err != nil { - // Try other nodes if the first one fails - for i := 1; i < len(fc.rpcAddrs); i++ { - w, err = fc.createWalletAtIndex(i) - if err == nil { - fc.currentIndex = i - break - } - log.Warn().Err(err).Int("index", i).Msg("Failed to connect to node, trying next") - } - if err != nil { - return nil, err - } - } - fc.wallet = w - log.Info(). - Int("node_index", fc.currentIndex). - Str("rpc", fc.rpcAddrs[fc.currentIndex]). - Str("grpc", fc.grpcAddrs[fc.currentIndex]). - Msg("Connected to blockchain node") - return fc, nil + if err := fc.initializeConnection(); err != nil { + return nil, err + } + return fc, nil }Based on past review comments.
128-147: Silent index reset may mask configuration errors.When
index >= len(fc.rpcAddrs)orindex >= len(fc.grpcAddrs), the function silently resets to index 0 (lines 130-132). This could hide misconfiguration whererpcAddrsandgrpcAddrshave different lengths. With the validation added in constructors, this should return an error for invalid indices instead.🔎 Proposed fix
func (fc *FailoverClient) createWalletAtIndex(index int) (*wallet.Wallet, error) { - if index >= len(fc.rpcAddrs) || index >= len(fc.grpcAddrs) { - index = 0 + if index < 0 || index >= len(fc.rpcAddrs) || index >= len(fc.grpcAddrs) { + return nil, fmt.Errorf("invalid node index: %d", index) }Based on past review comments.
252-255:NodeCount()readsrpcAddrswithout holding the lock.While
rpcAddrsis only written during construction, omitting the lock is inconsistent with other accessors and could cause issues if node addition/removal is added later. For consistency and future-proofing, acquire the read lock.🔎 Proposed fix
func (fc *FailoverClient) NodeCount() int { + fc.mu.RLock() + defer fc.mu.RUnlock() return len(fc.rpcAddrs) }Based on past review comments.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
rpc/failover.go
🧰 Additional context used
🧬 Code graph analysis (1)
rpc/failover.go (2)
config/types.go (1)
ChainConfig(17-25)wallet/wallet.go (2)
CreateWalletPrivKey(39-61)CreateWallet(14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test
- GitHub Check: cov
🔇 Additional comments (5)
rpc/failover.go (5)
149-193: LGTM! Failover logic is well-implemented.The method correctly:
- Tries all available nodes in sequence
- Uses proper locking during state mutation
- Updates failover tracking metrics
- Provides clear logging for debugging
257-290: LGTM! Past review feedback has been addressed.The function now correctly uses
strings.ToLowerandstrings.Containsfrom the standard library (lines 282-284), which handles Unicode properly and simplifies the implementation compared to custom helpers.
292-304: LGTM! Single-retry behavior is appropriate.The method correctly detects connection errors and triggers failover. The single retry after failover is reasonable for a helper function - callers can implement additional retry logic if needed for their specific use case.
306-319: LGTM! Good use of generics for type-safe failover.The generic implementation provides type safety while maintaining consistency with
ExecuteWithFailover. The single-retry pattern is appropriate for this helper utility.
321-342: LGTM! Health check implementation is correct.Both methods are well-implemented:
HealthCheckcorrectly releases the lock before making the RPC callEnsureHealthyappropriately triggers failover on health check failures- Both accept context for proper cancellation handling
| func (fc *FailoverClient) GRPCConn() grpc.ClientConn { | ||
| fc.mu.RLock() | ||
| defer fc.mu.RUnlock() | ||
| return fc.wallet.Client.GRPCConn | ||
| } | ||
|
|
||
| // RPCClient returns the current RPC client. | ||
| func (fc *FailoverClient) RPCClient() client.Client { | ||
| fc.mu.RLock() | ||
| defer fc.mu.RUnlock() | ||
| return fc.wallet.Client.RPCClient | ||
| } | ||
|
|
||
| // AccAddress returns the account address. | ||
| func (fc *FailoverClient) AccAddress() string { | ||
| fc.mu.RLock() | ||
| defer fc.mu.RUnlock() | ||
| return fc.wallet.AccAddress() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider defensive nil checks for wallet access.
Methods GRPCConn(), RPCClient(), and AccAddress() dereference fc.wallet without checking for nil. While the constructors ensure wallet is initialized, adding defensive checks would improve robustness and prevent potential panics if internal state is corrupted.
🔎 Example defensive check
func (fc *FailoverClient) GRPCConn() grpc.ClientConn {
fc.mu.RLock()
defer fc.mu.RUnlock()
+ if fc.wallet == nil {
+ return nil
+ }
return fc.wallet.Client.GRPCConn
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (fc *FailoverClient) GRPCConn() grpc.ClientConn { | |
| fc.mu.RLock() | |
| defer fc.mu.RUnlock() | |
| return fc.wallet.Client.GRPCConn | |
| } | |
| // RPCClient returns the current RPC client. | |
| func (fc *FailoverClient) RPCClient() client.Client { | |
| fc.mu.RLock() | |
| defer fc.mu.RUnlock() | |
| return fc.wallet.Client.RPCClient | |
| } | |
| // AccAddress returns the account address. | |
| func (fc *FailoverClient) AccAddress() string { | |
| fc.mu.RLock() | |
| defer fc.mu.RUnlock() | |
| return fc.wallet.AccAddress() | |
| } | |
| func (fc *FailoverClient) GRPCConn() grpc.ClientConn { | |
| fc.mu.RLock() | |
| defer fc.mu.RUnlock() | |
| if fc.wallet == nil { | |
| return nil | |
| } | |
| return fc.wallet.Client.GRPCConn | |
| } | |
| // RPCClient returns the current RPC client. | |
| func (fc *FailoverClient) RPCClient() client.Client { | |
| fc.mu.RLock() | |
| defer fc.mu.RUnlock() | |
| return fc.wallet.Client.RPCClient | |
| } | |
| // AccAddress returns the account address. | |
| func (fc *FailoverClient) AccAddress() string { | |
| fc.mu.RLock() | |
| defer fc.mu.RUnlock() | |
| return fc.wallet.AccAddress() | |
| } |
🤖 Prompt for AI Agents
In rpc/failover.go around lines 204-222, the methods GRPCConn(), RPCClient(),
and AccAddress() access fc.wallet (and fc.wallet.Client) without nil checks; add
defensive checks inside the read lock: if fc.wallet is nil (or fc.wallet.Client
is nil for GRPCConn/RPCClient) return the appropriate zero value (nil for
GRPCConn/RPCClient, empty string for AccAddress) and optionally log a warning;
keep the RLock/RUnlock pattern and avoid panicking so callers get safe
zero-values when internal state is corrupted.
- Add ErrNoNodes error and validation for empty node lists - Extract duplicate connection logic into connectToFirstAvailable() - Remove redundant rpcAddrs/grpcAddrs fields (use nodeCfg directly) - Remove unused lastFailover field - Change "EOF" pattern to lowercase for consistency - Simplify struct initialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
rpc/failover.go (4)
238-241: Add locking to NodeCount() for consistency.
NodeCount()readsnodeCfg.RPCAddrswithout holding the lock, which was flagged in a previous review but remains unaddressed. WhilenodeCfgis only set during construction, omitting the lock is inconsistent with other accessors and could cause issues if node management features are added later.🔎 Proposed fix
func (fc *FailoverClient) NodeCount() int { + fc.mu.RLock() + defer fc.mu.RUnlock() return len(fc.nodeCfg.RPCAddrs) }
116-134: Silent index reset masks invalid indices.Lines 117-119 silently reset out-of-bounds indices to 0 instead of returning an error. This was flagged in a previous review but remains unaddressed. Silent resets can mask bugs or configuration errors where
RPCAddrsandGRPCAddrshave different lengths.🔎 Proposed fix
Return an error for invalid indices:
func (fc *FailoverClient) createWalletAtIndex(index int) (*wallet.Wallet, error) { - if index >= len(fc.nodeCfg.RPCAddrs) || index >= len(fc.nodeCfg.GRPCAddrs) { - index = 0 + if index < 0 || index >= len(fc.nodeCfg.RPCAddrs) || index >= len(fc.nodeCfg.GRPCAddrs) { + return nil, errors.New("invalid node index") } // Create a modified chain config with the specific node addresses
181-229: Accessor methods are correctly implemented.All accessors use appropriate locking. While the constructors ensure
walletis initialized before returning, adding defensive nil checks inGRPCConn(),RPCClient(), andAccAddress()would improve robustness against potential future bugs.🔎 Optional defensive checks
Example for
GRPCConn():func (fc *FailoverClient) GRPCConn() grpc.ClientConn { fc.mu.RLock() defer fc.mu.RUnlock() + if fc.wallet == nil || fc.wallet.Client == nil { + return nil + } return fc.wallet.Client.GRPCConn }Apply similar patterns to
RPCClient()and handle nil wallet inAccAddress().
50-90: Validate that RPCAddrs and GRPCAddrs have matching lengths.The constructors check for empty arrays but don't verify that
len(nodeCfg.RPCAddrs) == len(nodeCfg.GRPCAddrs). A length mismatch will cause silent failures whencreateWalletAtIndexresets invalid indices to 0, masking configuration errors. This issue was raised in a previous review but not fully addressed.🔎 Proposed fix
Add validation in both constructors:
func NewFailoverClient(nodeCfg NodeConfig, seed, derivation string) (*FailoverClient, error) { if len(nodeCfg.RPCAddrs) == 0 || len(nodeCfg.GRPCAddrs) == 0 { return nil, ErrNoNodes } + if len(nodeCfg.RPCAddrs) != len(nodeCfg.GRPCAddrs) { + return nil, errors.New("RPCAddrs and GRPCAddrs must have the same length") + } fc := &FailoverClient{Apply the same fix to
NewFailoverClientWithPrivKey:func NewFailoverClientWithPrivKey(nodeCfg NodeConfig, privKey string) (*FailoverClient, error) { if len(nodeCfg.RPCAddrs) == 0 || len(nodeCfg.GRPCAddrs) == 0 { return nil, ErrNoNodes } + if len(nodeCfg.RPCAddrs) != len(nodeCfg.GRPCAddrs) { + return nil, errors.New("RPCAddrs and GRPCAddrs must have the same length") + } fc := &FailoverClient{
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
rpc/failover.go
🧰 Additional context used
🧬 Code graph analysis (1)
rpc/failover.go (2)
config/types.go (1)
ChainConfig(17-25)wallet/wallet.go (2)
CreateWalletPrivKey(39-61)CreateWallet(14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint
- GitHub Check: Test
- GitHub Check: cov
🔇 Additional comments (8)
rpc/failover.go (8)
1-15: LGTM!Package structure and imports are clean and appropriate for the failover implementation.
17-48: LGTM!Type definitions are well-structured. The use of
sync.RWMutexis appropriate for concurrent access, and separatingNodeConfigto avoid import cycles is a good design choice.
92-113: LGTM!The connection logic properly iterates through all configured nodes and handles failures gracefully with appropriate logging.
136-179: LGTM!The failover logic is robust: it properly cycles through all nodes, uses appropriate locking, maintains state correctly, and provides good observability through logging.
231-236: LGTM!
FailoverCount()correctly uses read locking for thread-safe access.
243-276: LGTM!The error detection logic now correctly uses standard library functions (
strings.ToLowerandstrings.Contains) as suggested in the previous review. The connection error patterns are comprehensive and appropriate.
278-305: LGTM!The failover wrapper functions provide a clean pattern for automatic failover on connection errors. The single retry after failover is a reasonable design choice that balances resilience with avoiding long retry chains.
307-315: LGTM!Health check implementation is straightforward and appropriate, using ABCI info query as a health indicator.
- Fix missing retry after failover in queue/queue.go and proofs/proofs.go - Return ErrInvalidNodeIndex instead of silent reset in createWalletAtIndex - Add missing RLock in NodeCount() for consistency - Add nil checks with panic in GRPCConn/RPCClient/AccAddress methods - Fix redundant error logging (wrap in else) in proofs/proofs.go - Fix EnsureHealthy to verify new node health after failover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
proofs/proofs.goqueue/queue.gorpc/failover.go
🧰 Additional context used
🧬 Code graph analysis (3)
queue/queue.go (3)
rpc/failover.go (2)
FailoverClient(39-54)IsConnectionError(265-296)config/types.go (1)
RateLimitConfig(96-99)queue/types.go (1)
Queue(12-22)
proofs/proofs.go (3)
rpc/failover.go (2)
IsConnectionError(265-296)FailoverClient(39-54)queue/types.go (1)
Queue(12-22)proofs/types.go (2)
FileSystem(25-29)Prover(12-23)
rpc/failover.go (2)
config/types.go (1)
ChainConfig(17-25)wallet/wallet.go (2)
CreateWalletPrivKey(39-61)CreateWallet(14-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test
- GitHub Check: lint
- GitHub Check: cov
🔇 Additional comments (16)
queue/queue.go (3)
74-95: LGTM!The constructor signature correctly accepts
*rpc.FailoverClientand properly initializes all fields including the rate limiter.
177-190: LGTM - Past review issue addressed.The failover-retry pattern is now correctly implemented. After a successful failover (
q.wallet.Failover()returnstrue), the code retries theUnconfirmedTxscall. The outerif err != nilcheck at line 186 correctly handles both the case where failover failed and where the retry also failed.
243-251: LGTM!The broadcast failover handling correctly detects connection errors, attempts failover, and uses
continueto retry with the new connection within the retry loop. This is consistent with the mempool status handling pattern.rpc/failover.go (9)
17-24: LGTM!Well-defined sentinel errors for clear error handling across the codebase.
56-76: LGTM - Common initialization logic extracted.The
connectToFirstAvailable()helper method (lines 98-119) now consolidates the initialization logic, addressing the previous review suggestion. Both constructors call this shared method.
78-96: LGTM!Constructor for legacy private key follows the same pattern as the seed-based constructor.
98-119: LGTM!Clean implementation of the connection retry logic with proper logging. The loop tries each node sequentially and logs failures appropriately.
142-185: LGTM!The
Failover()method correctly:
- Holds the write lock for the entire operation
- Cycles through all nodes starting from the next one
- Logs attempt and success/failure appropriately
- Updates state atomically
195-226: LGTM - Nil checks added via panic.The
GRPCConn(),RPCClient(), andAccAddress()methods now include nil checks that panic withErrNilWallet. While the comment indicates this "should not happen in normal operation," using panic ensures immediate detection of corrupted state rather than silent failures.
256-261: LGTM - RLock added.The
NodeCount()method now correctly acquiresRLockbefore accessingnodeCfg.RPCAddrs, addressing the past review concern about consistency with other accessors.
263-296: LGTM - Standard library usage.The
IsConnectionErrorfunction now correctly usesstrings.ToLowerandstrings.Containsfrom the standard library, addressing the past review suggestion.
337-354: LGTM - Health re-verification after failover added.The
EnsureHealthymethod now callsHealthCheckagain after a successful failover (lines 347-351), addressing the past review concern. This ensures the new node is actually healthy before returning success.proofs/proofs.go (4)
111-111: LGTM!Correctly uses the
GRPCConn()accessor fromFailoverClientto create the query client.
328-341: LGTM - Redundant logging fixed.The previous review noted redundant error logging where both a warning and error were logged for the same connection error. This is now fixed with the nested
if err != nilcheck at line 337, which only logs at error level if the error persists after failover attempt. The failover-retry pattern is correctly implemented.
345-358: LGTM - Same fix applied for UnconfirmedTxs.The redundant logging issue is also fixed here. The nested
if err != nilat line 354 ensures error-level logging only occurs when the error persists after a failover attempt.
451-464: LGTM!The
NewProverconstructor signature correctly accepts*rpc.FailoverClient, aligning with theProverstruct definition inproofs/types.go.
| // HealthCheck performs a health check on the current node by querying ABCI info. | ||
| func (fc *FailoverClient) HealthCheck(ctx context.Context) error { | ||
| fc.mu.RLock() | ||
| rpcClient := fc.wallet.Client.RPCClient | ||
| fc.mu.RUnlock() | ||
|
|
||
| _, err := rpcClient.ABCIInfo(ctx) | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing nil check in HealthCheck.
Unlike GRPCConn() and RPCClient() which check for nil wallet/client, HealthCheck directly accesses fc.wallet.Client.RPCClient without validation. This inconsistency could cause a panic if internal state is corrupted.
🔎 Proposed fix
func (fc *FailoverClient) HealthCheck(ctx context.Context) error {
fc.mu.RLock()
+ if fc.wallet == nil || fc.wallet.Client == nil {
+ fc.mu.RUnlock()
+ return ErrNilWallet
+ }
rpcClient := fc.wallet.Client.RPCClient
fc.mu.RUnlock()
_, err := rpcClient.ABCIInfo(ctx)
return err
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In rpc/failover.go around lines 327 to 335, HealthCheck reads
fc.wallet.Client.RPCClient without validating fc.wallet or fc.wallet.Client; add
nil checks while holding the read lock: verify fc.wallet != nil and
fc.wallet.Client != nil before extracting RPCClient, unlock, then if RPCClient
is nil return a clear error (e.g. "no RPC client available") instead of
dereferencing to avoid panics; ensure you keep the locking/unlocking pattern and
return the error from ABCIInfo only when rpcClient is non-nil.
- Add ErrMismatchedNodeCounts and validate RPCAddrs/GRPCAddrs have equal length - Add failover handling with retry to SpaceHandler in api/client.go - Fix error encoding to not shadow err variable
Implement automatic failover capability for multiple RPC/GRPC nodes:
The failover state is kept in memory only. When a connection error is detected, the system automatically attempts to connect to the next configured node in the list.
Configuration example:
chain:
rpc_addrs:
- http://node1:26657
- http://node2:26657 grpc_addrs: - node1:9090 - node2:9090
Summary by CodeRabbit
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.