Description
Problem
Currently, in submux/mod.rs (PR #621), the get_subscriptions function builds its result only from the union of the subscriptions that each inner client reports at that moment. However, after ChainPubsubActor::abort_and_signal_connection_issue runs, every client clears its subscription map.
Critical Issue: If all endpoints disconnect (e.g., during a provider outage), the union becomes empty, so resub_multiple runs with zero pubkeys and the system never resubscribes. From that point forward, the remote account provider still believes accounts are being watched, but no updates will ever arrive.
Root Cause
```rust
fn get_subscriptions(clients: &[Arc<T>]) -> Vec<Pubkey> {
    let mut all_subs = HashSet::new();
    for client in clients {
        all_subs.extend(client.subscriptions()); // Returns empty after abort!
    }
    all_subs.into_iter().collect()
}
```

When all clients abort due to connection issues, their internal subscription maps are cleared. The reconnection logic then queries these empty maps to determine which accounts to resubscribe, which results in zero resubscriptions.
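The failure mode can be reproduced in isolation. The sketch below is a minimal, self-contained model, not the project's code: SubscriptionSource and MockClient are hypothetical stand-ins for the real inner clients, Pubkey is replaced by a [u8; 32] alias to avoid external crates, and MockClient::abort only mimics the map-clearing effect of abort_and_signal_connection_issue. The union logic is the same as in the excerpt above.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Stand-in for the real Pubkey type (assumption: any Copy + Hash + Eq key behaves the same).
type Pubkey = [u8; 32];

// Hypothetical trait exposing only the accessor that get_subscriptions relies on.
trait SubscriptionSource {
    fn subscriptions(&self) -> Vec<Pubkey>;
}

// Hypothetical client whose subscription map is wiped on abort.
struct MockClient {
    subs: Mutex<HashSet<Pubkey>>,
}

impl MockClient {
    fn new(keys: &[Pubkey]) -> Self {
        Self { subs: Mutex::new(keys.iter().copied().collect()) }
    }

    // Mimics the effect of abort_and_signal_connection_issue on the client's state.
    fn abort(&self) {
        self.subs.lock().unwrap().clear();
    }
}

impl SubscriptionSource for MockClient {
    fn subscriptions(&self) -> Vec<Pubkey> {
        self.subs.lock().unwrap().iter().copied().collect()
    }
}

// Same logic as the current get_subscriptions: union of what the clients report right now.
fn get_subscriptions<T: SubscriptionSource>(clients: &[Arc<T>]) -> Vec<Pubkey> {
    let mut all_subs = HashSet::new();
    for client in clients {
        all_subs.extend(client.subscriptions());
    }
    all_subs.into_iter().collect()
}

fn main() {
    let a = [1u8; 32];
    let b = [2u8; 32];
    let clients = vec![
        Arc::new(MockClient::new(&[a])),
        Arc::new(MockClient::new(&[a, b])),
    ];
    // Before the outage, the union contains both distinct pubkeys.
    assert_eq!(get_subscriptions(&clients).len(), 2);

    // Provider outage: every client aborts and clears its own map.
    for c in &clients {
        c.abort();
    }

    // Nothing is left to hand to resub_multiple.
    let to_resub = get_subscriptions(&clients);
    assert!(to_resub.is_empty());
    println!("pubkeys to resubscribe after outage: {}", to_resub.len());
}
```

The intended set was never lost conceptually; it was only ever stored inside the clients that have just discarded it.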
Proposed Solution
SubMux needs to persist the intended subscription set independently, rather than relying on clients that have already dropped their state.
Implementation Approach
- Add authoritative subscription tracking to SubMux:
  - Maintain an internal HashSet<Pubkey> that tracks intended subscriptions.
  - This set persists across client disconnects.
- Update the subscription lifecycle:
  - In subscribe: add the pubkey to the internal set.
  - In unsubscribe: remove the pubkey from the internal set.
  - Make the set thread-safe (guard it with a mutex/lock).
- Fix the reconnection logic:
  - Change get_subscriptions to read from the authoritative internal set, or replace it with a method on &self that returns Vec<Pubkey> from the persisted set.
  - Ensure resub_multiple uses this authoritative list during reconnects.
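The three steps above could be sketched roughly as follows. This is a minimal, synchronous sketch under stated assumptions, not the SubMux implementation from PR #621: PubsubClient, MockClient, and on_reconnect are hypothetical names, Pubkey is a [u8; 32] stand-in, and async execution and error handling are deliberately left out. The key change is that get_subscriptions becomes a method on &self that reads the persisted set.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Stand-in for the real Pubkey type so the sketch compiles without external crates.
type Pubkey = [u8; 32];

// Hypothetical minimal interface for an inner client; not the project's actual trait.
trait PubsubClient {
    fn subscribe(&self, pubkey: Pubkey);
    fn unsubscribe(&self, pubkey: Pubkey);
    fn resub_multiple(&self, pubkeys: &[Pubkey]);
}

struct SubMux<T: PubsubClient> {
    clients: Vec<Arc<T>>,
    // Authoritative set of intended subscriptions; untouched when clients abort.
    subscriptions: Mutex<HashSet<Pubkey>>,
}

impl<T: PubsubClient> SubMux<T> {
    fn new(clients: Vec<Arc<T>>) -> Self {
        Self { clients, subscriptions: Mutex::new(HashSet::new()) }
    }

    fn subscribe(&self, pubkey: Pubkey) {
        // Record intent first; the lock guard is dropped at the end of this statement.
        self.subscriptions.lock().unwrap().insert(pubkey);
        for client in &self.clients {
            client.subscribe(pubkey);
        }
    }

    fn unsubscribe(&self, pubkey: Pubkey) {
        self.subscriptions.lock().unwrap().remove(&pubkey);
        for client in &self.clients {
            client.unsubscribe(pubkey);
        }
    }

    // Reads the persisted set instead of the (possibly cleared) client maps.
    fn get_subscriptions(&self) -> Vec<Pubkey> {
        self.subscriptions.lock().unwrap().iter().copied().collect()
    }

    // Hypothetical hook invoked when a client has reconnected.
    fn on_reconnect(&self, client: &T) {
        client.resub_multiple(&self.get_subscriptions());
    }
}

// Hypothetical mock client that loses all of its state on abort.
struct MockClient {
    active: Mutex<HashSet<Pubkey>>,
}

impl MockClient {
    fn abort(&self) {
        self.active.lock().unwrap().clear();
    }
}

impl PubsubClient for MockClient {
    fn subscribe(&self, pubkey: Pubkey) {
        self.active.lock().unwrap().insert(pubkey);
    }
    fn unsubscribe(&self, pubkey: Pubkey) {
        self.active.lock().unwrap().remove(&pubkey);
    }
    fn resub_multiple(&self, pubkeys: &[Pubkey]) {
        self.active.lock().unwrap().extend(pubkeys.iter().copied());
    }
}

fn main() {
    let clients: Vec<Arc<MockClient>> = (0..2)
        .map(|_| Arc::new(MockClient { active: Mutex::new(HashSet::new()) }))
        .collect();
    let mux = SubMux::new(clients.clone());
    mux.subscribe([1u8; 32]);
    mux.subscribe([2u8; 32]);
    mux.subscribe([3u8; 32]);
    mux.unsubscribe([3u8; 32]);

    // Full outage: every client drops its own subscription state.
    for c in &clients {
        c.abort();
    }
    // Reconnect: the mux still knows the two intended subscriptions.
    for c in &clients {
        mux.on_reconnect(c);
    }
    assert!(clients.iter().all(|c| c.active.lock().unwrap().len() == 2));
}
```

Recording the pubkey in the set before fanning out to the clients means that a reconnect racing with subscribe may resubscribe that pubkey twice, but it cannot miss it. This assumes that resubscribing an already-watched account is harmless.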
Impact
Without this fix, a full-cluster disconnect silently breaks account monitoring until someone intervenes manually. This is especially critical in production environments, where RPC provider outages do occur.
Related
- PR #621 (feat: enhance subscription management with metrics and reconnection): enhanced subscription management with metrics and reconnection
- Comment thread on PR #621 (comment)