Resubscription logic loses accounts when all clients disconnect #622

@thlorenz

Problem

In submux/mod.rs (PR #621), get_subscriptions returns only the union of the subscriptions that each inner client currently reports. Once ChainPubsubActor::abort_and_signal_connection_issue has run, every client has cleared its subscription map.

Critical Issue: If all endpoints disconnect (e.g., during a provider outage), the union becomes empty, so resub_multiple runs with zero pubkeys and the system never resubscribes. From that point forward, the remote account provider still believes accounts are being watched, but no updates will ever arrive.

Root Cause

fn get_subscriptions(clients: &[Arc<T>]) -> Vec<Pubkey> {
    let mut all_subs = HashSet::new();
    for client in clients {
        all_subs.extend(client.subscriptions());  // Returns empty after abort!
    }
    all_subs.into_iter().collect()
}

When all clients abort due to connection issues, their internal subscription maps are cleared. The reconnection logic then queries these empty maps to determine which accounts to resubscribe, resulting in zero resubscriptions.
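
The failure can be reproduced in isolation. The following is a minimal sketch, not the actual submux code: MockClient, its abort method, and the [u8; 32] stand-in for Pubkey are hypothetical and exist only to keep the example self-contained. The get_subscriptions body mirrors the excerpt above.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Stand-in for the real Pubkey type (assumption, keeps the sketch std-only).
type Pubkey = [u8; 32];

// Hypothetical client that forgets its subscriptions when aborted.
#[derive(Default)]
struct MockClient {
    subs: Mutex<HashSet<Pubkey>>,
}

impl MockClient {
    fn subscribe(&self, pubkey: Pubkey) {
        self.subs.lock().unwrap().insert(pubkey);
    }

    fn subscriptions(&self) -> Vec<Pubkey> {
        self.subs.lock().unwrap().iter().copied().collect()
    }

    // Models the state reset described for a connection issue.
    fn abort(&self) {
        self.subs.lock().unwrap().clear();
    }
}

// Same shape as the function in the issue: union of what clients report now.
fn get_subscriptions(clients: &[Arc<MockClient>]) -> Vec<Pubkey> {
    let mut all_subs = HashSet::new();
    for client in clients {
        all_subs.extend(client.subscriptions());
    }
    all_subs.into_iter().collect()
}

// Subscribes one account on two clients, aborts both, then asks what to resubscribe.
fn union_after_full_outage() -> Vec<Pubkey> {
    let clients = vec![
        Arc::new(MockClient::default()),
        Arc::new(MockClient::default()),
    ];
    for client in &clients {
        client.subscribe([7u8; 32]);
    }
    for client in &clients {
        client.abort();
    }
    get_subscriptions(&clients)
}
```

Calling union_after_full_outage() yields an empty list even though one account was subscribed, which is the state the reconnection logic ends up working from.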

Proposed Solution

SubMux needs to persist the intended subscription set independently, rather than relying on clients that have already dropped their state.

Implementation Approach

  1. Add authoritative subscription tracking to SubMux:

    • Maintain an internal HashSet<Pubkey> that tracks intended subscriptions
    • This set persists across client disconnects
  2. Update subscription lifecycle:

    • In subscribe: Add pubkey to internal set
    • In unsubscribe: Remove pubkey from internal set
    • Make the set thread-safe (guard with mutex/lock)
  3. Fix reconnection logic:

    • Change get_subscriptions to read from the authoritative internal set
    • Or replace it with a method on &self that returns Vec<Pubkey> from the persisted set
    • Ensure resub_multiple uses this authoritative list during reconnects
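
The steps above could look roughly like the following. This is a sketch under stated assumptions, not the proposed patch: the PubsubClient trait, MockClient, the intended field, and resub_all are hypothetical names, Pubkey is replaced by a [u8; 32] stand-in, and everything is synchronous for brevity, whereas the real SubMux may well be async.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Stand-in for the real Pubkey type (assumption).
type Pubkey = [u8; 32];

// Hypothetical minimal client interface, not the real trait.
trait PubsubClient {
    fn subscribe(&self, pubkey: Pubkey);
    fn unsubscribe(&self, pubkey: Pubkey);
    fn subscriptions(&self) -> Vec<Pubkey>;
}

// Hypothetical client whose state is lost on abort.
#[derive(Default)]
struct MockClient {
    subs: Mutex<HashSet<Pubkey>>,
}

impl MockClient {
    fn abort(&self) {
        self.subs.lock().unwrap().clear();
    }
}

impl PubsubClient for MockClient {
    fn subscribe(&self, pubkey: Pubkey) {
        self.subs.lock().unwrap().insert(pubkey);
    }
    fn unsubscribe(&self, pubkey: Pubkey) {
        self.subs.lock().unwrap().remove(&pubkey);
    }
    fn subscriptions(&self) -> Vec<Pubkey> {
        self.subs.lock().unwrap().iter().copied().collect()
    }
}

struct SubMux<T: PubsubClient> {
    clients: Vec<Arc<T>>,
    // Authoritative set of intended subscriptions; survives client aborts.
    intended: Mutex<HashSet<Pubkey>>,
}

impl<T: PubsubClient> SubMux<T> {
    fn new(clients: Vec<Arc<T>>) -> Self {
        Self { clients, intended: Mutex::new(HashSet::new()) }
    }

    fn subscribe(&self, pubkey: Pubkey) {
        // Record the intent first, then forward to every client.
        self.intended.lock().unwrap().insert(pubkey);
        for client in &self.clients {
            client.subscribe(pubkey);
        }
    }

    fn unsubscribe(&self, pubkey: Pubkey) {
        self.intended.lock().unwrap().remove(&pubkey);
        for client in &self.clients {
            client.unsubscribe(pubkey);
        }
    }

    // Reads the persisted set instead of querying the clients.
    fn get_subscriptions(&self) -> Vec<Pubkey> {
        self.intended.lock().unwrap().iter().copied().collect()
    }

    // Hypothetical reconnect hook: replays the intended set on every client.
    fn resub_all(&self) {
        for pubkey in self.get_subscriptions() {
            for client in &self.clients {
                client.subscribe(pubkey);
            }
        }
    }
}
```

With this shape, aborting every client leaves get_subscriptions unchanged, so a reconnect can replay the full intended set. The lock is released before the clients are called, which avoids holding it across potentially slow client operations.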

Impact

Without this fix, a simultaneous disconnect of all endpoints permanently breaks account monitoring until someone intervenes manually. This is especially critical in production environments, where RPC provider outages can occur.
