ThalaLabs/aptos-resilient-client

Aptos Resilient Client

A resilient Aptos client with automatic failover and recovery capabilities. This client maintains multiple RPC endpoint connections and automatically switches between them when failures occur, ensuring high availability for your Aptos blockchain interactions.

Features

  • Automatic Failover: Switches to backup endpoints when the primary fails
  • Auto Recovery: Automatically switches back to higher-priority endpoints when they recover
  • Configurable Thresholds: Customize failure tolerance and health check intervals
  • Request Timeout: Configurable timeout for all requests
  • Health Monitoring: Periodic health checks for failed endpoints
  • Full Aptos SDK Compatibility: Works as a drop-in replacement for the standard Aptos client
  • TypeScript Support: Full type definitions included

Installation

npm install @thalalabs/aptos-resilient-client @aptos-labs/ts-sdk

Quick Start

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

// Create a resilient client with multiple endpoints
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }), // Primary
    new AptosConfig({ fullnode: "https://1rpc.io/aptos/v1" }),              // Backup
  ],
  unhealthyThreshold: 3,      // Mark endpoint unhealthy after 3 failures
  healthCheckInterval: 30000, // Check every 30 seconds
  requestTimeout: 10000,      // 10 second timeout per request
});

// Get the Aptos client instance
const client = resilientClient.getClient();

// Use it just like the standard Aptos client
const ledgerInfo = await client.getLedgerInfo();
const accountInfo = await client.getAccountInfo({ accountAddress: "0x1" });

// Get statistics
const stats = resilientClient.getStats();
console.log("Active endpoint:", stats.activeEndpointUrl);
console.log("Total failovers:", stats.totalFailovers);

// Clean up when done
resilientClient.destroy();

Configuration

ResilientClientConfig

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| endpoints | AptosConfig[] | required | Array of AptosConfig objects in priority order (first = highest priority). Each AptosConfig can specify the fullnode URL, network, and custom client configuration (headers, API keys, etc.). |
| unhealthyThreshold | number | 3 | Number of consecutive failures before an endpoint is marked unhealthy. Unhealthy endpoints are skipped in future requests. |
| healthCheckInterval | number | 30000 | Interval in milliseconds between health checks for unhealthy endpoints. |
| requestTimeout | number | 10000 | Timeout in milliseconds for each request. |
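
To make the defaults above concrete, here is a minimal sketch of how optional settings could be merged with their documented defaults. It is an illustration only, not the library's implementation: `ResilientOptions`, `DEFAULTS`, and `resolveOptions` are hypothetical names, and `endpoints` is left out to keep the block self-contained.

```typescript
// Hypothetical option shape mirroring the optional properties in the table.
interface ResilientOptions {
  unhealthyThreshold?: number;
  healthCheckInterval?: number; // milliseconds
  requestTimeout?: number; // milliseconds
}

// Default values as documented in the configuration table.
const DEFAULTS: Required<ResilientOptions> = {
  unhealthyThreshold: 3,
  healthCheckInterval: 30000,
  requestTimeout: 10000,
};

// Fill in every option the caller left out (or set to undefined) with its default.
function resolveOptions(options: ResilientOptions = {}): Required<ResilientOptions> {
  return {
    unhealthyThreshold: options.unhealthyThreshold ?? DEFAULTS.unhealthyThreshold,
    healthCheckInterval: options.healthCheckInterval ?? DEFAULTS.healthCheckInterval,
    requestTimeout: options.requestTimeout ?? DEFAULTS.requestTimeout,
  };
}
```

With this sketch, `resolveOptions({ requestTimeout: 5000 })` keeps the caller's timeout and takes the other two values from the defaults.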

How It Works

Failover Logic

Within a Single Request:

  1. The client tries all healthy endpoints sequentially, starting with the highest priority
  2. If endpoint A fails, it immediately tries endpoint B (no retries)
  3. Each failure increments that endpoint's consecutive failure counter
  4. The request succeeds as soon as any endpoint responds successfully

Across Multiple Requests:

  1. After unhealthyThreshold consecutive failures, an endpoint is marked unhealthy
  2. Unhealthy endpoints are skipped entirely in future requests (cost optimization)
  3. The active endpoint becomes the first healthy endpoint in priority order

Example with 2 endpoints (threshold = 3):

  • Request 1: Primary fails (1/3) → Backup succeeds ✓
  • Request 2: Primary fails (2/3) → Backup succeeds ✓
  • Request 3: Primary fails (3/3, marked unhealthy) → Backup succeeds ✓
  • Request 4+: Skip primary entirely → Backup succeeds ✓ (saves time & money)
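
The failover steps above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the library's code: `EndpointHealth` and `requestWithFailover` are hypothetical names, every thrown error is treated as a network error for brevity, and resetting the counter on success is an assumption implied by the word "consecutive".

```typescript
// Hypothetical per-endpoint bookkeeping; field names follow EndpointState in this README.
interface EndpointHealth {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
}

// Try each healthy endpoint once, in priority order, and return the first success.
async function requestWithFailover<T>(
  endpoints: EndpointHealth[],
  unhealthyThreshold: number,
  call: (url: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("No healthy endpoints available");
  for (const endpoint of endpoints) {
    if (!endpoint.healthy) continue; // unhealthy endpoints are skipped entirely
    try {
      const result = await call(endpoint.url);
      endpoint.consecutiveFailures = 0; // assumption: a success resets the streak
      return result;
    } catch (error) {
      lastError = error;
      endpoint.consecutiveFailures += 1;
      if (endpoint.consecutiveFailures >= unhealthyThreshold) {
        endpoint.healthy = false; // skipped from the next request onward
      }
      // No retry on the same endpoint: the loop moves on to the next one.
    }
  }
  throw lastError; // every healthy endpoint failed, or none was healthy
}
```

Run against a primary that always fails and a working backup with a threshold of 3, this sketch contacts the primary on the first three requests only, matching the walkthrough above.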

Recovery Logic

  1. A background health check runs every healthCheckInterval milliseconds
  2. Unhealthy endpoints are tested with a lightweight request (getLedgerInfo())
  3. If an endpoint becomes healthy again and has higher priority than the current active endpoint, the client switches back to it
  4. This ensures you always use the highest-priority available endpoint
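
As a rough illustration of the recovery steps, the sketch below probes unhealthy endpoints once and reports which endpoint should be active afterwards. It is not the library's implementation: `ProbeTarget`, `runHealthCheck`, and the `probe` callback (standing in for a lightweight request such as getLedgerInfo()) are hypothetical.

```typescript
// Hypothetical bookkeeping for one endpoint, listed in priority order.
interface ProbeTarget {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
}

// Probe each unhealthy endpoint once, then return the index of the
// highest-priority healthy endpoint (-1 if none is healthy).
async function runHealthCheck(
  endpoints: ProbeTarget[],
  probe: (url: string) => Promise<void>,
): Promise<number> {
  for (const endpoint of endpoints) {
    if (endpoint.healthy) continue; // only unhealthy endpoints are tested
    try {
      await probe(endpoint.url);
      endpoint.healthy = true; // the probe succeeded, so the endpoint has recovered
      endpoint.consecutiveFailures = 0;
    } catch {
      // Still failing: leave it marked unhealthy until the next check.
    }
  }
  return endpoints.findIndex((endpoint) => endpoint.healthy);
}
```

A timer firing every healthCheckInterval milliseconds could call a function like this and switch the active endpoint whenever the returned index is lower than the current one.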

Error Handling

The client distinguishes between:

  • Network errors (timeout, connection refused, etc.) → Triggers failover
  • Application errors (invalid parameters, etc.) → Thrown immediately without failover

Network errors that trigger failover include:

  • Timeouts
  • Connection refused (ECONNREFUSED)
  • DNS errors (ENOTFOUND)
  • Connection reset (ECONNRESET)
  • HTTP 502, 503, 504 errors
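
One way to picture this classification is a predicate over the thrown error. The sketch below is an assumption about how such a check might look, not the library's code: `MaybeNetworkError` and `shouldFailover` are hypothetical names, and the assumed error shape (a Node-style `code`, an HTTP `status`, and timeouts surfacing as `ETIMEDOUT`, `TimeoutError`, or `AbortError`) may differ from what the client actually inspects.

```typescript
// Error shape assumed for this sketch.
interface MaybeNetworkError {
  code?: unknown; // Node-style error code, e.g. "ECONNRESET"
  status?: unknown; // HTTP status code, if the server answered
  name?: unknown; // error name, used here to spot timeouts
}

const FAILOVER_CODES = new Set(["ECONNREFUSED", "ENOTFOUND", "ECONNRESET", "ETIMEDOUT"]);
const FAILOVER_STATUSES = new Set([502, 503, 504]);

// True for errors that should trigger failover; false for application errors,
// which would be rethrown to the caller without trying another endpoint.
function shouldFailover(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const e = error as MaybeNetworkError;
  if (typeof e.code === "string" && FAILOVER_CODES.has(e.code)) return true;
  if (typeof e.status === "number" && FAILOVER_STATUSES.has(e.status)) return true;
  return e.name === "TimeoutError" || e.name === "AbortError";
}
```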

API Reference

AptosResilientClient

Constructor

new AptosResilientClient(config: ResilientClientConfig)

Methods

getClient(): Aptos

Returns the proxy Aptos client instance that should be used for all operations.

const client = resilientClient.getClient();
const ledgerInfo = await client.getLedgerInfo();

getStats(): ClientStats

Returns current statistics about the client.

const stats = resilientClient.getStats();
console.log(stats.activeEndpointUrl); // Currently active endpoint
console.log(stats.totalFailovers);    // Total number of failovers
console.log(stats.totalRecoveries);   // Total number of recoveries
console.log(stats.endpoints);         // Health status of all endpoints

checkHealth(): Promise<void>

Manually trigger a health check for all unhealthy endpoints.

await resilientClient.checkHealth();

destroy(): void

Stop the health check interval and clean up resources.

resilientClient.destroy();

ClientStats

interface ClientStats {
  activeEndpointIndex: number;      // Index of currently active endpoint
  activeEndpointUrl: string;        // URL of currently active endpoint
  endpoints: EndpointState[];       // State of all endpoints
  totalFailovers: number;           // Total failovers that have occurred
  totalRecoveries: number;          // Total recoveries (switches back to higher priority)
}

EndpointState

interface EndpointState {
  url: string;                      // The RPC endpoint URL
  healthy: boolean;                 // Whether this endpoint is currently healthy
  consecutiveFailures: number;      // Number of consecutive failures
  lastFailureTime?: number;         // Timestamp of last failure
  lastSuccessTime?: number;         // Timestamp of last successful request
}
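
These two interfaces are enough to build simple monitoring helpers. The sketch below repeats the interfaces from this section so it stands alone; `unhealthyUrls` and `isDegraded` are hypothetical helpers for illustration, not part of the package.

```typescript
// Interfaces copied from the API reference in this README.
interface EndpointState {
  url: string;
  healthy: boolean;
  consecutiveFailures: number;
  lastFailureTime?: number;
  lastSuccessTime?: number;
}

interface ClientStats {
  activeEndpointIndex: number;
  activeEndpointUrl: string;
  endpoints: EndpointState[];
  totalFailovers: number;
  totalRecoveries: number;
}

// URLs of endpoints currently marked unhealthy, e.g. for an alert message.
function unhealthyUrls(stats: ClientStats): string[] {
  return stats.endpoints.filter((endpoint) => !endpoint.healthy).map((endpoint) => endpoint.url);
}

// True when requests are being served by anything other than the top-priority endpoint.
function isDegraded(stats: ClientStats): boolean {
  return stats.activeEndpointIndex !== 0;
}
```

Both helpers take the object returned by getStats(), so they could be called from a periodic monitoring loop.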

Examples

Basic Usage

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }),
    new AptosConfig({ fullnode: "https://1rpc.io/aptos/v1" }),
  ],
});

const client = resilientClient.getClient();

// Fetch ledger info
const ledgerInfo = await client.getLedgerInfo();
console.log("Chain ID:", ledgerInfo.chain_id);

// Get account info
const accountInfo = await client.getAccountInfo({
  accountAddress: "0x1"
});

resilientClient.destroy();

Monitoring Health

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://endpoint1.com" }),
    new AptosConfig({ fullnode: "https://endpoint2.com" }),
  ],
  healthCheckInterval: 10000, // Check every 10 seconds
});

const client = resilientClient.getClient();

// Monitor stats periodically
setInterval(() => {
  const stats = resilientClient.getStats();
  console.log("Active:", stats.activeEndpointUrl);
  console.log("Failovers:", stats.totalFailovers);

  stats.endpoints.forEach(ep => {
    console.log(`${ep.url}: ${ep.healthy ? 'healthy' : 'unhealthy'} (${ep.consecutiveFailures} failures)`);
  });
}, 5000);

Cost Optimization Use Case

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

// Optimize for cost: use cheap primary, only use expensive backups when necessary
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://cheap-primary.example.com" }),    // Cheapest, use whenever possible
    new AptosConfig({ fullnode: "https://expensive-backup.example.com" }), // More expensive, use when primary fails
  ],
  unhealthyThreshold: 3,      // Allow 3 failures before giving up on primary
  healthCheckInterval: 30000, // Check primary every 30s to switch back ASAP
  requestTimeout: 10000,      // Don't wait too long on failed endpoints
});

Custom Configuration with API Keys

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig, Network } from "@aptos-labs/ts-sdk";

const resilientClient = new AptosResilientClient({
  endpoints: [
    // Primary: Provider with API key
    new AptosConfig({
      fullnode: "https://primary.example.com",
      network: Network.MAINNET,
      clientConfig: {
        API_KEY: "your-api-key",
        HEADERS: { // the ts-sdk ClientConfig type spells this key in upper case
          "X-Custom-Header": "value"
        }
      }
    }),
    // Backup: Another provider with different API key
    new AptosConfig({
      fullnode: "https://backup.example.com",
      network: Network.MAINNET,
      clientConfig: {
        API_KEY: "your-backup-api-key",
      }
    }),
    // Fallback: Free public endpoint (no API key needed)
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }),
  ],
  unhealthyThreshold: 2,       // More aggressive - mark unhealthy after 2 failures
  healthCheckInterval: 60000,  // Check every minute
  requestTimeout: 15000,       // 15 second timeout
});

Development

Build

pnpm build

Run Example

pnpm example

Best Practices

  1. Endpoint Priority: List endpoints in order of preference (fastest/most reliable first)
  2. Timeout Configuration: Set requestTimeout based on your network conditions and requirements
  3. Health Check Interval: Balance between quick recovery and avoiding unnecessary requests
  4. Cleanup: Always call destroy() when you're done to clean up the health check interval
  5. Error Handling: Wrap operations in try-catch blocks as you would with the standard Aptos client
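
Practices 4 and 5 can be combined with a try/finally wrapper so that cleanup runs even when an operation throws. The sketch below is one possible pattern, not something the package provides: `Destroyable` and `withCleanup` are hypothetical names, and the structural type simply matches any object with a destroy() method, such as the resilient client.

```typescript
// Structural type: anything that exposes destroy(), such as the resilient client.
interface Destroyable {
  destroy(): void;
}

// Run `work` with the resource and always call destroy() afterwards,
// whether `work` resolves or throws.
async function withCleanup<R extends Destroyable, T>(
  resource: R,
  work: (resource: R) => Promise<T>,
): Promise<T> {
  try {
    return await work(resource);
  } finally {
    resource.destroy(); // stops the health check interval in the real client
  }
}
```

A caller would pass the resilient client as `resource` and perform its requests inside `work`; errors still propagate to the caller's own try-catch.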

Showcase

import { AptosResilientClient } from "@thalalabs/aptos-resilient-client";
import { AptosConfig } from "@aptos-labs/ts-sdk";

// Create a resilient client with multiple endpoints
const resilientClient = new AptosResilientClient({
  endpoints: [
    new AptosConfig({ fullnode: "https://your-primary-aptos-rpc/v1" }), // Primary
    new AptosConfig({ fullnode: "https://api.mainnet.aptoslabs.com/v1" }), // Backup
  ],
  unhealthyThreshold: 3,      // Mark endpoint unhealthy after 3 failures
  healthCheckInterval: 30000, // Check every 30 seconds
  requestTimeout: 10000,      // 10 second timeout per request
});

// Get the Aptos client instance
const client = resilientClient.getClient();

License

MIT
