This implementation provides a multi-cloud database failover system for the Vesting Vault backend, targeting 99.99% uptime for critical financial data. The system automatically switches from the primary to the secondary database instance when the primary becomes unavailable, and switches back once the primary recovers.
- Provider: AWS RDS PostgreSQL
- Role: Primary read-write database
- Connection: Direct application queries
- Provider: Google Cloud SQL or DigitalOcean Managed Database
- Role: Warm standby, read-only during failover
- Connection: Automatic failover when primary is unavailable
- Frequency: Every 10 seconds
- Timeout: 30 seconds for failover trigger
- Method: Simple SELECT 1 query with response time tracking
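The heartbeat settings above could be wired up roughly as follows. This is a minimal sketch, not the project's actual code: `checkHeartbeat`, `startHeartbeat`, and the injected `runQuery` function are hypothetical names, and `runQuery` is assumed to send SQL to the primary database and return a Promise.

```javascript
// Hypothetical helper: runs one heartbeat probe and measures its response time.
// `runQuery` is an assumed function (sql) => Promise, supplied by the caller.
async function checkHeartbeat(runQuery, now = Date.now) {
  const startedAt = now();
  try {
    await runQuery("SELECT 1");
    return { ok: true, responseTimeMs: now() - startedAt };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, responseTimeMs: now() - startedAt, error: message };
  }
}

// Runs the probe on a fixed interval (10 s here, matching HEARTBEAT_INTERVAL_MS)
// and passes each result to `onResult`. Returns a function that stops monitoring.
function startHeartbeat(runQuery, onResult, intervalMs = 10000) {
  const timer = setInterval(async () => {
    onResult(await checkHeartbeat(runQuery));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Injecting `runQuery` keeps the sketch independent of any particular database driver, so the same probe can target either PostgreSQL or MySQL.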
- Detects primary database unavailability within 30 seconds
- Automatically switches to secondary database
- Enters read-only mode to ensure data consistency
- Continuous health checks every 10 seconds
- Response time tracking for performance monitoring
- Automatic recovery when primary database comes back online
- Prevents write operations during failover scenarios
- Protects data consistency across database instances
- Automatic read-write mode restoration on recovery
- Supports PostgreSQL and MySQL databases
- Configurable for different cloud providers
- SSL/TLS connection support
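The read-only protection listed above might be enforced with a guard along these lines. This is a sketch under stated assumptions: `isWriteStatement` and `assertWritable` are hypothetical names, the `isReadOnly` flag mirrors the field reported by the health endpoint, and the keyword check is a simple heuristic that will not catch every write (for example, a CTE that wraps an INSERT).

```javascript
// Hypothetical guard: decides whether an SQL statement may run in the current mode.
// The pattern only inspects the leading keyword, so it is a heuristic, not a parser.
const WRITE_PATTERN = /^\s*(insert|update|delete|create|alter|drop|truncate)\b/i;

function isWriteStatement(sql) {
  return WRITE_PATTERN.test(sql);
}

// Throws when a write is attempted while the system is in read-only mode.
// `state` is assumed to expose an `isReadOnly` boolean.
function assertWritable(sql, state) {
  if (state.isReadOnly && isWriteStatement(sql)) {
    const err = new Error("Database is in read-only mode during failover");
    err.code = "READ_ONLY_MODE"; // illustrative error code
    throw err;
  }
}
```

Calling `assertWritable(sql, state)` before each query lets reads pass through unchanged while writes fail fast with a recognizable error code.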
# Primary Database (AWS PostgreSQL)
DB_PRIMARY_HOST=your-aws-rds-host.rds.amazonaws.com
DB_PRIMARY_PORT=5432
DB_PRIMARY_NAME=vesting_vault
DB_PRIMARY_USER=postgres
DB_PRIMARY_PASSWORD=your_secure_password
DB_PRIMARY_SSL=true
# Secondary Database (Google Cloud MySQL)
DB_SECONDARY_HOST=your-gcp-host.cloudsql.com
DB_SECONDARY_PORT=3306
DB_SECONDARY_NAME=vesting_vault_backup
DB_SECONDARY_USER=root
DB_SECONDARY_PASSWORD=your_secure_password
DB_SECONDARY_SSL=true
DB_SECONDARY_TYPE=mysql
# Failover Configuration
FAILOVER_TIMEOUT_MS=30000
HEARTBEAT_INTERVAL_MS=10000
npm install
cp .env.example .env
# Edit .env with your database credentials
# Run on primary database
psql -h $DB_PRIMARY_HOST -U $DB_PRIMARY_USER -d $DB_PRIMARY_NAME -f schema.sql
# Run on secondary database (if MySQL)
mysql -h $DB_SECONDARY_HOST -u $DB_SECONDARY_USER -p $DB_SECONDARY_NAME < schema.sql
npm start
GET /api/health
Returns system health status including database failover information:
{
"status": "healthy",
"timestamp": "2024-01-01T00:00:00.000Z",
"database": {
"currentDB": "primary",
"isReadOnly": false,
"lastHeartbeat": 1704067200000,
"uptime": 15000
},
"uptime": 300
}
GET /api/user/:address/portfolio
GET /api/vaults?page=1&limit=20
node test-failover.js
- Initialization Test: Verifies failover manager setup
- Heartbeat Test: Confirms monitoring system works
- Read-Only Mode Test: Validates write protection during failover
- Failover Timeout Test: Simulates primary database failure
- API Endpoint Tests: Tests all endpoints with failover
The system provides detailed logging for:
- ✅ Successful heartbeat checks with response times
- ❌ Failed heartbeat attempts
- 🔄 Failover events and recovery
- ⚠️ Warnings for degraded performance
- Monitor /api/health endpoint
- Track database uptime metrics
- Alert on failover events
- Monitor response times
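The monitoring checklist above could be turned into simple alert rules over the /api/health payload. The sketch below assumes the response shape shown in the health endpoint example; `evaluateHealth` and the 30-second staleness threshold are illustrative choices, not part of the documented API.

```javascript
// Hypothetical alert rules applied to a parsed /api/health response.
// Returns a list of human-readable alert messages (empty when all is well).
function evaluateHealth(health, nowMs = Date.now(), maxHeartbeatAgeMs = 30000) {
  const alerts = [];
  const db = health.database || {};

  if (health.status !== "healthy") {
    alerts.push(`status is ${health.status}`);
  }
  if (db.currentDB !== "primary") {
    alerts.push(`serving from ${db.currentDB} database`);
  }
  if (db.isReadOnly) {
    alerts.push("database is in read-only mode");
  }
  // lastHeartbeat is assumed to be a Unix timestamp in milliseconds.
  if (typeof db.lastHeartbeat !== "number" || nowMs - db.lastHeartbeat > maxHeartbeatAgeMs) {
    alerts.push("heartbeat is stale");
  }
  return alerts;
}
```

A monitoring job could fetch the endpoint on a schedule, pass the parsed JSON to `evaluateHealth`, and forward any returned messages to whatever alerting channel is in use.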
- Automatic Detection: System detects failure within 30 seconds
- Automatic Failover: Switches to secondary database
- Read-Only Mode: Enters read-only to protect data consistency
- User Impact: Investors can still view claims (read operations)
- Automatic Detection: Heartbeat detects recovery
- Automatic Switchback: Returns to primary database
- Read-Write Mode: Restores full functionality
- Data Sync: Ensure data replication is current
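The failure and recovery steps above amount to a small state machine. The sketch below shows one way to model it, assuming the 30-second failover timeout from the configuration; the `FailoverState` class and its method names are hypothetical and not taken from the project's code.

```javascript
// Hypothetical failover state tracker; names are illustrative only.
class FailoverState {
  constructor(failoverTimeoutMs = 30000, startedAtMs = Date.now()) {
    this.failoverTimeoutMs = failoverTimeoutMs;
    this.currentDB = "primary";
    this.isReadOnly = false;
    this.lastHeartbeat = startedAtMs; // time of the last successful primary heartbeat
  }

  // Feed in each primary heartbeat result.
  // Returns "failover", "recovered", or null when nothing changed.
  recordHeartbeat(ok, nowMs = Date.now()) {
    if (ok) {
      this.lastHeartbeat = nowMs;
      if (this.currentDB === "secondary") {
        this.currentDB = "primary";
        this.isReadOnly = false;
        return "recovered";
      }
      return null;
    }
    const silentForMs = nowMs - this.lastHeartbeat;
    if (this.currentDB === "primary" && silentForMs >= this.failoverTimeoutMs) {
      this.currentDB = "secondary";
      this.isReadOnly = true;
      return "failover";
    }
    return null;
  }
}
```

This sketch switches back on the first successful heartbeat. A production system may prefer to require several consecutive successes, and to confirm that replication is current, before restoring read-write mode.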
If automatic failover fails:
- Check environment configuration
- Verify secondary database connectivity
- Review logs for error details
- Restart the service manually if necessary
- Primary: 20 max connections
- Secondary: 20 max connections
- 30-second idle timeout
- 2-second connection timeout
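The pool limits above could be combined with the environment variables into one options object per database. The sketch below uses the option names accepted by node-postgres' `Pool` (`max`, `idleTimeoutMillis`, `connectionTimeoutMillis`); whether the project uses that driver is an assumption, `buildPoolConfig` is a hypothetical helper, and a MySQL driver would need its own option names.

```javascript
// Hypothetical helper: builds pool options for one database from environment
// variables with the given prefix ("DB_PRIMARY" or "DB_SECONDARY").
function buildPoolConfig(prefix, env = process.env) {
  const port = env[`${prefix}_PORT`];
  return {
    host: env[`${prefix}_HOST`],
    port: port ? Number(port) : undefined,
    database: env[`${prefix}_NAME`],
    user: env[`${prefix}_USER`],
    password: env[`${prefix}_PASSWORD`],
    ssl: env[`${prefix}_SSL`] === "true",
    max: 20,                       // max connections per pool
    idleTimeoutMillis: 30000,      // 30-second idle timeout
    connectionTimeoutMillis: 2000, // 2-second connection timeout
  };
}
```

With node-postgres, the result could be passed as `new Pool(buildPoolConfig("DB_PRIMARY"))`.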
- Heartbeat: < 100ms
- Query operations: < 500ms
- Failover detection: within 30 seconds
- Recovery: < 5 seconds
- SSL/TLS encryption for all connections
- Environment variable credential management
- Network security group configurations
- Regular password rotation
- Read-only mode during failover
- Input validation and sanitization
- Error handling without information leakage
- Audit logging for all operations
- Check environment variables
- Verify secondary database connectivity
- Review heartbeat interval settings
- Validate SSL certificates
- Check network security groups
- Verify database credentials
- Monitor connection pool usage
- Check database query performance
- Review network latency
Enable debug logging:
DEBUG=failover:* npm start
- Target: 99.99% uptime
- Failover Time: < 30 seconds
- Recovery Time: < 5 seconds
- Data Availability: Read operations always available
- Read-Only Failover: Protects data integrity
- Warm Standby: Immediate availability
- Heartbeat Monitoring: Proactive failure detection
- Automatic Recovery: Minimizes downtime
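To put the 99.99% target in perspective, it helps to convert it into an allowed-downtime budget. The calculation below is plain arithmetic; `downtimeBudgetMinutes` is a hypothetical helper name.

```javascript
// Converts an availability target (in percent) into allowed downtime, in minutes,
// over a period of the given length in days.
function downtimeBudgetMinutes(availabilityPercent, periodDays) {
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - availabilityPercent / 100);
}

const perYear = downtimeBudgetMinutes(99.99, 365); // about 52.56 minutes
const perMonth = downtimeBudgetMinutes(99.99, 30); // about 4.32 minutes

console.log(`Yearly budget: ${perYear.toFixed(2)} min`);
console.log(`Monthly budget: ${perMonth.toFixed(2)} min`);
```

If every failover cost the full 30 seconds of unavailability, the yearly budget would cover roughly 105 such events, which gives a sense of how much headroom the detection timeout leaves.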
- Multi-region database replication
- Load balancing for read operations
- Advanced monitoring dashboard
- Automated backup verification
- Database migration tools
- Horizontal scaling support
- Database sharding capability
- Caching layer integration
- Performance optimization
For issues related to the multi-cloud database failover system:
- Check this documentation
- Review system logs
- Run diagnostic tests
- Contact the development team
Implementation Status: ✅ Complete
Issue Resolution: #116, #60
Last Updated: 2024-01-01