A fault-tolerant application layer (Layer 7) load balancer built in Python. PyBalance distributes HTTP traffic across multiple backend servers with intelligent routing, health monitoring, and high-performance async I/O.
- Overview
- Features
- Architecture
- Quick Start
- Configuration
- Routing Algorithms
- Performance
- Testing
- Design Decisions
- Project Structure
- Contributing
- License
PyBalance is a learning project that demonstrates production-grade load balancing concepts. It implements a complete load balancer with:
- Multiple routing algorithms for different use cases
- Automatic health monitoring with fault detection and recovery
- High-performance async I/O using Python's asyncio
- Thread-safe operations for concurrent access
- Optional C++ extension for performance-critical operations
- Docker-based deployment for production-like testing
This project showcases:
- System Design: Understanding distributed systems, load balancing, and fault tolerance
- Concurrency: Async/await patterns, threading, and thread safety
- Performance: Optimizing critical paths with C++ extensions
- DevOps: Docker, containerization, and service orchestration
- Production Practices: Error handling, logging, metrics, and graceful shutdown
7 Routing Algorithms
- Round Robin: Even distribution
- Weighted Round Robin: Capacity-based distribution
- IP Hashing: Session affinity
- Least Connections: Load-aware routing
- Random: Simple random selection
- URL Hashing: Content-based routing
- Consistent Hashing: Minimal redistribution on server changes
Health Monitoring
- Automatic TCP health checks
- Background monitoring thread
- Automatic failover and recovery
- Configurable check intervals and timeouts
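A TCP health check of the kind described above can be very small. The sketch below is illustrative only (the project's `health_monitor.py` may differ in details; `is_alive` is a hypothetical name):

```python
import socket

def is_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the backend succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False
```

A background thread can call this for every backend on a fixed interval and flip each server's alive flag accordingly.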
High Performance
- Async/await for thousands of concurrent connections
- Non-blocking I/O using asyncio
- Optional C++ extension for 2-3x speedup on large transfers
- Efficient connection handling
Fault Tolerance
- Automatic dead server detection
- Graceful error handling (502, 503, 504)
- No single point of failure
- Automatic recovery when servers come back online
Observability
- Metrics endpoint (`/metrics`) with JSON output
- Request/error tracking per backend
- Active connection monitoring
- Uptime and requests-per-second metrics
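The metrics above can be sketched as a small tracker. This is a simplified stand-in, not the project's actual `metrics.py` API; all names are illustrative:

```python
import time
from collections import defaultdict

class Metrics:
    """Tracks per-backend request/error counts plus uptime and RPS."""

    def __init__(self):
        self.start = time.monotonic()
        self.requests = defaultdict(int)  # backend -> request count
        self.errors = defaultdict(int)    # backend -> error count

    def record(self, backend: str, ok: bool = True) -> None:
        self.requests[backend] += 1
        if not ok:
            self.errors[backend] += 1

    def snapshot(self) -> dict:
        uptime = time.monotonic() - self.start
        total = sum(self.requests.values())
        return {
            "uptime_seconds": round(uptime, 2),
            "requests_per_second": round(total / uptime, 2) if uptime else 0.0,
            "backends": {
                b: {"requests": n, "errors": self.errors[b]}
                for b, n in self.requests.items()
            },
        }
```

Serving `snapshot()` as JSON from a `/metrics` route gives the endpoint described above.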
Production Ready
- Docker Compose setup
- Graceful shutdown handling
- Comprehensive logging
- Thread-safe operations
PyBalance follows a modular architecture with three core components (plus a metrics module):
┌─────────────┐
│ Client │
└──────┬──────┘
│ HTTP Request
▼
┌─────────────────────────────────────┐
│ PyBalance Load Balancer │
│ ┌───────────────────────────────┐ │
│ │ Proxy Engine (proxy.py) │ │
│ │ - Handles client connections │ │
│ │ - Async request forwarding │ │
│ │ - Response pipelining │ │
│ └───────────┬────────────────────┘ │
│ │ │
│ ┌───────────▼────────────────────┐ │
│ │ Router (router.py) │ │
│ │ - Server selection │ │
│ │ - Algorithm implementation │ │
│ │ - Thread-safe operations │ │
│ └───────────┬────────────────────┘ │
│ │ │
│ ┌───────────▼────────────────────┐ │
│ │ Health Monitor │ │
│ │ (health_monitor.py) │ │
│ │ - Background health checks │ │
│ │ - Server status updates │ │
│ └────────────────────────────────┘ │
└───────────┬──────────────────────────┘
│
▼
┌───────────────┐
│ Backend │
│ Servers │
│ (Nginx) │
└───────────────┘
1. Proxy Engine (`proxy.py`)
   - Accepts client connections asynchronously
   - Parses HTTP requests
   - Forwards requests to selected backend
   - Streams responses back to clients
   - Handles timeouts and errors
2. Router (`router.py`)
   - Maintains list of backend servers
   - Implements routing algorithms
   - Thread-safe server selection
   - Tracks server state (alive/dead)
3. Health Monitor (`health_monitor.py`)
   - Runs in background thread
   - Periodically checks server health via TCP connect
   - Updates server status in router
   - Detects failures and recoveries
4. Metrics (`metrics.py`)
   - Tracks request counts per backend
   - Monitors error rates
   - Calculates requests per second
   - Provides JSON metrics endpoint
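The proxy engine's forwarding loop can be sketched with `asyncio` streams. This is a simplified illustration, not the actual `proxy.py` code; `pipe`, `handle_client`, and the default backend address are placeholders:

```python
import asyncio

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            chunk = await reader.read(65536)
            if not chunk:  # EOF
                break
            writer.write(chunk)
            await writer.drain()  # respect backpressure
    finally:
        writer.close()
        await writer.wait_closed()

async def handle_client(client_r, client_w, backend=("localhost", 5001)):
    """Open a connection to the chosen backend and shuttle bytes both ways."""
    backend_r, backend_w = await asyncio.open_connection(*backend)
    await asyncio.gather(pipe(client_r, backend_w), pipe(backend_r, client_w))
```

A real proxy additionally parses the HTTP request line and headers before forwarding, so the router can apply IP- or URL-based algorithms.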
For detailed architecture and design decisions, see ARCHITECTURE.md.
- Python 3.7 or higher
- Docker Desktop (for backend servers)
- Make (optional, for building C++ extension)
- Clone the repository:

```bash
git clone https://github.com/yourusername/LoadBalancer.git
cd LoadBalancer
```

- Verify Python version:

```bash
python3 --version  # Should be 3.7+
```

- Optional: Build C++ Extension (for performance boost):

```bash
# Install build dependencies
pip3 install -r requirements.txt

# Build the extension
make install
# OR
python3 setup.py build_ext --inplace
```

The load balancer works perfectly without the C++ extension - it will automatically detect and use it if available, or fall back to pure Python.
Step 1: Start Backend Servers

Using Docker (recommended):

```bash
./start.sh
```

This starts 3 Nginx containers on ports 5003, 5001, and 5002.

Step 2: Start Load Balancer

```bash
python3 -m src.main
```

You should see:

```
INFO - PyBalance listening on 0.0.0.0:8080
INFO - Routing algorithm: random
INFO - Backend servers: 3
INFO - Metrics endpoint: http://0.0.0.0:8080/metrics
```

Step 3: Test It

```bash
# Make a request
curl http://localhost:8080

# Test round-robin distribution
./test_algorithms.sh

# View metrics
curl http://localhost:8080/metrics | python3 -m json.tool
```

Step 4: Stop Everything

```bash
./stop.sh
# OR
docker-compose down
```

Helper scripts:
- `./test_algorithms.sh` - Test routing algorithm distribution
- `./test_now.sh` - Quick verification test
- `./demo.sh` - Full demonstration with health monitoring
- `./clean_start.sh` - Kill all processes for fresh start
Edit `config.py` to customize the load balancer:

```python
# Backend servers
BACKEND_SERVERS = [
    {"host": "localhost", "port": 5003, "weight": 5},  # 5x capacity
    {"host": "localhost", "port": 5001, "weight": 1},
    {"host": "localhost", "port": 5002, "weight": 1},
]

# Routing algorithm
ROUTING_ALGORITHM = RoutingAlgorithm.WEIGHTED_ROUND_ROBIN

# Health check settings
HEALTH_CHECK_INTERVAL = 5  # Check every 5 seconds
HEALTH_CHECK_TIMEOUT = 2   # 2 second timeout
```

Available algorithms:
- `RoutingAlgorithm.ROUND_ROBIN` - Even distribution
- `RoutingAlgorithm.WEIGHTED_ROUND_ROBIN` - Based on server weights
- `RoutingAlgorithm.IP_HASH` - Same client → same server
- `RoutingAlgorithm.LEAST_CONNECTIONS` - Route to least loaded
- `RoutingAlgorithm.RANDOM` - Random selection
- `RoutingAlgorithm.URL_HASH` - Same URL → same server
- `RoutingAlgorithm.CONSISTENT_HASH` - Better hash distribution
See docs/ROUTING_ALGORITHMS.md for detailed explanations.
Distributes requests evenly across all healthy servers:
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Use Case: Simple, even distribution when all servers have equal capacity.
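The cycle above can be sketched with `itertools.cycle` (the project's router keeps an index under a lock instead; server names here are placeholders):

```python
from itertools import cycle

servers = ["A", "B", "C"]
rr = cycle(servers)  # endless A, B, C, A, B, C, ...

picks = [next(rr) for _ in range(4)]
# → ["A", "B", "C", "A"]
```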
Distributes based on server weights:
Server A: weight=5 (powerful machine)
Server B: weight=1 (weaker machine)
# Result: 5 requests to A for every 1 request to B

Use Case: Servers with different capacities (CPU, memory, etc.).
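One simple way to realize these weights is an expanded pool, sketched below; this is illustrative only - smoother schedulers interleave picks rather than repeating each server back-to-back:

```python
# Each server appears weight-many times in the pool, so one round-robin
# pass over the pool yields 5 picks of A for every pick of B.
servers = [("A", 5), ("B", 1)]
pool = [name for name, weight in servers for _ in range(weight)]
# → ["A", "A", "A", "A", "A", "B"]
```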
Same client IP always hits the same server:
Client 192.168.1.50 → Always Server B
Client 192.168.1.51 → Always Server C
Use Case: Session affinity, caching optimization.
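Affinity like this follows from hashing the client address with a stable hash. A sketch (the `pick_by_ip` name is hypothetical; `hashlib` is used because Python's built-in `hash()` is randomized per process):

```python
import hashlib

def pick_by_ip(client_ip: str, servers: list) -> str:
    """Map a client IP to a server deterministically via a stable hash."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

servers = ["A", "B", "C"]
# The same IP always maps to the same server:
assert pick_by_ip("192.168.1.50", servers) == pick_by_ip("192.168.1.50", servers)
```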
Routes to server with fewest active connections:
Server A: 10 active connections
Server B: 5 active connections
Server C: 8 active connections
→ Request goes to Server B
Use Case: Long-lived connections, WebSockets, real-time applications.
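Given per-server active-connection counts, the selection itself is a one-liner (counts below are the example values above):

```python
# Active connection counts per server
active = {"A": 10, "B": 5, "C": 8}

# Pick the server with the fewest active connections
target = min(active, key=active.get)
# → "B"
```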
Randomly selects a server from the pool.
Use Case: Simple scenarios, testing, when distribution doesn't matter.
Same URL path always hits the same server:
GET /api/users → Always Server A
GET /api/products → Always Server B
Use Case: Content caching, CDN-like behavior.
Better hash distribution than simple hashing, minimal redistribution when servers are added/removed.
Use Case: Distributed systems, dynamic server pools, minimizing cache misses.
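A minimal hash ring with virtual nodes illustrates the idea; this is a simplified sketch (real implementations tune replica counts and hash functions), and the `HashRing` name is illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: each server owns many points ("virtual nodes") on a ring."""

    def __init__(self, servers, replicas=100):
        self.ring = sorted(
            (self._hash(f"{s}:{i}"), s)
            for s in servers for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def pick(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]
```

Adding a server only claims the keys that now land on its virtual nodes; every other key keeps its old owner, which is exactly the "minimal redistribution" property.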
- Concurrent Connections: 10,000+ on modern hardware
- Request Latency: < 1ms overhead per request
- Throughput: 5,000+ requests/second (depends on backend)
- Memory: Efficient async I/O, no thread overhead per connection
- Async I/O: Uses `asyncio` for non-blocking operations
- C++ Extension: Optional 2-3x speedup for large transfers
- Zero-Copy Operations: Where possible in C++ extension
- Efficient Parsing: Fast HTTP header parsing
- Connection Pooling: Reuses connections efficiently
The optional C++ extension (`proxy_cpp.cpp`) provides:
- Optimized `memcpy` for large buffer transfers
- Fast HTTP header parsing
- Memory-efficient buffer concatenation
- Zero-copy buffer views
Note: The extension is optional. PyBalance automatically detects and uses it if available, or gracefully falls back to pure Python.
```bash
# Test round-robin distribution
./test_algorithms.sh

# Quick verification
./test_now.sh

# Full demonstration
./demo.sh
```

Load testing:

```bash
# Using Apache Bench
ab -n 1000 -c 10 http://localhost:8080/

# Using included load test script
python3 tests/load_test.py --requests 1000 --concurrency 10
```

Fault tolerance test:

1. Start load balancer: `python3 -m src.main`
2. Stop one backend: `docker-compose stop backend1`
3. Watch logs - server marked as dead within 5 seconds
4. Make requests - only backend2 and backend3 receive traffic
5. Restart backend: `docker-compose start backend1`
6. Watch logs - server marked as alive again
7. Requests resume to all backends
This demonstrates automatic fault tolerance!
See docs/LOAD_TESTING.md for comprehensive testing scenarios including:
- Concurrent requests
- Sustained load
- Burst traffic
- Fault tolerance
- Recovery scenarios
Decision: Use asyncio for the proxy engine instead of threading.
Rationale:
- Scalability: Can handle 10,000+ concurrent connections with minimal overhead
- Efficiency: No thread context switching overhead
- Simplicity: Single-threaded event loop is easier to reason about
- Performance: Non-blocking I/O is faster for I/O-bound operations
Trade-off: Some operations (like health monitoring) still use threading because they need to run independently in the background.
Decision: Run health monitoring in a separate thread.
Rationale:
- Independence: Health checks need to run continuously, regardless of request load
- Blocking Operations: TCP connect is a blocking operation that would block the event loop
- Simplicity: Threading is straightforward for periodic background tasks
Trade-off: Requires thread-safe operations (locks) when updating shared state.
Decision: Make C++ extension optional, not required.
Rationale:
- Accessibility: Project should work out-of-the-box with just Python
- Performance: C++ provides 2-3x speedup for large transfers
- Flexibility: Users can choose based on their needs
- Learning: Demonstrates both Python and C++ integration
Trade-off: Slightly more complex build process, but graceful fallback ensures it always works.
Decision: Use threading.Lock for all router operations.
Rationale:
- Safety: Prevents race conditions when health monitor and proxy engine access shared state
- Correctness: Ensures server selection is atomic
- Simplicity: Python's threading.Lock is straightforward and well-understood
Trade-off: Small performance overhead, but necessary for correctness.
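The pattern looks like this - a sketch mirroring the decision, not the actual `router.py` code (`SafeRouter` is a hypothetical name):

```python
import threading

class SafeRouter:
    """Round-robin selection whose shared index is guarded by a lock."""

    def __init__(self, servers):
        self.servers = servers
        self.index = 0
        self.lock = threading.Lock()

    def next_server(self):
        # The read-modify-write on self.index must be atomic: both the
        # proxy engine and the health monitor touch router state.
        with self.lock:
            server = self.servers[self.index % len(self.servers)]
            self.index += 1
            return server
```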
Decision: Use simple TCP connect for health checks instead of HTTP.
Rationale:
- Simplicity: TCP connect is fast and reliable
- Low Overhead: No HTTP parsing required
- Effectiveness: If TCP connect succeeds, server is likely healthy
- Speed: Faster than full HTTP health check
Trade-off: Doesn't verify application-level health, but sufficient for basic load balancing.
For more design decisions, see docs/ARCHITECTURE.md.
LoadBalancer/
├── src/ # Core application source code
│ ├── __init__.py # Package initialization
│ ├── main.py # Entry point, orchestrates components
│ ├── router.py # Request routing logic
│ ├── proxy.py # Async proxy engine
│ ├── health_monitor.py # Background health monitoring
│ ├── metrics.py # Metrics collection
│ ├── config.py # Configuration settings
│ └── proxy_cpp.cpp # Optional C++ extension
│
├── tests/ # Test files
│ ├── load_test.py # Load testing script
│ └── test_load_balancer.py # Unit tests
│
├── scripts/ # Utility scripts
│ ├── start.sh # Start Docker backends
│ ├── stop.sh # Stop Docker backends
│ ├── test_algorithms.sh # Test routing algorithms
│ ├── test_now.sh # Quick test
│ ├── demo.sh # Full demonstration
│ └── clean_start.sh # Clean restart
│
├── docs/ # Documentation
│ ├── ARCHITECTURE.md # Design decisions and rationale
│ ├── CODE_STRUCTURE.md # Code navigation guide
│ ├── CONTRIBUTING.md # Contribution guidelines
│ ├── ROUTING_ALGORITHMS.md # Algorithm explanations
│ ├── LOAD_TESTING.md # Testing guide
│ └── BUILD_CPP.md # C++ extension build guide
│
├── test_backends/ # Backend test files
│ ├── backend1/ # HTML files for backend 1
│ ├── backend2/ # HTML files for backend 2
│ └── backend3/ # HTML files for backend 3
│
├── README.md # Main documentation
├── LICENSE # MIT License
├── requirements.txt # Python dependencies
├── docker-compose.yml # Docker backend setup
├── setup.py # Build script for C++ extension
└── Makefile # Build commands
- Core Modules (`src/`): All application code organized as a Python package
- Tests (`tests/`): Test files for validation
- Scripts (`scripts/`): Shell scripts for common operations
- Documentation (`docs/`): Comprehensive documentation
- Configuration: Root-level files for setup and deployment
Contributions are welcome! This is a learning project, so feel free to:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
- Additional routing algorithms
- SSL/TLS termination
- Prometheus metrics integration
- Kubernetes deployment manifests
- Performance optimizations
- Documentation improvements
MIT License - see LICENSE file for details.
Built as a learning project to understand:
- Load balancing and distributed systems
- Async programming in Python
- System design and architecture
- Performance optimization
- Production deployment practices
Note: This is a learning project. For production use, consider established solutions like Nginx, HAProxy, or cloud load balancers. However, this project demonstrates the core concepts and can be extended for specific use cases.