Client-side gRPC load balancer that routes requests using EWMA latency scores, per-endpoint circuit breakers, jittered exponential backoff, and optional request hedging to minimize tail latency and maximize availability.
An intelligent, production-grade load balancer inspired by the go-zero microservices framework and modern distributed systems research. This project goes beyond simple round-robin or random selection to provide adaptive, performance-aware request distribution with advanced resilience patterns.
Instead of checking all nodes (expensive) or picking one at random (potentially bad), P2C (Power of Two Choices):
- Randomly selects 2 nodes
- Compares their current load
- Routes the request to the less loaded node
Why it works: Research shows that choosing the better of 2 random options provides exponentially better load distribution than pure random selection, at minimal cost.
Complexity: O(1) vs O(n) for checking all nodes, while achieving ~90% of the optimal distribution.
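A minimal, self-contained sketch of the selection step (the `node` struct and field names here are illustrative only; the real implementation lives in `pkg/balancer.go`):

```go
package main

import (
	"fmt"
	"math/rand"
)

// node is a simplified stand-in for the real Node type in pkg/load.go.
type node struct {
	addr string
	load float64
}

// pickP2C draws two distinct nodes at random and returns the less loaded one.
func pickP2C(nodes []*node) *node {
	if len(nodes) == 0 {
		return nil
	}
	if len(nodes) == 1 {
		return nodes[0]
	}
	i := rand.Intn(len(nodes))
	j := rand.Intn(len(nodes) - 1)
	if j >= i { // shift so the second pick never equals the first
		j++
	}
	if nodes[i].load < nodes[j].load {
		return nodes[i]
	}
	return nodes[j]
}

func main() {
	nodes := []*node{
		{"node1:8080", 1.0},
		{"node2:8080", 4.2},
		{"node3:8080", 1.8},
	}
	fmt.Println("picked:", pickP2C(nodes).addr)
}
```

In the real balancer the compared value is not a fixed field but the live score produced by the EWMA and inflight tracking described below.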
EWMA (Exponentially Weighted Moving Average) calculates a weighted average that gives more importance to recent data:
new_lag = old_lag × β + responseTime × (1 - β)
Where:
- β (beta) is a decay factor between 0 and 1
- Higher β = more weight on history
- Lower β = more weight on recent measurements
Why EWMA? It smooths out noise while remaining sensitive to real performance changes. A single slow request won't drastically change the average, but a sustained slowdown will be quickly detected.
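As a hedged sketch (the function name is illustrative, not necessarily what `pkg/load.go` uses), the update itself is a one-liner:

```go
package ewma

// UpdateLag folds a new response time into the running average:
// newLag = oldLag*β + responseTime*(1-β).
// With β = 0.9, a single 500 ms spike on a 100 ms average only moves
// it to 140 ms, while a sustained slowdown keeps pushing it upward.
func UpdateLag(oldLag, responseTime, beta float64) float64 {
	return oldLag*beta + responseTime*(1-beta)
}
```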
The algorithm dynamically adjusts β based on request frequency using Newton's Law of Cooling:
β = e^(-td/τ)
Where:
- td = time since last request (seconds)
- τ (tau) = time constant (600 seconds = 10 minutes)
Examples:
- High traffic (td = 1s): β ≈ 0.998 → Trust history, smooth out noise
- Medium gap (td = 600s): β ≈ 0.368 → Balance old and new data
- Long gap (td = 3000s): β ≈ 0.007 → Old data is stale, trust new measurement
This makes the system self-tuning based on traffic patterns.
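A small sketch of the β computation (function and constant names are illustrative):

```go
package ewma

import (
	"math"
	"time"
)

// tau is the time constant from the formula above (10 minutes).
const tau = 600 * time.Second

// DynamicBeta computes β = e^(-td/τ) from the gap since the last request.
// A 1 s gap gives β ≈ 0.998 (trust history), a 600 s gap gives β ≈ 0.368,
// and a 3000 s gap gives β ≈ 0.007 (trust the new measurement).
func DynamicBeta(sinceLast time.Duration) float64 {
	return math.Exp(-float64(sinceLast) / float64(tau))
}
```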
load = lag × (inflight + 1)
Where:
- lag = EWMA of response time (milliseconds)
- inflight = number of currently active requests
Why this formula?
- Captures both latency (how slow is the node?) and concurrency (how busy is it?)
- The +1 ensures lag always matters, even for idle nodes
- A slow but idle node gets a moderate score
- A fast but busy node gets a moderate score
- A slow AND busy node gets a high (bad) score
- A fast AND idle node gets the best (low) score
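A minimal sketch of the score calculation (names are illustrative):

```go
package ewma

// Load combines the latency estimate with the concurrency level:
// load = lag × (inflight + 1). The +1 keeps lag relevant for idle nodes.
func Load(lagMillis float64, inflight int64) float64 {
	return lagMillis * float64(inflight+1)
}
```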
All shared state uses atomic operations:
- `atomic.LoadInt64` / `atomic.StoreInt64` for counters
- `atomic.LoadUint64` / `atomic.StoreUint64` for lag (stored as float64 bits)
- `sync.RWMutex` for protecting the node list
This allows safe concurrent access from multiple goroutines.
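A sketch of how that state can be laid out (field and type names here are illustrative, not necessarily those in `pkg/load.go` and `pkg/balancer.go`):

```go
package balancer

import (
	"math"
	"sync"
	"sync/atomic"
)

// nodeState shows one way to hold the shared per-node counters safely.
type nodeState struct {
	inflight int64  // active requests, updated with atomic.AddInt64
	last     int64  // unix nanos of the last request, atomic.StoreInt64
	lagBits  uint64 // EWMA lag as float64 bits, atomic.StoreUint64
}

// Lag reads the float64 lag back out of its uint64 bit representation.
func (n *nodeState) Lag() float64 {
	return math.Float64frombits(atomic.LoadUint64(&n.lagBits))
}

func (n *nodeState) setLag(v float64) {
	atomic.StoreUint64(&n.lagBits, math.Float64bits(v))
}

// Balancer guards its node slice with an RWMutex: selection takes the
// read lock, membership changes take the write lock.
type Balancer struct {
	mu    sync.RWMutex
	nodes []*nodeState
}
```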
When a request starts:
- P2C selects the best of 2 random nodes
- `StartRequest()` increments the `inflight` counter
- Updates the `last` timestamp
- Load increases immediately (more inflight requests)
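A hedged sketch of that path, continuing the illustrative `nodeState` above (adds a `time` import):

```go
// StartRequest records that a request is now in flight on this node.
// Both updates are atomic, so concurrent goroutines can call this safely.
func (n *nodeState) StartRequest() {
	atomic.AddInt64(&n.inflight, 1)
	atomic.StoreInt64(&n.last, time.Now().UnixNano())
}
```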
When a request finishes:
- `FinishRequest()` decrements the `inflight` counter
- Calculates the actual response time
- Computes dynamic β based on time since last request
- Updates `lag` using the EWMA formula
- Load adjusts based on new lag and reduced inflight count
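A sketch of that path with the same illustrative fields (the real code in `pkg/load.go` may order these updates differently or use a CAS loop to make the lag read-modify-write fully atomic):

```go
// FinishRequest folds one completed request back into the node's stats.
func (n *nodeState) FinishRequest(start time.Time) {
	atomic.AddInt64(&n.inflight, -1)

	now := time.Now()
	responseMs := float64(now.Sub(start)) / float64(time.Millisecond)

	// Dynamic β from the gap since the previous request (Newton's cooling).
	sinceLast := now.Sub(time.Unix(0, atomic.LoadInt64(&n.last)))
	beta := math.Exp(-float64(sinceLast) / float64(600*time.Second))

	// EWMA update: newLag = oldLag*β + responseTime*(1-β).
	// Note: the real code may guard this read-modify-write with a CAS loop.
	oldLag := math.Float64frombits(atomic.LoadUint64(&n.lagBits))
	newLag := oldLag*beta + responseMs*(1-beta)
	atomic.StoreUint64(&n.lagBits, math.Float64bits(newLag))
}
```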
```
[req 0] picked node: node2:8080
load after start: 2
load after finish: 4.182450324601795
---
[req 1] picked node: node1:8080
load after start: 2
load after finish: 1.6581643718150858
---
[req 2] picked node: node3:8080
load after start: 2
load after finish: 1.75791935244032
---
[req 3] picked node: node1:8080
load after start: 3.3163287436301716
load after finish: 7.825876886123887
---
```
Request 0:
- Picked node2 (all nodes start with load = 1.0)
- After start: load = 2 (lag=1.0, inflight=1, so 1.0 × (1+1) = 2)
- After finish: load = 4.18 (request took ~4.18 seconds, EWMA updated)
Request 1:
- Picked node1 instead of node2 (P2C compared two nodes, node1's load of 1.0 < node2's 4.18)
- Shows the algorithm avoiding the slower node
Request 3:
- Picked node1 again (still had lower load than alternatives)
- After start: load = 3.32 (lag ≈ 1.66 from its previous request × (1 inflight + 1))
- After finish: load = 7.83 (this request was also slow)
The algorithm continuously adapts, routing requests away from slow or busy nodes.
```
PulseRoute/
├── pkg/
│   ├── load.go       # Node struct, EWMA calculation, load tracking
│   └── balancer.go   # Balancer struct, P2C selection algorithm
├── main.go           # Demo/test implementation
├── go.mod            # Go module definition
└── README.md         # This file
```
```bash
# Run the demo
go run main.go

# Build the binary
go build -o pulseroute

# Run the binary
./pulseroute
```

This project is evolving from a learning exercise into a production-grade gRPC load balancer. See ARCHITECTURE.md for detailed design and ROADMAP.md for the complete implementation plan.
Completed:
- ✅ Basic Node structure with atomic operations
- ✅ EWMA calculation with dynamic β (beta)
- ✅ P2C (Power of 2 Choices) selection algorithm
- ✅ Load calculation formula (lag × inflight)
- ✅ Request lifecycle tracking (StartRequest/FinishRequest)
- ✅ Basic simulation demonstrating adaptive behavior
The next step is implementing per-endpoint circuit breakers with state transitions (CLOSED → OPEN → HALF-OPEN). This will make the system resilient to failing endpoints.
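One possible shape for that state machine, purely as a concept sketch (not the planned implementation; thresholds, names, and transition rules are placeholders):

```go
package breaker

import (
	"sync"
	"time"
)

type state int

const (
	closed   state = iota // requests flow normally
	open                  // endpoint is failing; requests are rejected
	halfOpen              // probing: let a trial request through
)

// Breaker is a minimal per-endpoint circuit breaker sketch.
type Breaker struct {
	mu          sync.Mutex
	state       state
	failures    int
	openedAt    time.Time
	maxFailures int           // e.g. 5 consecutive failures trip the breaker
	cooldown    time.Duration // how long to stay OPEN before probing
}

// Allow reports whether a request may be sent to this endpoint.
func (b *Breaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.state == open && time.Since(b.openedAt) >= b.cooldown {
		b.state = halfOpen // cooldown elapsed: allow one probe request
	}
	return b.state != open
}

// Report records a request outcome and drives the state transitions.
func (b *Breaker) Report(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.failures = 0
		b.state = closed // success closes the breaker (also from HALF-OPEN)
		return
	}
	b.failures++
	if b.state == halfOpen || b.failures >= b.maxFailures {
		b.state = open
		b.openedAt = time.Now()
	}
}
```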
See ROADMAP.md for the complete 12-week implementation plan covering:
- Phase 2: Enhanced P2C with health awareness & retry logic
- Phase 3: gRPC integration with interceptors & health checks
- Phase 4: Request hedging & service discovery (Consul/k8s)
- Phase 5: Observability (Prometheus, Jaeger, Grafana)
- Phase 6: Production readiness (testing, docs, benchmarks)
- Phase 7: Advanced features (sidecar agent, control plane UI)
- Self-balancing: No central coordinator needed
- Adaptive: Responds to changing node performance
- Efficient: O(1) selection cost
- Resilient: Automatically avoids slow or overloaded nodes
- Smooth: EWMA prevents overreaction to temporary spikes
- Self-tuning: Dynamic β adjusts to traffic patterns
- go-zero Adaptive Load Balancing - Original implementation
- The Power of Two Choices in Randomized Load Balancing - Research paper
- EWMA on Wikipedia
Built as a learning project to understand adaptive load balancing algorithms and their real-world applications in distributed systems.