This repository was archived by the owner on Mar 14, 2026. It is now read-only.
[Feature Request] Implement Multi-Provider LLM Microservices Architecture with Docker Orchestration #36
Multi-Provider LLM Microservices Architecture
Labels: enhancement, architecture, docker, microservices, litellm
📋 Summary
Currently, the AIMO-Models project uses a single LiteLLM proxy service that aggregates multiple LLM providers through OpenRouter. We propose implementing a microservices architecture where each LLM provider runs in its own isolated Docker container, managed through a unified API gateway.
🎯 Motivation
Current Limitations
- Single Point of Failure: All providers depend on one LiteLLM instance
- Resource Contention: All models share the same container resources
- Difficult Scaling: Cannot independently scale specific providers
- Maintenance Complexity: Updates affect all providers simultaneously
- Limited Isolation: Provider failures can impact other services
Proposed Benefits
- Fault Isolation: Each provider runs independently
- Independent Scaling: Scale providers based on demand
- Easier Maintenance: Update/restart individual services
- Better Resource Management: Allocate resources per provider
- Enhanced Monitoring: Per-provider metrics and logging
- Dynamic Provider Management: Add/remove providers without downtime
🏗️ Proposed Architecture
┌─────────────────────────────────────────────────────────────┐
│ API Gateway / Load Balancer │
│ (Main AIMO Service) │
└─────────────────┬───────────────┬───────────────┬───────────┘
│ │ │
┌────────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
│ OpenRouter │ │ Phala │ │ ChutesAI │
│ Service │ │ Service │ │ Service │
│ (Port 4001) │ │(Port 4002)│ │ (Port 4003) │
└───────────────┘ └───────────┘ └─────────────┘
│ │ │
┌────────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
│ Nebula Block │ │ Provider │ │ Provider │
│ Service │ │ Service N │ │ Service N+1 │
│ (Port 4004) │ │(Port 400N)│ │(Port 400N+1)│
└───────────────┘ └───────────┘ └─────────────┘
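The port layout in the diagram (gateway on 4000, providers counting up from 4001) can be captured as a small mapping; a sketch in Python, where the provider ordering is our assumption based on the diagram:

```python
# Provider services in the order they claim ports 4001, 4002, ...
PROVIDERS = ["openrouter", "phala", "chutesai", "nebula"]
GATEWAY_PORT = 4000

def provider_port(name: str) -> int:
    """Return the host port for a provider, per the 4001+ convention above."""
    return GATEWAY_PORT + 1 + PROVIDERS.index(name)
```

Keeping this convention in one place avoids the gateway, scripts, and monitoring drifting apart on port numbers.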
📁 Proposed File Structure
infra/
├── litellm/
│ ├── common/
│ │ ├── docker-compose.base.yml # Base configuration
│ │ └── shared-network.yml # Network definitions
│ ├── providers/
│ │ ├── openrouter/
│ │ │ ├── docker-compose.yml
│ │ │ ├── config.yaml
│ │ │ └── .env.openrouter
│ │ ├── phala/
│ │ │ ├── docker-compose.yml
│ │ │ ├── config.yaml
│ │ │ └── .env.phala
│ │ ├── chutesai/
│ │ │ ├── docker-compose.yml
│ │ │ ├── config.yaml
│ │ │ └── .env.chutesai
│ │ └── nebula/
│ │ ├── docker-compose.yml
│ │ ├── config.yaml
│ │ └── .env.nebula
│ ├── gateway/
│ │ ├── docker-compose.yml
│ │ ├── nginx.conf # Load balancer config
│ │ └── .env.gateway
│ ├── orchestration/
│ │ ├── docker-compose.all.yml # Full orchestration
│ │ └── .env.all # Global environment
│ ├── monitoring/
│ │ ├── docker-compose.yml # Prometheus & Grafana
│ │ └── prometheus.yml
│ └── scripts/
│ ├── manage-services.sh # Service management
│ ├── health-check.sh # Health monitoring
│ └── deploy.sh # Deployment automation
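The per-provider `config.yaml` files above would presumably follow the standard LiteLLM proxy config format; a minimal sketch for the OpenRouter service (the model alias and model id are illustrative, not settled names):

```yaml
# providers/openrouter/config.yaml -- illustrative sketch only
model_list:
  - model_name: prod-default                  # alias exposed to clients (assumed)
    litellm_params:
      model: openrouter/openai/gpt-4o         # example provider-prefixed model id
      api_key: os.environ/OPENROUTER_API_KEY  # supplied via .env.openrouter

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```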
🔧 Implementation Details
1. Individual Provider Services
Each provider will have its own Docker service configuration:
Example: OpenRouter Service (providers/openrouter/docker-compose.yml)
```yaml
version: "3.8"

services:
  openrouter-llm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: aimo-openrouter-llm
    ports:
      - "4001:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    env_file:
      - .env.openrouter
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - aimo-llm-network
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "0.5"

networks:
  aimo-llm-network:
    external: true
```

2. API Gateway Configuration
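At its core, the gateway maps a requested model name to a provider backend. The dispatch rule used in the Nginx configuration that follows can be sketched in Python (prefix spellings are taken from that config; the helper name is ours):

```python
# Model-name prefixes mapped to upstream backends; OpenRouter is the default.
ROUTES = {
    "phala-": "phala_backend",
    "chutes-": "chutesai_backend",
    "nebula-": "nebula_backend",
}

def pick_backend(model: str) -> str:
    """Return the upstream for a requested model, defaulting to OpenRouter."""
    for prefix, backend in ROUTES.items():
        if model.startswith(prefix):
            return backend
    return "openrouter_backend"
```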
Nginx Load Balancer (gateway/nginx.conf)
```nginx
upstream openrouter_backend {
    server aimo-openrouter-llm:4000 weight=3 max_fails=2 fail_timeout=30s;
}

upstream phala_backend {
    server aimo-phala-llm:4000 weight=2 max_fails=2 fail_timeout=30s;
}

upstream chutesai_backend {
    server aimo-chutesai-llm:4000 weight=2 max_fails=2 fail_timeout=30s;
}

upstream nebula_backend {
    server aimo-nebula-llm:4000 weight=1 max_fails=2 fail_timeout=30s;
}

# Gateway server (also serves the /health endpoint)
server {
    listen 4000;

    # Provider-specific routing
    location /providers/openrouter/ {
        proxy_pass http://openrouter_backend/;
        include proxy_params;
    }

    location /providers/phala/ {
        proxy_pass http://phala_backend/;
        include proxy_params;
    }

    location /providers/chutesai/ {
        proxy_pass http://chutesai_backend/;
        include proxy_params;
    }

    location /providers/nebula/ {
        proxy_pass http://nebula_backend/;
        include proxy_params;
    }

    # Model-based routing. Caveat: $request_body is usually still empty when
    # "if" conditions are evaluated, so in practice this will likely need a
    # custom header (e.g. a map on $http_x_model) or njs/Lua body inspection.
    location /v1/chat/completions {
        set $backend openrouter_backend;
        if ($request_body ~ "phala-") {
            set $backend phala_backend;
        }
        if ($request_body ~ "chutes-") {
            set $backend chutesai_backend;
        }
        if ($request_body ~ "nebula-") {
            set $backend nebula_backend;
        }
        proxy_pass http://$backend;
        include proxy_params;
    }

    # Health check aggregation
    location /health {
        access_log off;
        return 200 '{"status":"healthy","timestamp":"$time_iso8601"}';
        add_header Content-Type application/json;
    }
}
```

3. Service Management Scripts
Service Management (scripts/manage-services.sh)
```bash
#!/bin/bash

ACTION=$1
SERVICE=$2
PROVIDERS=("openrouter" "phala" "chutesai" "nebula")
BASE_DIR="$(dirname "$0")/.."

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
}

create_network() {
    if ! docker network ls | grep -q aimo-llm-network; then
        log "Creating shared network..."
        docker network create aimo-llm-network
    fi
}

start_service() {
    local service=$1
    if [ "$service" == "all" ]; then
        create_network
        log "Starting all services..."
        cd "$BASE_DIR/orchestration"
        docker-compose -f docker-compose.all.yml up -d
    elif [ "$service" == "gateway" ]; then
        log "Starting gateway service..."
        cd "$BASE_DIR/gateway"
        docker-compose up -d
    elif [[ " ${PROVIDERS[*]} " =~ " $service " ]]; then
        create_network
        log "Starting $service service..."
        cd "$BASE_DIR/providers/$service"
        docker-compose up -d
    else
        log "Unknown service: $service"
        exit 1
    fi
}

stop_service() {
    local service=$1
    if [ "$service" == "all" ]; then
        log "Stopping all services..."
        cd "$BASE_DIR/orchestration"
        docker-compose -f docker-compose.all.yml down
    elif [ "$service" == "gateway" ]; then
        log "Stopping gateway service..."
        cd "$BASE_DIR/gateway"
        docker-compose down
    elif [[ " ${PROVIDERS[*]} " =~ " $service " ]]; then
        log "Stopping $service service..."
        cd "$BASE_DIR/providers/$service"
        docker-compose down
    else
        log "Unknown service: $service"
        exit 1
    fi
}

show_status() {
    log "Service Status:"
    # "name=" filters match substrings, so "aimo-" catches every aimo-*-llm container
    docker ps --filter "name=aimo-" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
    echo ""
    log "Health Status:"
    for i in "${!PROVIDERS[@]}"; do
        provider="${PROVIDERS[$i]}"
        port=$((4001 + i))
        if curl -sf "http://localhost:$port/health" >/dev/null 2>&1; then
            log "✅ $provider (port $port): healthy"
        else
            log "❌ $provider (port $port): unhealthy"
        fi
    done
}

case $ACTION in
    start)
        start_service "${SERVICE:-all}"
        ;;
    stop)
        stop_service "${SERVICE:-all}"
        ;;
    restart)
        if [ -z "$SERVICE" ]; then
            log "Restarting all services..."
            stop_service "all"
            sleep 5
            start_service "all"
        else
            log "Restarting $SERVICE service..."
            stop_service "$SERVICE"
            sleep 2
            start_service "$SERVICE"
        fi
        ;;
    status)
        show_status
        ;;
    logs)
        if [ -z "$SERVICE" ]; then
            cd "$BASE_DIR/orchestration"
            docker-compose -f docker-compose.all.yml logs -f
        elif [ "$SERVICE" == "gateway" ]; then
            cd "$BASE_DIR/gateway"
            docker-compose logs -f
        elif [[ " ${PROVIDERS[*]} " =~ " $SERVICE " ]]; then
            cd "$BASE_DIR/providers/$SERVICE"
            docker-compose logs -f
        else
            log "Unknown service: $SERVICE"
            exit 1
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status|logs} [service_name]"
        echo "Services: all, gateway, ${PROVIDERS[*]}"
        echo ""
        echo "Examples:"
        echo "  $0 start             # Start all services"
        echo "  $0 start openrouter  # Start OpenRouter service only"
        echo "  $0 stop phala        # Stop Phala service only"
        echo "  $0 restart all       # Restart all services"
        echo "  $0 status            # Show service status"
        echo "  $0 logs chutesai     # Show ChutesAI logs"
        exit 1
        ;;
esac
```

4. Monitoring and Health Checks
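When per-provider checks are eventually surfaced through the gateway's `/health` endpoint, the aggregation step is a pure function of the individual results; a sketch in Python (function name and JSON shape are our assumptions):

```python
import json

def aggregate_health(results: dict) -> str:
    """Collapse per-provider health booleans into one JSON status blob."""
    healthy = all(results.values())
    return json.dumps({
        "status": "healthy" if healthy else "degraded",
        "providers": {name: ("up" if ok else "down") for name, ok in results.items()},
    })
```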
Health Check Script (scripts/health-check.sh)
```bash
#!/bin/bash

PROVIDERS=("openrouter" "phala" "chutesai" "nebula")
GATEWAY_PORT=4000
BASE_PORT=4000

check_service_health() {
    local service=$1
    local port=$2
    local url="http://localhost:$port/health"
    if curl -sf "$url" >/dev/null 2>&1; then
        echo "✅ $service (port $port): healthy"
        return 0
    else
        echo "❌ $service (port $port): unhealthy"
        return 1
    fi
}

main() {
    echo "🏥 AIMO LLM Services Health Check"
    echo "================================="
    local unhealthy_count=0

    # Check gateway
    if ! check_service_health "gateway" $GATEWAY_PORT; then
        ((unhealthy_count++))
    fi

    # Check providers (ports 4001 and up)
    for i in "${!PROVIDERS[@]}"; do
        local provider="${PROVIDERS[$i]}"
        local port=$((BASE_PORT + i + 1))
        if ! check_service_health "$provider" $port; then
            ((unhealthy_count++))
        fi
    done

    echo ""
    if [ $unhealthy_count -eq 0 ]; then
        echo "🎉 All services are healthy!"
        exit 0
    else
        echo "⚠️ $unhealthy_count service(s) are unhealthy"
        exit 1
    fi
}

main "$@"
```

5. Integration with Main Application
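Against the gateway, the main application only needs a base URL and an OpenAI-style payload. A sketch of how such a request could be assembled (the helper name and defaults are ours, not existing AIMO code):

```python
import json
import os

def build_chat_request(model: str, prompt: str):
    """Build the gateway URL and JSON body for an OpenAI-style chat call."""
    base = os.environ.get("LLM_BASE_URL", "http://localhost:4000")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return f"{base}/v1/chat/completions", body
```

Because LiteLLM exposes an OpenAI-compatible API, any OpenAI SDK pointed at `LLM_BASE_URL` should work the same way.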
Update the main AIMO application configuration to use the new gateway:
Environment Variables (.env)
```bash
# LiteLLM Gateway Configuration
LLM_BASE_URL=http://localhost:4000   # Points to the gateway
LLM_API_KEY=sk-litellm-master-key
LLM_MODEL_DEFAULT=prod-default       # Routes through OpenRouter by default

# Provider-specific configurations (optional)
OPENROUTER_ENDPOINT=http://localhost:4001
PHALA_ENDPOINT=http://localhost:4002
CHUTESAI_ENDPOINT=http://localhost:4003
NEBULA_ENDPOINT=http://localhost:4004
```

🚀 Implementation Plan
Phase 1: Foundation Setup
- Create base file structure
- Implement shared network configuration
- Create service management scripts
- Set up monitoring infrastructure
Phase 2: Provider Separation
- Extract OpenRouter to separate service
- Add Phala Network integration
- Add ChutesAI integration
- Add Nebula Block integration
Phase 3: Gateway Implementation
- Implement Nginx-based API gateway
- Add intelligent routing logic
- Implement health check aggregation
- Add load balancing strategies
Phase 4: Orchestration
- Create full orchestration scripts
- Implement deployment automation
- Add rolling update capabilities
- Set up monitoring dashboards
Phase 5: Testing & Optimization
- Performance testing
- Load balancing optimization
- Fault tolerance testing
- Documentation updates