This repository was archived by the owner on Mar 14, 2026. It is now read-only.

[Feature Request] Implement Multi-Provider LLM Microservices Architecture with Docker Orchestration #36

@Wes1eyyy


Multi-Provider LLM Microservices Architecture

Issue Labels

  • enhancement
  • architecture
  • docker
  • microservices
  • litellm

📋 Summary

Currently, the AIMO-Models project uses a single LiteLLM proxy service that aggregates multiple LLM providers through OpenRouter. We propose implementing a microservices architecture where each LLM provider runs in its own isolated Docker container, managed through a unified API gateway.

🎯 Motivation

Current Limitations

  • Single Point of Failure: All providers depend on one LiteLLM instance
  • Resource Contention: All models share the same container resources
  • Difficult Scaling: Cannot independently scale specific providers
  • Maintenance Complexity: Updates affect all providers simultaneously
  • Limited Isolation: Provider failures can impact other services

Proposed Benefits

  • Fault Isolation: Each provider runs independently
  • Independent Scaling: Scale providers based on demand
  • Easier Maintenance: Update/restart individual services
  • Better Resource Management: Allocate resources per provider
  • Enhanced Monitoring: Per-provider metrics and logging
  • Dynamic Provider Management: Add/remove providers without downtime

🏗️ Proposed Architecture

┌─────────────────────────────────────────────────────────────┐
│                    API Gateway / Load Balancer              │
│                   (Main AIMO Service)                       │
└─────────────────┬───────────────┬───────────────┬───────────┘
                  │               │               │
         ┌────────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
         │  OpenRouter   │ │   Phala   │ │  ChutesAI   │
         │   Service     │ │  Service  │ │   Service   │
         │ (Port 4001)   │ │(Port 4002)│ │ (Port 4003) │
         └───────────────┘ └───────────┘ └─────────────┘
                  │               │               │
         ┌────────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐
         │  Nebula Block │ │ Provider  │ │  Provider   │
         │   Service     │ │ Service N │ │ Service N+1 │
         │ (Port 4004)   │ │(Port 400N)│ │(Port 400N+1)│
         └───────────────┘ └───────────┘ └─────────────┘

📁 Proposed File Structure

infra/
├── litellm/
│   ├── common/
│   │   ├── docker-compose.base.yml          # Base configuration
│   │   └── shared-network.yml               # Network definitions
│   ├── providers/
│   │   ├── openrouter/
│   │   │   ├── docker-compose.yml
│   │   │   ├── config.yaml
│   │   │   └── .env.openrouter
│   │   ├── phala/
│   │   │   ├── docker-compose.yml
│   │   │   ├── config.yaml
│   │   │   └── .env.phala
│   │   ├── chutesai/
│   │   │   ├── docker-compose.yml
│   │   │   ├── config.yaml
│   │   │   └── .env.chutesai
│   │   └── nebula/
│   │       ├── docker-compose.yml
│   │       ├── config.yaml
│   │       └── .env.nebula
│   ├── gateway/
│   │   ├── docker-compose.yml
│   │   ├── nginx.conf                       # Load balancer config
│   │   └── .env.gateway
│   ├── orchestration/
│   │   ├── docker-compose.all.yml           # Full orchestration
│   │   └── .env.all                         # Global environment
│   ├── monitoring/
│   │   ├── docker-compose.yml               # Prometheus & Grafana
│   │   └── prometheus.yml
│   └── scripts/
│       ├── manage-services.sh               # Service management
│       ├── health-check.sh                  # Health monitoring
│       └── deploy.sh                        # Deployment automation

🔧 Implementation Details

1. Individual Provider Services

Each provider will have its own Docker service configuration:

Example: OpenRouter Service (providers/openrouter/docker-compose.yml)

version: '3.8'
services:
  openrouter-llm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: aimo-openrouter-llm
    ports:
      - "4001:4000"
    volumes:
      - ./config.yaml:/app/config.yaml
    env_file:
      - .env.openrouter
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    networks:
      - aimo-llm-network
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: "0.5"

networks:
  aimo-llm-network:
    external: true
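
A matching `config.yaml` for that service might look like the following (a sketch of a LiteLLM proxy config; the alias `prod-default` matches the default model used later in the env file, while the concrete upstream model is a placeholder):

```yaml
# providers/openrouter/config.yaml (sketch)
model_list:
  - model_name: prod-default              # alias exposed to clients
    litellm_params:
      model: openrouter/<upstream-model>  # placeholder upstream model
      api_key: os.environ/OPENROUTER_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```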

2. API Gateway Configuration

Nginx Load Balancer (gateway/nginx.conf)

upstream openrouter_backend {
    server aimo-openrouter-llm:4000 weight=3 max_fails=2 fail_timeout=30s;
}

upstream phala_backend {
    server aimo-phala-llm:4000 weight=2 max_fails=2 fail_timeout=30s;
}

upstream chutesai_backend {
    server aimo-chutesai-llm:4000 weight=2 max_fails=2 fail_timeout=30s;
}

upstream nebula_backend {
    server aimo-nebula-llm:4000 weight=1 max_fails=2 fail_timeout=30s;
}

# Health check endpoint
server {
    listen 4000;
    
    # Provider-specific routing
    location /providers/openrouter/ {
        proxy_pass http://openrouter_backend/;
        include proxy_params;
    }
    
    location /providers/phala/ {
        proxy_pass http://phala_backend/;
        include proxy_params;
    }
    
    location /providers/chutesai/ {
        proxy_pass http://chutesai_backend/;
        include proxy_params;
    }
    
    location /providers/nebula/ {
        proxy_pass http://nebula_backend/;
        include proxy_params;
    }
    
    # Intelligent routing based on model name.
    # NOTE: stock nginx has not read the request body by the time "if"
    # runs (rewrite phase), so matching on $request_body is unreliable;
    # body-based routing needs njs or OpenResty (Lua). As a plain-nginx
    # stand-in, this routes on an X-LLM-Provider request header.
    location /v1/chat/completions {
        # Default backend: OpenRouter
        set $backend openrouter_backend;

        if ($http_x_llm_provider = "phala") {
            set $backend phala_backend;
        }
        if ($http_x_llm_provider = "chutesai") {
            set $backend chutesai_backend;
        }
        if ($http_x_llm_provider = "nebula") {
            set $backend nebula_backend;
        }

        proxy_pass http://$backend;
        include proxy_params;
    }
    
    # Health check aggregation
    location /health {
        access_log off;
        return 200 '{"status":"healthy","timestamp":"$time_iso8601"}';
        add_header Content-Type application/json;
    }
}
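
The model-prefix convention the gateway relies on can be sketched as a small client-side helper (`route_for_model` is a hypothetical name; the prefixes `phala-`, `chutes-`, and `nebula-` follow the naming used above):

```shell
#!/bin/bash
# Map a model name to the backend the gateway would select.
# Prefixes mirror the nginx config above; route_for_model is a
# hypothetical helper, not part of the proposal itself.
route_for_model() {
    case "$1" in
        phala-*)  echo "phala_backend" ;;
        chutes-*) echo "chutesai_backend" ;;
        nebula-*) echo "nebula_backend" ;;
        *)        echo "openrouter_backend" ;;  # default
    esac
}

route_for_model "phala-deepseek-r1"   # -> phala_backend
route_for_model "gpt-4o"              # -> openrouter_backend
```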

3. Service Management Scripts

Service Management (scripts/manage-services.sh)

#!/bin/bash

ACTION=$1
SERVICE=$2

PROVIDERS=("openrouter" "phala" "chutesai" "nebula")
# Resolve to an absolute path so repeated cd calls (e.g. during restart) work
BASE_DIR="$(cd "$(dirname "$0")/.." && pwd)"

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
}

create_network() {
    if ! docker network ls | grep -q aimo-llm-network; then
        log "Creating shared network..."
        docker network create aimo-llm-network
    fi
}

start_service() {
    local service=$1
    if [ "$service" == "all" ]; then
        create_network
        log "Starting all services..."
        cd "$BASE_DIR/orchestration"
        docker-compose -f docker-compose.all.yml up -d
    elif [ "$service" == "gateway" ]; then
        log "Starting gateway service..."
        cd "$BASE_DIR/gateway"
        docker-compose up -d
    elif [[ " ${PROVIDERS[*]} " =~ " $service " ]]; then
        create_network
        log "Starting $service service..."
        cd "$BASE_DIR/providers/$service"
        docker-compose up -d
    else
        log "Unknown service: $service"
        exit 1
    fi
}

stop_service() {
    local service=$1
    if [ "$service" == "all" ]; then
        log "Stopping all services..."
        cd "$BASE_DIR/orchestration"
        docker-compose -f docker-compose.all.yml down
    elif [ "$service" == "gateway" ]; then
        log "Stopping gateway service..."
        cd "$BASE_DIR/gateway"
        docker-compose down
    elif [[ " ${PROVIDERS[*]} " =~ " $service " ]]; then
        log "Stopping $service service..."
        cd "$BASE_DIR/providers/$service"
        docker-compose down
    else
        log "Unknown service: $service"
        exit 1
    fi
}

show_status() {
    log "Service Status:"
    docker ps --filter "name=aimo-.*-llm" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

    echo ""
    log "Health Status:"
    for i in "${!PROVIDERS[@]}"; do
        local provider="${PROVIDERS[$i]}"
        local port=$((4001 + i))
        if curl -sf "http://localhost:$port/health" >/dev/null 2>&1; then
            log "$provider (port $port): healthy"
        else
            log "$provider (port $port): unhealthy"
        fi
    done
}

case $ACTION in
    "start")
        start_service "${SERVICE:-all}"
        ;;
    "stop")
        stop_service "${SERVICE:-all}"
        ;;
    "restart")
        if [ -z "$SERVICE" ]; then
            log "Restarting all services..."
            stop_service "all"
            sleep 5
            start_service "all"
        else
            log "Restarting $SERVICE service..."
            stop_service "$SERVICE"
            sleep 2
            start_service "$SERVICE"
        fi
        ;;
    "status")
        show_status
        ;;
    "logs")
        if [ -z "$SERVICE" ]; then
            cd "$BASE_DIR/orchestration"
            docker-compose -f docker-compose.all.yml logs -f
        elif [ "$SERVICE" == "gateway" ]; then
            cd "$BASE_DIR/gateway"
            docker-compose logs -f
        elif [[ " ${PROVIDERS[*]} " =~ " $SERVICE " ]]; then
            cd "$BASE_DIR/providers/$SERVICE"
            docker-compose logs -f
        else
            log "Unknown service: $SERVICE"
            exit 1
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status|logs} [service_name]"
        echo "Services: all, gateway, ${PROVIDERS[*]}"
        echo ""
        echo "Examples:"
        echo "  $0 start                    # Start all services"
        echo "  $0 start openrouter         # Start OpenRouter service only"
        echo "  $0 stop phala               # Stop Phala service only"
        echo "  $0 restart all              # Restart all services"
        echo "  $0 status                   # Show service status"
        echo "  $0 logs chutesai            # Show ChutesAI logs"
        exit 1
        ;;
esac

4. Monitoring and Health Checks

Health Check Script (scripts/health-check.sh)

#!/bin/bash

PROVIDERS=("openrouter" "phala" "chutesai" "nebula")
GATEWAY_PORT=4000
BASE_PORT=4000

check_service_health() {
    local service=$1
    local port=$2
    local url="http://localhost:$port/health"
    
    if curl -sf "$url" >/dev/null 2>&1; then
        echo "$service (port $port): healthy"
        return 0
    else
        echo "$service (port $port): unhealthy"
        return 1
    fi
}

main() {
    echo "🏥 AIMO LLM Services Health Check"
    echo "================================="
    
    local unhealthy_count=0
    
    # Check gateway
    if ! check_service_health "gateway" $GATEWAY_PORT; then
        ((unhealthy_count++))
    fi
    
    # Check providers
    for i in "${!PROVIDERS[@]}"; do
        local provider="${PROVIDERS[$i]}"
        local port=$((BASE_PORT + i + 1))
        
        if ! check_service_health "$provider" $port; then
            ((unhealthy_count++))
        fi
    done
    
    echo ""
    if [ $unhealthy_count -eq 0 ]; then
        echo "🎉 All services are healthy!"
        exit 0
    else
        echo "⚠️  $unhealthy_count service(s) are unhealthy"
        exit 1
    fi
}

main "$@"
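
The health check lends itself to periodic scheduling; a crontab entry could look like this (a sketch; the install path and log file are assumptions):

```
# Run the health check every 5 minutes; paths are placeholders
*/5 * * * * /opt/aimo/infra/litellm/scripts/health-check.sh >> /var/log/aimo-health.log 2>&1
```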

5. Integration with Main Application

Update the main AIMO application configuration to use the new gateway:

Environment Variables (.env)

# LiteLLM Gateway Configuration
LLM_BASE_URL=http://localhost:4000  # Points to the gateway
LLM_API_KEY=sk-litellm-master-key
LLM_MODEL_DEFAULT=prod-default      # Routes through OpenRouter by default

# Provider-specific configurations (optional)
OPENROUTER_ENDPOINT=http://localhost:4001
PHALA_ENDPOINT=http://localhost:4002
CHUTESAI_ENDPOINT=http://localhost:4003
NEBULA_ENDPOINT=http://localhost:4004
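
With these variables set, the application talks to the gateway as it would to any OpenAI-compatible endpoint. A sketch of the request (the model alias and key come from the env file above; the commented-out call requires the gateway to actually be running):

```shell
#!/bin/bash
# Build a chat-completions payload for the gateway (sketch).
# "prod-default" is the default model alias from the env file above.
payload='{"model": "prod-default", "messages": [{"role": "user", "content": "ping"}]}'

# The actual call, once the gateway is up (commented out here):
# curl -s "$LLM_BASE_URL/v1/chat/completions" \
#   -H "Authorization: Bearer $LLM_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
echo "$payload"
```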

🚀 Implementation Plan

Phase 1: Foundation Setup

  • Create base file structure
  • Implement shared network configuration
  • Create service management scripts
  • Set up monitoring infrastructure

Phase 2: Provider Separation

  • Extract OpenRouter to separate service
  • Add Phala Network integration
  • Add ChutesAI integration
  • Add Nebula Block integration

Phase 3: Gateway Implementation

  • Implement Nginx-based API gateway
  • Add intelligent routing logic
  • Implement health check aggregation
  • Add load balancing strategies

Phase 4: Orchestration

  • Create full orchestration scripts
  • Implement deployment automation
  • Add rolling update capabilities
  • Set up monitoring dashboards

Phase 5: Testing & Optimization

  • Performance testing
  • Load balancing optimization
  • Fault tolerance testing
  • Documentation updates
