This repository was archived by the owner on Mar 14, 2026. It is now read-only.
Merged
5 changes: 4 additions & 1 deletion .gitignore
Expand Up @@ -327,4 +327,7 @@ data/prompts/*
# VS Code
.vscode/

.env.litellm
# Environment variables files
.env.nebulablock
.env.chutesai
.env.openrouter
5 changes: 0 additions & 5 deletions infra/litellm/.env.litellm.example

This file was deleted.

36 changes: 36 additions & 0 deletions infra/litellm/README.md
@@ -0,0 +1,36 @@
# AIMO Multi-Provider LLM Services Manager

This script provides unified management for all LLM provider services and shared infrastructure (database, monitoring) in the AIMO project.

## Usage

```bash
./manage-all-services.sh {start|stop|restart|status|logs|test} [service]
```

- `start [service]` Start all, shared, or a specific provider service
- `stop [service]` Stop all, shared, or a specific provider service
- `restart [service]` Restart all, shared, or a specific provider service
- `status` Show status and health of all services
- `logs [service]` Show logs for all, shared, or a specific provider
- `test` Test health and endpoints of all services
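The subcommands above suggest a simple case-based dispatcher. A hypothetical sketch of the control flow (service names come from this README; everything else is illustrative — the real script would wrap `docker-compose` invocations rather than echoing):

```shell
#!/usr/bin/env bash
# Illustrative sketch of the dispatcher logic in manage-all-services.sh.
# In the actual script, each echo would be a docker-compose call.

SERVICES="openrouter nebulablock chutesai"

dispatch() {
  local action="$1" target="${2:-all}"
  case "$action" in
    start|stop|restart|logs)
      if [ "$target" = "all" ]; then
        # Fan the action out to every provider service
        for s in $SERVICES; do
          echo "$action $s"
        done
      else
        echo "$action $target"
      fi
      ;;
    status|test)
      # These always operate on everything
      echo "$action all services"
      ;;
    *)
      echo "usage: $0 {start|stop|restart|status|logs|test} [service]" >&2
      return 1
      ;;
  esac
}
```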

## Examples

```bash
./manage-all-services.sh start # Start all services
./manage-all-services.sh start shared # Start only shared infrastructure
./manage-all-services.sh start openrouter # Start only OpenRouter service
./manage-all-services.sh stop all # Stop all services
./manage-all-services.sh status # Show service status
./manage-all-services.sh logs nebulablock # Show Nebula Block logs
./manage-all-services.sh test # Test all services
```

## Notes

- Services managed: openrouter, nebulablock, chutesai
- Shared infrastructure includes database, Redis, Prometheus, Grafana
- Requires Docker and docker-compose to be installed

---
13 changes: 13 additions & 0 deletions infra/litellm/chutesai/.env.chutesai.example
@@ -0,0 +1,13 @@
# LiteLLM Proxy Configuration
LITELLM_MASTER_KEY=sk-chutesai-proxy-key

# Model Provider API Keys (used by LiteLLM)
CHUTESAI_API_KEY=your_chutesai_api_key_here

# Database (Shared PostgreSQL with schema separation)
LITELLM_DATABASE_URL=postgresql://litellm:litellm123@aimo-shared-db:5432/litellm
LITELLM_TABLE_PREFIX=chutesai_

# Service Configuration
SERVICE_NAME=chutesai-llm-proxy
LOG_LEVEL=INFO
250 changes: 250 additions & 0 deletions infra/litellm/chutesai/README.md
@@ -0,0 +1,250 @@
# ChutesAI LiteLLM Service

This directory contains the Docker configuration for running LiteLLM proxy with ChutesAI provider integration.

## Files Structure

```
chutesai/
├── docker-compose.chutesai.yml # Docker Compose configuration
├── chutesai_config.yaml # LiteLLM model configuration
├── .env.chutesai # Environment variables (create from example)
├── .env.chutesai.example # Environment variables template
├── README.md # This file
├── textModelsList.txt # Complete model list with pricing
└── data/ # Persistent data directory
```

## Available Models

### Free Models (0.0 pricing)
- **GLM Models**: glm-4.5-air-free (Free, 131K context)
- **OpenAI OSS**: openai-gpt-oss-20b-free (Free, 131K context)
- **Google Gemma**: gemma-3-4b-it-free (Free, 96K context)
- **LongCat Models**: longcat-flash-chat-fp8-free, longcat-flash-thinking-fp8-free (Free, 131K context)
- **Alibaba**: tongyi-deepresearch-30b-free (Free, 131K context)

### Budget Models ($0.01-$0.07 per 1M tokens)
- **Meta Llama**: llama-3.2-1b-instruct ($0.01/$0.01), llama-3.2-3b-instruct ($0.01/$0.01)
- **Google Gemma**: gemma-2-9b-it ($0.01/$0.02), gemmasutra-pro-27b ($0.01/$0.03)
- **NousResearch**: hermes-4-14b ($0.01/$0.05), deephermes-3-llama-3-8b ($0.01/$0.05)
- **DeepSeek**: deepseek-r1-0528-qwen3-8b ($0.01/$0.05)
- **Mistral**: mistral-nemo-instruct ($0.02/$0.07)
- **Moonshot**: kimi-dev-72b ($0.07/$0.26), kimi-vl-a3b-thinking ($0.02/$0.07)

### Mid-range Models ($0.04-$0.29 per 1M tokens)
- **Google Gemma**: gemma-3-12b-it ($0.04/$0.14)
- **Qwen**: qwen3-30b-a3b-thinking ($0.08/$0.29)
- **GLM**: glm-4.5v ($0.08/$0.33)
- **Tencent**: hunyuan-a13b-instruct ($0.04/$0.14)
- **NVIDIA**: llama-3.3-nemotron-super-49b ($0.07/$0.26)

### Premium Models ($0.14-$3.0 per 1M tokens)
- **ChutesAI Mistral**: mistral-small-3.2-24b ($0.14/$0.57)
- **Qwen Advanced**: qwen3-next-80b-a3b-thinking ($0.1/$0.8), qwen3-vl-235b-a22b-thinking ($0.16/$0.65)
- **DeepSeek Premium**: deepseek-v3.1-turbo ($1.0/$3.0), deepseek-r1-0528 ($0.55/$1.75)
- **ByteDance**: seed-oss-36b-instruct ($0.16/$0.65)

### Ultra Premium Models ($0.25-$1.0+ per 1M tokens)
- **DeepSeek Flagship**: deepseek-r1, deepseek-v3, deepseek-v3.1 ($0.25/$1.0)
- **NousResearch**: hermes-4-405b-fp8 ($0.25/$1.0)

## Setup Instructions

### 1. Configure Environment Variables

Copy and edit the environment file:
```bash
cp .env.chutesai.example .env.chutesai
```

Edit `.env.chutesai` and add your ChutesAI API key:
```bash
# Update this with your actual API key
CHUTESAI_API_KEY=your_actual_chutesai_api_key_here
```

### 2. Start the Service

```bash
# Start ChutesAI LLM service
docker-compose -f docker-compose.chutesai.yml up -d

# Check service status
docker-compose -f docker-compose.chutesai.yml ps

# View logs
docker-compose -f docker-compose.chutesai.yml logs -f
```

### 3. Test the Service

```bash
# Health check
curl http://localhost:4004/health

# List available models
curl http://localhost:4004/v1/models

# Test chat completion with a free model
curl -X POST http://localhost:4004/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-chutesai-proxy-key" \
-d '{
"model": "glm-4_5-air-free",
"messages": [{"role": "user", "content": "Hello! Can you help me with coding?"}],
"max_tokens": 100
}'

# Test with a premium model
curl -X POST http://localhost:4004/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-chutesai-proxy-key" \
-d '{
"model": "deepseek-r1",
"messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}],
"max_tokens": 150
}'
```

### 4. Stop the Service

```bash
# Stop the service
docker-compose -f docker-compose.chutesai.yml down

# Stop and remove volumes (caution: this deletes database data)
docker-compose -f docker-compose.chutesai.yml down -v
```

## Configuration Details

### Model Naming Convention
- Models are prefixed with `chutesai/` to identify the provider
- Free models are explicitly marked with a `-free` suffix
- Pricing information is included for cost tracking and routing decisions

### Network Configuration
- Service runs on port 4004 to avoid conflicts with other LLM services
- Uses shared `aimo-llm-network` for integration with other services
- Shared PostgreSQL database with `chutesai` schema for isolation

### Fallback Strategy
- Free models (glm-4.5-air-free, openai-gpt-oss-20b-free, etc.) are configured as fallbacks
- Routing strategy set to "least-busy" for optimal load distribution
- Request timeout set to 10 minutes for complex reasoning queries
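The fallback and routing settings described above would typically live in `chutesai_config.yaml`. A minimal sketch of what such a LiteLLM configuration might look like — model names are taken from this README, but the upstream model paths and exact keys are assumptions and should be checked against the actual config file:

```yaml
model_list:
  - model_name: deepseek-r1
    litellm_params:
      model: openai/deepseek-ai/DeepSeek-R1   # assumed upstream route
      api_key: os.environ/CHUTESAI_API_KEY
  - model_name: glm-4.5-air-free
    litellm_params:
      model: openai/zai-org/GLM-4.5-Air       # assumed upstream route
      api_key: os.environ/CHUTESAI_API_KEY

router_settings:
  routing_strategy: least-busy    # matches the strategy described above

litellm_settings:
  request_timeout: 600            # 10 minutes for complex reasoning queries
  fallbacks:
    - deepseek-r1: ["glm-4.5-air-free"]
```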

## Integration with Main AIMO Service

To use this service in your main AIMO application, configure:

```bash
# Add to main .env file
LLM_BASE_URL=http://localhost:4004
LLM_API_KEY=sk-chutesai-proxy-key
LLM_MODEL_DEFAULT=glm-4_5-air-free # Use free model as default
```
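With those variables set, any OpenAI-compatible client can talk to the proxy. A small illustrative helper that assembles the chat request JSON from the environment (variable names mirror the `.env` entries above; building the payload does not require the service to be running):

```shell
# Illustrative: build a chat-completions payload from the AIMO env vars.
# LLM_MODEL_DEFAULT mirrors the .env entry above; the prompt is passed in.

build_chat_payload() {
  local model="${LLM_MODEL_DEFAULT:-glm-4_5-air-free}"
  local prompt="$1"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$model" "$prompt"
}

# Usage (requires the service from this README to be running):
#   curl -s "$LLM_BASE_URL/v1/chat/completions" \
#     -H "Authorization: Bearer $LLM_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(build_chat_payload 'Hello')"
```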

## Model Categories and Use Cases

### Free Tier (Perfect for Development)
- **General Chat**: glm-4.5-air-free, openai-gpt-oss-20b-free
- **Code Generation**: gemma-3-4b-it-free
- **Long Context**: longcat-flash-chat-fp8-free (131K tokens)
- **Research**: tongyi-deepresearch-30b-free

### Production Ready (Cost-Effective)
- **Balanced Performance**: hermes-4-14b, deephermes-3-llama-3-8b
- **Reasoning Tasks**: deepseek-r1-0528-qwen3-8b
- **Multimodal**: kimi-vl-a3b-thinking
- **Code Assistant**: deepcoder-14b-preview

### Enterprise Grade (High Performance)
- **Advanced Reasoning**: deepseek-r1, deepseek-v3.1
- **Large Context**: qwen3-vl-235b-a22b-thinking (262K context)
- **Specialized Tasks**: mistral-small-3.2-24b
- **Vision Models**: glm-4.5v

### Ultra Premium (Cutting Edge)
- **Best Reasoning**: deepseek-v3.1-turbo
- **Largest Models**: hermes-4-405b-fp8
- **Advanced Multimodal**: qwen3-vl-235b-a22b-thinking

## Monitoring and Maintenance

### Health Monitoring
- Health check endpoint: `http://localhost:4004/health`
- Database connectivity included in health checks
- Automatic container restart on failure

### Logs and Analytics
- JSON formatted logs for structured analysis
- Database logging for request analytics and cost tracking
- Schema-based data separation from other providers

### Resource Management
- Single worker process optimized for development
- Configurable timeout and rate limiting
- Automatic parameter validation and cleanup
- Memory-efficient model loading

## Troubleshooting

### Common Issues
1. **Port 4004 already in use**: Change the port mapping in `docker-compose.chutesai.yml`
2. **API key invalid**: Verify CHUTESAI_API_KEY in .env.chutesai
3. **Models not loading**: Check chutesai_config.yaml syntax
4. **Database connection issues**: Ensure shared PostgreSQL container is healthy

### Debug Mode
Enable debug logging by setting the following in `.env.chutesai`:
```bash
LOG_LEVEL=DEBUG
```

### Performance Tuning
For production use, consider:
- Increasing `num_workers` in `docker-compose.chutesai.yml`
- Adjusting rate limits in configuration
- Setting up external PostgreSQL database
- Adding Redis for caching
- Using load balancer for high availability

### Cost Management
- Use free models for development and testing
- Set up model fallbacks to prevent overspending
- Monitor usage through database logs
- Consider budget models for production workloads

## Security Considerations

- Change default master key in production
- Use strong database passwords
- Implement network-level access controls
- Regular API key rotation
- Monitor usage for anomalies
- Set up rate limiting per user/API key

## API Compatibility

The ChutesAI service is fully compatible with the OpenAI API format:
- `/v1/chat/completions` - Chat completions
- `/v1/models` - List available models
- `/health` - Service health check
- Standard OpenAI headers and request/response format

## Cost Optimization Tips

1. **Start with Free Models**: Use glm-4.5-air-free, openai-gpt-oss-20b-free for development
2. **Fallback Strategy**: Configure fallbacks from premium to free models
3. **Right-size Models**: Use smaller models for simple tasks
4. **Monitor Usage**: Track costs through database logging
5. **Batch Requests**: Group multiple requests when possible

## Support and Documentation

For issues specific to ChutesAI integration:
1. Check service logs: `docker-compose -f docker-compose.chutesai.yml logs`
2. Verify API connectivity: `curl http://localhost:4004/health`
3. Test model availability: `curl http://localhost:4004/v1/models`
4. Check database schema: Ensure `chutesai` schema exists