This repository was archived by the owner on Mar 14, 2026. It is now read-only.
Merged
5 changes: 4 additions & 1 deletion .gitignore
Expand Up @@ -327,4 +327,7 @@ data/prompts/*
# VS Code
.vscode/

.env.litellm
# Environment variables files
.env.nebulablock
.env.chutesai
.env.openrouter
5 changes: 0 additions & 5 deletions infra/litellm/.env.litellm.example

This file was deleted.

36 changes: 36 additions & 0 deletions infra/litellm/README.md
@@ -0,0 +1,36 @@
# AIMO Multi-Provider LLM Services Manager

This script provides unified management for all LLM provider services and shared infrastructure (database, monitoring) in the AIMO project.

## Usage

```bash
./manage-all-services.sh {start|stop|restart|status|logs|test} [service]
```

- `start [service]` Start all, shared, or a specific provider service
- `stop [service]` Stop all, shared, or a specific provider service
- `restart [service]` Restart all, shared, or a specific provider service
- `status` Show status and health of all services
- `logs [service]` Show logs for all, shared, or a specific provider
- `test` Test health and endpoints of all services
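The subcommands above suggest a simple case-based dispatcher. A hypothetical sketch of the control flow (service names come from this README; everything else is illustrative — the real script would wrap `docker-compose` invocations rather than echoing):

```shell
#!/usr/bin/env bash
# Illustrative sketch of the dispatcher logic in manage-all-services.sh.
# In the actual script, each echo would be a docker-compose call.

SERVICES="openrouter nebulablock chutesai"

dispatch() {
  local action="$1" target="${2:-all}"
  case "$action" in
    start|stop|restart|logs)
      if [ "$target" = "all" ]; then
        # Fan the action out to every provider service
        for s in $SERVICES; do
          echo "$action $s"
        done
      else
        echo "$action $target"
      fi
      ;;
    status|test)
      # These always operate on everything
      echo "$action all services"
      ;;
    *)
      echo "usage: $0 {start|stop|restart|status|logs|test} [service]" >&2
      return 1
      ;;
  esac
}
```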

## Examples

```bash
./manage-all-services.sh start # Start all services
./manage-all-services.sh start shared # Start only shared infrastructure
./manage-all-services.sh start openrouter # Start only OpenRouter service
./manage-all-services.sh stop all # Stop all services
./manage-all-services.sh status # Show service status
./manage-all-services.sh logs nebulablock # Show Nebula Block logs
./manage-all-services.sh test # Test all services
```

## Notes

- Services managed: openrouter, nebulablock, chutesai
- Shared infrastructure includes database, Redis, Prometheus, Grafana
- Requires Docker and docker-compose to be installed

---
13 changes: 13 additions & 0 deletions infra/litellm/chutesai/.env.chutesai.example
@@ -0,0 +1,13 @@
# LiteLLM Proxy Configuration
LITELLM_MASTER_KEY=sk-chutesai-proxy-key

# Model Provider API Keys (used by LiteLLM)
CHUTESAI_API_KEY=your_chutesai_api_key_here

# Database (Shared PostgreSQL with schema separation)
LITELLM_DATABASE_URL=postgresql://litellm:litellm123@aimo-shared-db:5432/litellm
LITELLM_TABLE_PREFIX=chutesai_

# Service Configuration
SERVICE_NAME=chutesai-llm-proxy
LOG_LEVEL=INFO
250 changes: 250 additions & 0 deletions infra/litellm/chutesai/README.md
@@ -0,0 +1,250 @@
# ChutesAI LiteLLM Service

This directory contains the Docker configuration for running LiteLLM proxy with ChutesAI provider integration.

## Files Structure

```
chutesai/
├── docker-compose.chutesai.yml # Docker Compose configuration
├── chutesai_config.yaml # LiteLLM model configuration
├── .env.chutesai # Environment variables (create from example)
├── .env.chutesai.example # Environment variables template
├── README.md # This file
├── textModelsList.txt # Complete model list with pricing
└── data/ # Persistent data directory
```

## Available Models

### Free Models (0.0 pricing)
- **GLM Models**: glm-4.5-air-free (Free, 131K context)
- **OpenAI OSS**: openai-gpt-oss-20b-free (Free, 131K context)
- **Google Gemma**: gemma-3-4b-it-free (Free, 96K context)
- **LongCat Models**: longcat-flash-chat-fp8-free, longcat-flash-thinking-fp8-free (Free, 131K context)
- **Alibaba**: tongyi-deepresearch-30b-free (Free, 131K context)

### Budget Models ($0.01-$0.07 per 1M tokens)
- **Meta Llama**: llama-3.2-1b-instruct ($0.01/$0.01), llama-3.2-3b-instruct ($0.01/$0.01)
- **Google Gemma**: gemma-2-9b-it ($0.01/$0.02), gemmasutra-pro-27b ($0.01/$0.03)
- **NousResearch**: hermes-4-14b ($0.01/$0.05), deephermes-3-llama-3-8b ($0.01/$0.05)
- **DeepSeek**: deepseek-r1-0528-qwen3-8b ($0.01/$0.05)
- **Mistral**: mistral-nemo-instruct ($0.02/$0.07)
- **Moonshot**: kimi-dev-72b ($0.07/$0.26), kimi-vl-a3b-thinking ($0.02/$0.07)

### Mid-range Models ($0.04-$0.29 per 1M tokens)
- **Google Gemma**: gemma-3-12b-it ($0.04/$0.14)
- **Qwen**: qwen3-30b-a3b-thinking ($0.08/$0.29)
- **GLM**: glm-4.5v ($0.08/$0.33)
- **Tencent**: hunyuan-a13b-instruct ($0.04/$0.14)
- **NVIDIA**: llama-3.3-nemotron-super-49b ($0.07/$0.26)

### Premium Models ($0.14-$3.0 per 1M tokens)
- **ChutesAI Mistral**: mistral-small-3.2-24b ($0.14/$0.57)
- **Qwen Advanced**: qwen3-next-80b-a3b-thinking ($0.1/$0.8), qwen3-vl-235b-a22b-thinking ($0.16/$0.65)
- **DeepSeek Premium**: deepseek-v3.1-turbo ($1.0/$3.0), deepseek-r1-0528 ($0.55/$1.75)
- **ByteDance**: seed-oss-36b-instruct ($0.16/$0.65)

### Ultra Premium Models ($0.25-$1.0+ per 1M tokens)
- **DeepSeek Flagship**: deepseek-r1, deepseek-v3, deepseek-v3.1 ($0.25/$1.0)
- **NousResearch**: hermes-4-405b-fp8 ($0.25/$1.0)

## Setup Instructions

### 1. Configure Environment Variables

Copy and edit the environment file:
```bash
cp .env.chutesai.example .env.chutesai
```

Edit `.env.chutesai` and add your ChutesAI API key:
```bash
# Update this with your actual API key
CHUTESAI_API_KEY=your_actual_chutesai_api_key_here
```

### 2. Start the Service

```bash
# Start ChutesAI LLM service
docker-compose -f docker-compose.chutesai.yml up -d

# Check service status
docker-compose -f docker-compose.chutesai.yml ps

# View logs
docker-compose -f docker-compose.chutesai.yml logs -f
```

### 3. Test the Service

```bash
# Health check
curl http://localhost:4004/health

# List available models
curl http://localhost:4004/v1/models

# Test chat completion with a free model
curl -X POST http://localhost:4004/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-chutesai-proxy-key" \
-d '{
"model": "glm-4_5-air-free",
"messages": [{"role": "user", "content": "Hello! Can you help me with coding?"}],
"max_tokens": 100
}'

# Test with a premium model
curl -X POST http://localhost:4004/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-chutesai-proxy-key" \
-d '{
"model": "deepseek-r1",
"messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}],
"max_tokens": 150
}'
```

### 4. Stop the Service

```bash
# Stop the service
docker-compose -f docker-compose.chutesai.yml down

# Stop and remove volumes (caution: this deletes database data)
docker-compose -f docker-compose.chutesai.yml down -v
```

## Configuration Details

### Model Naming Convention
- Models are prefixed with `chutesai/` to identify the provider
- Free models are explicitly marked with a `-free` suffix
- Pricing information is included for cost tracking and routing decisions

### Network Configuration
- Service runs on port 4004 to avoid conflicts with other LLM services
- Uses shared `aimo-llm-network` for integration with other services
- Shared PostgreSQL database with `chutesai` schema for isolation

### Fallback Strategy
- Free models (glm-4.5-air-free, openai-gpt-oss-20b-free, etc.) are configured as fallbacks
- Routing strategy set to "least-busy" for optimal load distribution
- Request timeout set to 10 minutes for complex reasoning queries
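The fallback and routing settings described above would typically live in `chutesai_config.yaml`. A minimal sketch of what such a LiteLLM configuration might look like — model names are taken from this README, but the upstream model paths and exact keys are assumptions and should be checked against the actual config file:

```yaml
model_list:
  - model_name: deepseek-r1
    litellm_params:
      model: openai/deepseek-ai/DeepSeek-R1   # assumed upstream route
      api_key: os.environ/CHUTESAI_API_KEY
  - model_name: glm-4.5-air-free
    litellm_params:
      model: openai/zai-org/GLM-4.5-Air       # assumed upstream route
      api_key: os.environ/CHUTESAI_API_KEY

router_settings:
  routing_strategy: least-busy    # matches the strategy described above

litellm_settings:
  request_timeout: 600            # 10 minutes for complex reasoning queries
  fallbacks:
    - deepseek-r1: ["glm-4.5-air-free"]
```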

## Integration with Main AIMO Service

To use this service in your main AIMO application, configure:

```bash
# Add to main .env file
LLM_BASE_URL=http://localhost:4004
LLM_API_KEY=sk-chutesai-proxy-key
LLM_MODEL_DEFAULT=glm-4_5-air-free # Use free model as default
```
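With those variables set, any OpenAI-compatible client can talk to the proxy. A small illustrative helper that assembles the chat request JSON from the environment (variable names mirror the `.env` entries above; building the payload does not require the service to be running):

```shell
# Illustrative: build a chat-completions payload from the AIMO env vars.
# LLM_MODEL_DEFAULT mirrors the .env entry above; the prompt is passed in.

build_chat_payload() {
  local model="${LLM_MODEL_DEFAULT:-glm-4_5-air-free}"
  local prompt="$1"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$model" "$prompt"
}

# Usage (requires the service from this README to be running):
#   curl -s "$LLM_BASE_URL/v1/chat/completions" \
#     -H "Authorization: Bearer $LLM_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(build_chat_payload 'Hello')"
```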

## Model Categories and Use Cases

### Free Tier (Perfect for Development)
- **General Chat**: glm-4.5-air-free, openai-gpt-oss-20b-free
- **Code Generation**: gemma-3-4b-it-free
- **Long Context**: longcat-flash-chat-fp8-free (131K tokens)
- **Research**: tongyi-deepresearch-30b-free

### Production Ready (Cost-Effective)
- **Balanced Performance**: hermes-4-14b, deephermes-3-llama-3-8b
- **Reasoning Tasks**: deepseek-r1-0528-qwen3-8b
- **Multimodal**: kimi-vl-a3b-thinking
- **Code Assistant**: deepcoder-14b-preview

### Enterprise Grade (High Performance)
- **Advanced Reasoning**: deepseek-r1, deepseek-v3.1
- **Large Context**: qwen3-vl-235b-a22b-thinking (262K context)
- **Specialized Tasks**: mistral-small-3.2-24b
- **Vision Models**: glm-4.5v

### Ultra Premium (Cutting Edge)
- **Best Reasoning**: deepseek-v3.1-turbo
- **Largest Models**: hermes-4-405b-fp8
- **Advanced Multimodal**: qwen3-vl-235b-a22b-thinking

## Monitoring and Maintenance

### Health Monitoring
- Health check endpoint: `http://localhost:4004/health`
- Database connectivity included in health checks
- Automatic container restart on failure

### Logs and Analytics
- JSON formatted logs for structured analysis
- Database logging for request analytics and cost tracking
- Schema-based data separation from other providers

### Resource Management
- Single worker process optimized for development
- Configurable timeout and rate limiting
- Automatic parameter validation and cleanup
- Memory-efficient model loading

## Troubleshooting

### Common Issues
1. **Port 4004 already in use**: Change the port mapping in `docker-compose.chutesai.yml`
2. **API key invalid**: Verify CHUTESAI_API_KEY in .env.chutesai
3. **Models not loading**: Check chutesai_config.yaml syntax
4. **Database connection issues**: Ensure shared PostgreSQL container is healthy

### Debug Mode
Enable debug logging by setting the following in `.env.chutesai`:
```bash
LOG_LEVEL=DEBUG
```

### Performance Tuning
For production use, consider:
- Increasing `num_workers` in `docker-compose.chutesai.yml`
- Adjusting rate limits in configuration
- Setting up external PostgreSQL database
- Adding Redis for caching
- Using load balancer for high availability

### Cost Management
- Use free models for development and testing
- Set up model fallbacks to prevent overspending
- Monitor usage through database logs
- Consider budget models for production workloads

## Security Considerations

- Change default master key in production
- Use strong database passwords
- Implement network-level access controls
- Regular API key rotation
- Monitor usage for anomalies
- Set up rate limiting per user/API key

## API Compatibility

The ChutesAI service is fully compatible with the OpenAI API format:
- `/v1/chat/completions` - Chat completions
- `/v1/models` - List available models
- `/health` - Service health check
- Standard OpenAI headers and request/response format

## Cost Optimization Tips

1. **Start with Free Models**: Use glm-4.5-air-free, openai-gpt-oss-20b-free for development
2. **Fallback Strategy**: Configure fallbacks from premium to free models
3. **Right-size Models**: Use smaller models for simple tasks
4. **Monitor Usage**: Track costs through database logging
5. **Batch Requests**: Group multiple requests when possible

## Support and Documentation

For issues specific to ChutesAI integration:
1. Check service logs: `docker-compose -f docker-compose.chutesai.yml logs`
2. Verify API connectivity: `curl http://localhost:4004/health`
3. Test model availability: `curl http://localhost:4004/v1/models`
4. Check database schema: Ensure `chutesai` schema exists