A proof-of-concept AI music teacher that combines LLM chat interaction with audio analysis and MIDI generation for personalized music instruction.
🚧 Early development - POC implementation - Vibe-coding involved - Not to be used as-is
- Audio Input: User recordings analyzed via signal processing tools
- LLM: Qwen2.5/Qwen3 models with excellent function calling support (or Llama 3.3 70B)
- Output: Text feedback + MIDI-generated musical examples with notation
- Architecture: FastAPI backend + TypeScript frontend
- Backends: Supports both llama-cpp-python (GGUF) and HuggingFace Transformers (safetensors)
Zikos aims to support a wide range of hardware configurations (illustrative settings for each tier follow the list below):
- CPU-only: Works without GPU (very slow, but functional)
- Small GPU (8GB VRAM): RTX 3060Ti, RTX 3070, etc. - Qwen2.5-7B recommended
- Medium GPU (16-24GB VRAM): RTX 3090, RTX 4090, etc. - Qwen2.5-14B or Llama 3.3 70B
- Large GPU (80GB+ VRAM): H100, A100, etc. - Qwen3-32B with 128K context window
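As a rough illustration of how these tiers map onto configuration, the main knobs are which model LLM_MODEL_PATH points at and how many layers are offloaded to the GPU. The values below are examples, not tuned settings; see CONFIGURATION.md for hardware-specific guidance.

```bash
# Illustrative settings per tier (adjust model paths and values to your hardware)

# CPU-only: no GPU offload
LLM_MODEL_PATH=./models/Qwen2.5-7B-Instruct-Q4_K_M.gguf
LLM_N_GPU_LAYERS=0

# Small GPU (8GB VRAM): offload all layers of a 7B Q4 model
LLM_MODEL_PATH=./models/Qwen2.5-7B-Instruct-Q4_K_M.gguf
LLM_N_GPU_LAYERS=-1   # -1 offloads all layers in llama-cpp-python

# Large GPU (80GB+ VRAM): Qwen3-32B (Transformers format) with a long context window
LLM_MODEL_PATH=./models/Qwen3-32B   # hypothetical local path to the downloaded model
LLM_N_CTX=131072
```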
- Python 3.11+
- FFmpeg (for audio preprocessing)
- LLM model file (GGUF or HuggingFace Transformers format) - see Downloading Models below
- GPU recommended (8GB+ VRAM) but CPU-only is supported
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python dependencies
pip install .
# Install JavaScript dependencies (for TypeScript frontend)
npm install
npm run build # Build TypeScript to JavaScript
# Set environment variables
# Copy .env.example to .env and edit with your settings
cp .env.example .env # On Windows: copy .env.example .env
# Edit .env with your settings (especially LLM_MODEL_PATH)

Zikos can be configured via environment variables. Copy .env.example to .env and adjust values for your setup.
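A minimal .env, assuming the variable names used elsewhere in this README (LLM_MODEL_PATH, LLM_N_CTX, LLM_N_GPU_LAYERS), might look like the sketch below; see .env.example for the full, authoritative list of options.

```bash
# Minimal illustrative .env
LLM_MODEL_PATH=./models/Qwen2.5-7B-Instruct-Q4_K_M.gguf   # required: path to your model
LLM_N_CTX=32768                                           # context window size
LLM_N_GPU_LAYERS=0                                        # 0 = CPU-only; raise (or use -1) to offload layers to the GPU
```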
You can download models using the provided helper script. See MODEL_RECOMMENDATIONS.md for detailed recommendations.
# List available models
python scripts/download_model.py --list
# Download a model to ./models
python scripts/download_model.py qwen2.5-7b-instruct-q4 -o ./models
# With Hugging Face token (for private models)
python scripts/download_model.py qwen3-32b-instruct -t YOUR_TOKEN

The script supports both GGUF (llama-cpp-python) and Transformers (HuggingFace) formats. After downloading, the .env file created by the setup script will be configured automatically, or you can set LLM_MODEL_PATH manually:
# For GGUF models
export LLM_MODEL_PATH=./models/Qwen2.5-7B-Instruct-Q4_K_M.gguf

Note: The script requires huggingface_hub for Transformers models. Install with:
# Recommended: install model download helpers
pip install -e ".[model-download]"
# Or install individually
pip install huggingface_hub

# Start the server
python run.py

The API will be available at http://localhost:8000
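Because the backend is FastAPI, a quick way to confirm the server is up is to open the auto-generated interactive docs or hit the port with curl; the root route's exact response depends on the app's routes.

```bash
# In another terminal, once the server is running:
curl -i http://localhost:8000/        # should answer on port 8000 (exact body depends on the routes)
# FastAPI's interactive API docs are served at:
#   http://localhost:8000/docs
```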
Zikos can be run using Docker, which handles all dependencies and setup automatically.
- Docker and Docker Compose installed
- LLM model file downloaded to the ./models/ directory (see Downloading Models)
The easiest way to run Zikos with Docker:
# Set the model filename (optional, defaults to Llama-3.1-8B-Instruct-Q4_K_M.gguf)
export LLM_MODEL_FILE=Qwen2.5-7B-Instruct-Q4_K_M.gguf
# Build and start the container
docker-compose up --build
# Or run in detached mode
docker-compose up -d --build

The API will be available at http://localhost:8000. The container automatically (a sketch of a matching compose file follows this list):
- Builds the frontend TypeScript code
- Mounts your ./models directory (read-only) for model access
- Creates and mounts storage directories for audio, MIDI, and notation files
- Sets up environment variables with sensible defaults
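For orientation, a compose file with the behavior described above might look roughly like the sketch below. It is illustrative only; refer to the repository's docker-compose.yml for the actual configuration.

```yaml
# Illustrative docker-compose.yml sketch (not the repository's actual file)
services:
  zikos:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ./models:/app/models:ro
      - ./audio_storage:/app/audio_storage
      - ./midi_storage:/app/midi_storage
      - ./notation_storage:/app/notation_storage
    environment:
      - LLM_MODEL_PATH=/app/models/${LLM_MODEL_FILE:-Llama-3.1-8B-Instruct-Q4_K_M.gguf}
      - LLM_N_CTX=32768
      - LLM_N_GPU_LAYERS=0
```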
# Build the image
docker build -t zikos .
# Run the container
docker run -d \
--name zikos \
-p 8000:8000 \
-v ./models:/app/models:ro \
-v ./audio_storage:/app/audio_storage \
-v ./midi_storage:/app/midi_storage \
-v ./notation_storage:/app/notation_storage \
-e LLM_MODEL_PATH=/app/models/Qwen2.5-7B-Instruct-Q4_K_M.gguf \
-e LLM_N_CTX=32768 \
-e LLM_N_GPU_LAYERS=0 \
zikos

The Docker setup uses volumes to persist data:
- ./models → /app/models (read-only): Model files
- ./audio_storage → /app/audio_storage: Uploaded audio files
- ./midi_storage → /app/midi_storage: Generated MIDI files
- ./notation_storage → /app/notation_storage: Generated notation files
Environment variables can be customized in docker-compose.yml or passed via -e flags when using docker run. See Environment Variables for available options.
Note: For GPU support, you'll need to configure Docker with GPU access (e.g., --gpus all flag or Docker Compose GPU configuration) and adjust LLM_N_GPU_LAYERS accordingly.
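For example, with the NVIDIA Container Toolkit installed on the host and an image built with GPU-enabled llama-cpp-python, a GPU run might look like this; the layer count is illustrative, and -1 offloads all layers.

```bash
docker run -d \
  --name zikos \
  --gpus all \
  -p 8000:8000 \
  -v ./models:/app/models:ro \
  -e LLM_MODEL_PATH=/app/models/Qwen2.5-7B-Instruct-Q4_K_M.gguf \
  -e LLM_N_GPU_LAYERS=-1 \
  zikos
```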
- LLM: Qwen2.5-7B/14B (recommended), Qwen3-32B (for H100) or similar models, via dual backend support
- llama-cpp-python: For GGUF models (Qwen2.5, Llama 3.3)
- HuggingFace Transformers: For safetensors models (Qwen3)
- Audio Processing: librosa, torchaudio, soundfile
- MIDI: Music21 for processing, FluidSynth for synthesis
- Backend: FastAPI with WebSocket support
- Frontend: TypeScript + Web Audio API
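As a rough sketch of the dual-backend idea (not Zikos's actual loading code), the backend can be picked from the model path: a .gguf file goes through llama-cpp-python, anything else is treated as a HuggingFace Transformers directory.

```python
# Illustrative backend selection by model format; the real Zikos code will differ.
from pathlib import Path

def load_model(model_path: str, n_ctx: int = 32768, n_gpu_layers: int = 0):
    path = Path(model_path)
    if path.suffix == ".gguf":
        # GGUF file -> llama-cpp-python backend
        from llama_cpp import Llama
        return Llama(model_path=str(path), n_ctx=n_ctx, n_gpu_layers=n_gpu_layers)
    # Otherwise assume a Transformers (safetensors) checkpoint directory
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(str(path))
    model = AutoModelForCausalLM.from_pretrained(str(path), device_map="auto")
    return model, tokenizer
```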
The project uses:
- ruff: Fast Python linter
- black: Code formatter
- mypy: Static type checker
- pytest: Testing framework with coverage
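These can be run individually from the project root; exact paths and flags may differ from the repository's configured defaults.

```bash
ruff check .            # lint
black --check .         # verify formatting (drop --check to apply it)
mypy backend            # type-check the backend (path assumed)
pytest --cov            # run the default test selection with coverage
```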
zikos/
├── backend/
│   └── zikos/                    # Python backend code
│       ├── api/                  # FastAPI routes
│       ├── mcp/                  # MCP tools and server
│       ├── services/             # Business logic
│       ├── config.py             # Configuration
│       └── main.py               # FastAPI app
├── frontend/                     # TypeScript/HTML frontend
│   ├── src/                      # TypeScript source files
│   ├── dist/                     # Compiled JavaScript (generated)
│   └── index.html                # Main HTML file
├── tests/                        # Test code
├── scripts/                      # Utility scripts (model download, env setup)
├── CONFIGURATION.md              # Hardware-specific configuration guide (includes H100 optimization)
├── MODEL_RECOMMENDATIONS.md      # Model recommendations
├── DESIGN.md                     # Architecture design and future roadmap
├── TOOLS.md                      # MCP tools specification
└── SYSTEM_PROMPT.md              # LLM system prompt
# Install development dependencies
pip install ".[dev]"
# Optional: generate a pinned requirements.txt for reproducible builds
pip-compile pyproject.toml -o requirements.txt

# Install pre-commit hooks (runs checks before commit)
pre-commit install
# Run hooks manually
pre-commit run --all-files

This project follows Test-Driven Development (TDD) principles with comprehensive test coverage.
- Unit tests: Test individual components in isolation
- Integration tests: Test API endpoints and service interactions
- Coverage target: Minimum 80% code coverage
Note: Comprehensive tests (LLM inference, heavy audio processing) and integration tests are excluded from default pytest runs and pre-commit hooks to keep commit times reasonable. They are marked comprehensive or integration and require model files or significant resources. LLM integration tests exercise real tool calling and are critical for catching bugs that mocked tests miss.
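In pytest terms these are ordinary markers. A test file might tag slow or model-dependent tests like the hypothetical examples below, which the default -m "not comprehensive" selection then skips.

```python
# Hypothetical tests showing how the comprehensive/integration markers are used
import pytest

@pytest.mark.comprehensive
def test_llm_tool_calling_end_to_end():
    """Runs real LLM inference; needs a model file, so excluded from default runs."""
    ...

@pytest.mark.integration
def test_chat_endpoint_roundtrip():
    """Exercises an API endpoint and its services together."""
    ...
```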
Run tests:
pytest -m "not comprehensive"    # Run all but comprehensive tests
pytest -m integration # Run integration tests
pytest -m "" # Run all tests including comprehensive and integrationThe project uses GitHub Actions for CI/CD. The workflow (.github/workflows/ci.yml) runs automatically on pushes and pull requests to main and develop branches.
- Test (Python 3.11, 3.12, 3.13)
  - Runs unit tests with coverage (minimum 75% required)
  - Runs integration tests (excluding comprehensive tests)
  - Uploads coverage to Codecov (Python 3.13 only)
  - Installs system dependencies (libsndfile, ffmpeg, fluidsynth, etc.)
- Lint
  - Runs ruff for linting
  - Runs black --check for code formatting
  - Runs mypy for type checking
- TypeScript Type Check
  - Runs TypeScript type checking
  - Runs ESLint for frontend code quality
- Frontend Tests
  - Runs frontend test suite with coverage
  - Uploads coverage to Codecov
All jobs must pass for a PR to be mergeable. The CI ensures code quality, type safety, and test coverage across multiple Python versions and the frontend.
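For orientation, the overall shape of such a workflow is sketched below. Job names, dependency lists, and action versions are illustrative rather than a copy of .github/workflows/ci.yml, and the TypeScript and frontend jobs follow the same pattern as the two shown.

```yaml
# Illustrative sketch of the CI workflow's shape (not the actual ci.yml)
name: CI
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: sudo apt-get update && sudo apt-get install -y libsndfile1 ffmpeg fluidsynth
      - run: pip install ".[dev]"
      - run: pytest -m "not comprehensive" --cov --cov-fail-under=75

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.13"
      - run: pip install ".[dev]"
      - run: ruff check .
      - run: black --check .
      - run: mypy .
```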