This repository provides a Docker-based setup for running the Qwen3-Coder-30B-A3B-Instruct model with Claude Code integration.
This setup allows you to run a local LLM (Qwen3-Coder-30B-A3B-Instruct) using llama.cpp server and access it through Claude Code via litellm proxy. The environment is containerized and can be run completely offline after initial setup.
- Local Execution: Run code generation models entirely locally without internet access after initial download
- Claude Code API compatible through litellm proxy
- Think, Ultrathink, etc.: `reasoning_effort` is currently dropped, since a non-thinking model is used. To use a Qwen thinking/non-thinking model, a litellm hook needs to be written that adds `/think` or `/no_think` tags to requests — or wait until litellm supports the Qwen API.
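A minimal sketch of what such a hook's core transformation could look like, written as a plain helper function (the function name is ours; wiring it into litellm, e.g. as a proxy pre-call hook, is left out):

```python
def add_thinking_tag(messages, thinking=False):
    """Append Qwen's soft-switch tag (/think or /no_think) to the last user turn.

    Sketch only: in a real setup this logic would run inside a litellm
    pre-call hook so that every proxied request gets the tag.
    """
    tag = "/think" if thinking else "/no_think"
    out = [dict(m) for m in messages]  # copy so the caller's list is untouched
    for m in reversed(out):
        if m.get("role") == "user" and isinstance(m.get("content"), str):
            m["content"] = f'{m["content"]} {tag}'
            break
    return out

msgs = [{"role": "user", "content": "Refactor this function."}]
print(add_thinking_tag(msgs)[0]["content"])  # Refactor this function. /no_think
```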
- Docker Engine (version 20.10 or higher)
- At least 18GB of free disk space for the model
- For GPU acceleration: NVIDIA GPU with CUDA support and nvidia-docker2
- Internet connection for initial model download (subsequent runs can be offline)
Note: The current settings are targeted for a GPU with 24GB VRAM or more. The code was tested on an RTX 3090. If you have less VRAM available, you'll need to adjust the following settings in the Dockerfile:
- `LLAMA_ARG_N_GPU_LAYERS`
- `LLAMA_ARG_CTX_SIZE`
- `LLAMA_ARG_N_PREDICT`
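For example, on a card with less VRAM these could be lowered in the Dockerfile (the numbers below are illustrative, not tuned values):

```dockerfile
# Illustrative values for a smaller GPU -- tune per card and model quant.
ENV LLAMA_ARG_N_GPU_LAYERS=24   # offload fewer layers to the GPU
ENV LLAMA_ARG_CTX_SIZE=16384    # smaller context window
ENV LLAMA_ARG_N_PREDICT=4096    # cap tokens generated per request
```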
Build the image:

```bash
./build.sh
```

For network access:

```bash
./run.sh
```

For offline mode (no internet required):

```bash
./run_offline.sh
```

After running, you can test the setup with:
- Claude API compatibility: `./test_anthropic.sh`
- OpenAI API compatibility: `./test_openai.sh`
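Besides the scripts, an OpenAI-style request can be sent to the proxy directly from Python. The URL (litellm's default port 4000) and the model alias below are assumptions — check your litellm config:

```python
import json
import urllib.request
import urllib.error

def ask(prompt, url="http://localhost:4000/v1/chat/completions", timeout=30.0):
    """POST a chat completion request; return the reply text, or None on failure."""
    payload = {
        "model": "qwen3-coder",  # hypothetical alias; match your litellm config
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = json.load(resp)
        return body["choices"][0]["message"]["content"]
    except (urllib.error.URLError, OSError, KeyError):
        return None

print(ask("Write hello world in Python."))
```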
The Dockerfile sets various environment variables for optimal performance:
- Model parameters (context size, prediction count, GPU layers)
- Sampling parameters (temperature, top-p, top-k, presence penalty)
- API endpoint configuration
- The first run will download the model from Hugging Face (requires internet connection)
- Subsequent runs will use the cached model
- The container will automatically shut down when you exit the terminal
If you encounter issues:
- Ensure Docker is installed and running
- Check that you have sufficient disk space for the model (~18GB)
- Verify network connectivity during initial model download
- Make sure you're running with appropriate permissions
- If you get a connection error 500, verify that llama-server is still running using the `./view_llama_server.sh` script. Sometimes it crashes on tool calling.
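llama.cpp's server also exposes a `/health` endpoint, so besides watching the logs you can probe it directly. A small sketch (port 8080 is llama-server's default and an assumption about this setup):

```python
import json
import urllib.request
import urllib.error

def llama_server_health(base_url="http://localhost:8080", timeout=2.0):
    """Return llama-server's reported health status, or 'unreachable' if down."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.loads(resp.read()).get("status", "unknown")
    except (urllib.error.URLError, OSError, ValueError):
        return "unreachable"

print(llama_server_health())
```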