A blazingly fast CLI tool for AI-assisted coding using local Ollama models on your GPU.
New to ai-coder? 👉 Start with the SETUP.md guide for complete step-by-step instructions.
- 🚀 Fast Local Inference: Run large language models directly on your GPU with Ollama
- 🔐 Privacy-First: All processing happens locally—no data sent to external APIs
- ⚡ Streaming Output: Real-time streaming responses as they're generated
- 🔧 Configurable: Choose your model and Ollama instance with ease
- 💰 Free Forever: No API costs, subscriptions, or limits
- Ollama installed and running locally:

  ```bash
  ollama serve
  ```

- A coding model pulled (e.g., `qwen2.5-coder`, `deepseek-coder-v2`):

  ```bash
  ollama pull qwen2.5-coder
  ```

- Rust 1.70+ (for building from source)
Clone and build:

```bash
git clone https://github.com/lornu-ai/ai-coder.git
cd ai-coder
cargo build --release
```

The binary will be available at `./target/release/ai-coder`.
Basic usage:

```bash
./target/release/ai-coder "Write a fast Fibonacci sequence generator in Rust"
```

Use a different model:

```bash
./target/release/ai-coder --model deepseek-coder-v2 "Implement a binary search algorithm"
```

Point at a remote Ollama instance:

```bash
# Via command-line flag
./target/release/ai-coder -H http://192.168.1.50:11434 "Your prompt here"

# Via environment variable
OLLAMA_HOST="http://192.168.1.50:11434" ./target/release/ai-coder "Your prompt here"
```

Create a config file in your current directory:

```toml
model = "deepseek-coder-v2"
host = "http://localhost:11434"
```

Then run normally:

```bash
./target/release/ai-coder "Refactor this Rust module"
```

Or specify a custom config path:

```bash
./target/release/ai-coder --config ./configs/dev.toml "Your prompt here"
```

For the full list of options:

```bash
./target/release/ai-coder --help
```

- Language: Rust (async/await with Tokio)
- HTTP Client: Reqwest with streaming support
- CLI Framework: Clap for command-line argument parsing
- Ollama Integration: Local REST API calls to localhost:11434
- Takes your prompt as a CLI argument
- Connects to your local Ollama instance (default: http://localhost:11434)
- Sends a streaming request to the model
- Streams the output directly to your terminal in real-time
- Exits when generation is complete
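The streaming step can be sketched without pulling in a full JSON parser: Ollama's `/api/generate` endpoint emits one JSON object per line, each carrying a `response` fragment. The hand-rolled `extract_response` helper below is illustrative only (it would mis-handle escaped quotes inside a fragment); the real tool presumably uses Reqwest plus a proper JSON deserializer.

```rust
use std::io::{BufRead, BufReader, Cursor};

// Naive sketch: pull the "response" fragment out of one NDJSON line.
// Stands in for real JSON parsing; breaks on escaped quotes.
fn extract_response(line: &str) -> Option<&str> {
    let key = "\"response\":\"";
    let start = line.find(key)? + key.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

fn main() {
    // Simulated chunks, as Ollama would stream them from /api/generate.
    let body = "{\"response\":\"fn \",\"done\":false}\n{\"response\":\"main\",\"done\":true}\n";
    let reader = BufReader::new(Cursor::new(body));
    let mut out = String::new();
    for line in reader.lines() {
        let line = line.unwrap();
        if let Some(fragment) = extract_response(&line) {
            out.push_str(fragment); // the real tool prints each fragment as it arrives
        }
    }
    assert_eq!(out, "fn main");
    println!("{out}");
}
```

In the actual tool this loop would consume an HTTP byte stream rather than an in-memory buffer, but the shape — read a line, extract the fragment, print it immediately — is the same.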
Precedence is:

1. Command-line flags
2. Environment variables (`OLLAMA_HOST`)
3. Config file (`.ai-coder.toml` or `--config` path)
4. Built-in defaults
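That precedence chain maps naturally onto `Option` combinators. A minimal sketch, with hypothetical function and parameter names (not ai-coder's actual code):

```rust
use std::env;

// Illustrative settings resolution: CLI flag, then environment,
// then config file, then the built-in default.
fn resolve_host(cli_flag: Option<String>, config_value: Option<String>) -> String {
    cli_flag
        .or_else(|| env::var("OLLAMA_HOST").ok())
        .or(config_value)
        .unwrap_or_else(|| "http://localhost:11434".to_string())
}

fn main() {
    env::remove_var("OLLAMA_HOST");
    // With nothing set, the built-in default wins.
    assert_eq!(resolve_host(None, None), "http://localhost:11434");
    // A CLI flag beats a config-file value.
    assert_eq!(
        resolve_host(Some("http://gpu-box:11434".into()), Some("http://cfg:11434".into())),
        "http://gpu-box:11434"
    );
    println!("precedence ok");
}
```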
- `OLLAMA_HOST`: Default Ollama instance URL (e.g., `http://localhost:11434`)
- `-m, --model <MODEL>`: Model name (default: `qwen2.5-coder`)
- `-H, --host <HOST>`: Ollama host URL (overrides the `OLLAMA_HOST` env var)
- `--config <PATH>`: Optional config file path (default lookup: `./.ai-coder.toml`)
- GPU VRAM: Models typically require 6-14GB VRAM. Check your GPU capacity.
- Model Selection: Start with smaller models (7B) for faster iterations.
- Temperature: For coding, lower temperature values produce more deterministic output.
- Context Length: Larger context windows allow for more complex prompts.
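For reference, when talking to Ollama's `/api/generate` endpoint directly, both of those knobs live in the request's `options` object (`temperature` and `num_ctx`). Whether ai-coder forwards them is not covered above, so treat this as an Ollama-level example:

```json
{
  "model": "qwen2.5-coder",
  "prompt": "Write a binary search in Rust",
  "stream": true,
  "options": {
    "temperature": 0.2,
    "num_ctx": 8192
  }
}
```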
- Agentic loop support (auto-execute generated code)
- Project file context integration
- Bash command execution
- Configuration file support
- Multi-turn conversation mode
- Code formatting and syntax highlighting
Licensed under the MIT License.
Contributions welcome! Please open an issue or PR.
- For Ollama issues: https://github.com/ollama/ollama
- For ai-coder issues: https://github.com/lornu-ai/ai-coder/issues