It's a semantic code search engine that uses embeddings and vector databases to find code by meaning rather than by keywords. Search for a concept like "error handling" and find all error-handling code, regardless of the exact variable names or syntax used.
- grep/ripgrep: Find exact text matches (fast but limited)
- semantic-code-agents-rs: Find by meaning using AI embeddings (powerful but requires computation)
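The meaning-based matching described above boils down to comparing embedding vectors for similarity. A minimal sketch (toy 3-dimensional vectors stand in for real model embeddings, which typically have hundreds of dimensions):

```rust
// Minimal sketch of similarity search over embeddings.
// Real embeddings come from a model; these toy vectors are illustrative.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    let query = [1.0, 0.0, 1.0];     // embedding of "error handling"
    let snippet_a = [0.9, 0.1, 0.8]; // a try/catch block (similar meaning)
    let snippet_b = [0.0, 1.0, 0.1]; // unrelated parsing code

    // The semantically similar snippet scores higher, even though
    // no keyword matching was involved.
    assert!(cosine_similarity(&query, &snippet_a)
        > cosine_similarity(&query, &snippet_b));
}
```

This is why semantic search finds conceptually related code that grep misses: the comparison happens in embedding space, not on the text itself.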
Use semantic search when:
- You don't know the exact code name
- You want conceptually similar code
- You're exploring unfamiliar codebases
Use grep when:
- You know the exact identifier
- You need speed on every single search
- You're working with tiny codebases
Yes! The code follows production standards:
- Comprehensive error handling
- Type-safe design with Rust
- Extensive testing
- Hexagonal architecture for extensibility
- Multiple production-grade integrations (Milvus, OpenAI, etc.)
It depends on your configuration:
- Local (ONNX) + Local index: All processing local, completely private
- Cloud embeddings: Your code is sent to the embedding provider (OpenAI, Gemini, etc.)
For sensitive code, use local ONNX embeddings.
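A fully local setup might look like the following sketch. The exact key names are assumptions modeled on the Milvus example elsewhere in this FAQ; check the Configuration Guide for the real schema.

```toml
# Hypothetical fully-local configuration: nothing leaves the machine.
[embeddings]
provider = "onnx"    # local ONNX model, no API key required

[backend]
vector_db = "local"  # local index instead of a remote Milvus
```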
Typical ratios:
- Source code: 1 MB → 5-10 MB index (with embeddings)
- 10,000 files → 500 MB - 2 GB (depending on file size)
Example:
- A 100 MB codebase → ~500 MB to 1 GB index
Use `semantic-code-agents status` to check the current index size.
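As a back-of-envelope check, the 5-10x rule of thumb above can be written out (a sketch, not an exact formula; real index size also depends on chunking and embedding dimensions):

```rust
/// Rough index-size estimate from the 5-10x rule of thumb above.
fn estimated_index_size_mb(source_mb: f64) -> (f64, f64) {
    (source_mb * 5.0, source_mb * 10.0)
}

fn main() {
    let (low, high) = estimated_index_size_mb(100.0);
    // A 100 MB codebase lands around 500 MB to 1 GB of index.
    println!("estimated index: {:.0} MB - {:.0} MB", low, high);
}
```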
No, not required. The default is ONNX (local embeddings, no key needed).
To use cloud embeddings, you'll need:
- OpenAI: `OPENAI_API_KEY`
- Gemini: `GEMINI_API_KEY`
- Voyage: `VOYAGE_API_KEY`
- Ollama: local (no key)
For best results (in order):
- Voyage - Fast, accurate, designed for code
- OpenAI - General purpose, good for code
- Gemini - Good semantic understanding
- ONNX - Free, local, good enough
For speed:
- ONNX - Instant (local)
- Voyage - ~50ms
- OpenAI - ~100-200ms
- Gemini - ~200-300ms
For cost:
- ONNX - $0
- Ollama - $0
- Voyage - $ (cheapest cloud)
- OpenAI - $$$
Choose ONNX unless you need higher quality.
- Add `.context/config.toml` to your project root
- Add `.contextignore` (like `.gitignore`)
- Run `semantic-code-agents index`
- Run `semantic-code-agents search "your query"`
See Getting Started.
Yes! For production deployments, use Milvus:
```toml
[backend]
vector_db = "milvus"

[backend.milvus]
uri = "http://milvus:19530"
```

See Configuration Guide.
Common causes:
- Query is too vague: be specific
- Code is poorly named: better names → better results
- Wrong embeddings provider: try Voyage or OpenAI
- Small index: more code examples are needed
Solutions:
- Try a more specific query
- Improve code comments and variable names
- Try a different embeddings provider
- Increase the `top_k` parameter
Accuracy depends on:
- Embeddings quality (Voyage best, ONNX okay)
- Code quality (well-named variables, good comments)
- Query specificity (specific beats vague)
- Index size (more examples → better)
Typical: 70-90% relevant results in top-5.
Yes! Use `codebase_id`:

`semantic-code-agents search "query" --codebase my-project`

See Searching Guide.
Recommendations:
- Development: After major changes (new features, refactoring)
- CI/CD: Nightly or weekly full re-index
- Incremental: After each commit (if supported)
For most teams, nightly re-indexing is sufficient.
Indexing:
- ~1,000 files/minute (single-threaded)
- Faster with parallel processing enabled
Searching:
- Embedding: 50ms - 500ms (depends on provider)
- Vector search: 1-50ms (depends on index size)
- Total: ~100ms - 1 second per query
Yes, with proper configuration:
- Use Milvus vector database (not local)
- Use Voyage or OpenAI embeddings
- Exclude non-essential files in `.contextignore`
- Use parallel indexing
See Configuration Guide for production setup.
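The checklist above might translate into a config like this sketch. The `[backend]` keys follow the Milvus example elsewhere in this FAQ; the `[embeddings]` and `[indexing]` key names are assumptions.

```toml
[embeddings]
provider = "voyage"   # or "openai"

[backend]
vector_db = "milvus"

[backend.milvus]
uri = "http://milvus:19530"

[indexing]
parallel = true       # assumed option name; see the Configuration Guide
```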
No hard limit, but practical considerations:
- Local index: 1-10 GB (depends on RAM)
- Milvus: 100+ GB (scales horizontally)
For 1M+ files, use Milvus with cloud.
- Read Contributing Guide
- Fork the repository
- Create a feature branch
- Make changes and test
- Submit a pull request
The system uses hexagonal architecture: core logic talks to embedding providers and vector databases through traits, so new adapters plug in without touching the core.
Add a new embedding provider:
- Implement the `Embedder` trait
- Add a config option
- Add tests
Add a new vector database:
- Implement the `VectorDb` trait
- Add a config option
- Add tests
See Contributing Guide for details.
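A new provider implementation might look like the following sketch. The real `Embedder` trait lives in the crate's ports layer and is likely async and fallible; this simplified signature is an assumption for illustration.

```rust
/// Assumed shape of the crate's `Embedder` port (simplified:
/// the real trait is likely async and returns a Result).
trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
    fn dimensions(&self) -> usize;
}

/// A new provider plugs in by implementing the trait.
struct MyProviderEmbedder {
    dim: usize,
}

impl Embedder for MyProviderEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        // A real adapter would call the provider's API here;
        // this returns a deterministic dummy vector instead.
        let seed = text.len() as f32;
        (0..self.dim).map(|i| seed / (i as f32 + 1.0)).collect()
    }

    fn dimensions(&self) -> usize {
        self.dim
    }
}

fn main() {
    let embedder = MyProviderEmbedder { dim: 4 };
    let vector = embedder.embed("error handling");
    assert_eq!(vector.len(), embedder.dimensions());
    println!("embedded into {} dimensions", vector.len());
}
```

Because the core only depends on the trait, the rest of the system (indexing, search) works with the new provider unchanged once the config option selects it.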
Email security@example.com instead of opening a public issue.
See Security Policy.
Yes! It's dual-licensed MIT/Apache-2.0, so you can use it in commercial products.
See Troubleshooting: Performance Issues.
See Troubleshooting: Configuration Issues.
See Troubleshooting: Out of Memory.
- Check Getting Started - start fresh
- Check Troubleshooting - common issues
- Check Configuration Guide - verify config
- Open an issue on GitHub with:
  - What you tried
  - The error message
  - Your config (without secrets)
Dual-licensed MIT/Apache-2.0. You can choose either.
Yes, both licenses allow commercial use.
Yes, both licenses allow modifications and require attribution; Apache-2.0 additionally includes an explicit patent grant.
No, neither MIT nor Apache-2.0 requires you to open-source anything.
Still have questions? Check the documentation or open an issue on GitHub.