Frequently Asked Questions

General

What is semantic-code-agents-rs?

It's a semantic code search engine that uses embeddings and vector databases to find code based on meaning, not keywords. Search for a phrase like "error handling" to find all error-handling code, regardless of the exact variable names or syntax used.
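To make "find by meaning" concrete, here is a minimal sketch of how embedding-based ranking typically works: each code chunk and the query become vectors, and results are ordered by cosine similarity. The function names and the tiny three-dimensional vectors are illustrative assumptions, not the project's actual internals.

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Rank code chunks by similarity to the query vector, best match first.
/// `chunks` pairs a label (for example a file path) with its embedding.
fn rank_by_similarity<'a>(
    query: &[f32],
    chunks: &'a [(&'a str, Vec<f32>)],
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = chunks
        .iter()
        .map(|(label, vector)| (*label, cosine_similarity(query, vector)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

A real index would use an approximate nearest-neighbor structure rather than scoring every chunk, but the ordering principle is the same.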

How is this different from grep/ripgrep?

grep/ripgrep: Find exact text matches (fast but limited)
semantic-code-agents-rs: Find by meaning using AI embeddings (powerful but requires computation)

Use semantic search when:

You don't know the exact identifier or function name
You want conceptually similar code
You're exploring unfamiliar codebases

Use grep when:

You know the exact identifier
You need speed on every single search
You're working with tiny codebases

Is it production-ready?

Yes! The code follows production standards:

Comprehensive error handling
Type-safe design with Rust
Extensive testing
Hexagonal architecture for extensibility
Multiple production-grade integrations (Milvus, OpenAI, etc.)

Is my code private?

It depends on your configuration:

Local (ONNX) + Local index: All processing local, completely private
Cloud embeddings: Your code is sent to the embedding provider (OpenAI, Gemini, etc.)

For sensitive code, use local ONNX embeddings.

How much storage does the index need?

Typical ratios:

Source code: 1 MB → 5-10 MB index (with embeddings)
10,000 files → 500 MB - 2 GB (depending on file size)

Example:

A 100 MB codebase → ~500 MB to 1 GB index

Use semantic-code-agents status to check current index size.
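As a back-of-the-envelope check, the ratios above can be turned into a small estimator. This is a sketch under stated assumptions: the 5x to 10x multiplier comes from the figures in this answer, while the chunk count and the 768-dimension vector size used in the note below are hypothetical values chosen for illustration.

```rust
/// Rough index size range in MB for a given source size, using the
/// 5x to 10x ratio quoted above (an approximation, not a measurement).
fn estimate_index_mb(source_mb: u64) -> (u64, u64) {
    const MIN_RATIO: u64 = 5;
    const MAX_RATIO: u64 = 10;
    (source_mb * MIN_RATIO, source_mb * MAX_RATIO)
}

/// Raw bytes for the vectors alone: one f32 (4 bytes) per dimension
/// per chunk. Metadata and index structures add to this.
fn raw_vector_bytes(chunks: u64, dimensions: u64) -> u64 {
    chunks * dimensions * 4
}
```

For a 100 MB codebase, `estimate_index_mb(100)` gives the 500 MB to 1,000 MB range from the example. For instance, 10,000 chunks at an assumed 768 dimensions come to about 30.7 MB of raw vectors before any overhead.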

Configuration & Setup

Do I need an API key?

No, not required. The default is ONNX (local embeddings, no key needed).

To use a different provider, set the matching environment variable:

OpenAI: OPENAI_API_KEY
Gemini: GEMINI_API_KEY
Voyage: VOYAGE_API_KEY
Ollama: Local (no key)
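If you want to verify your environment before indexing, a pre-flight check along these lines can help. It is only a sketch: the environment variable names are the ones listed above, but the lowercase provider strings and both function names are assumptions for illustration, not the tool's real API.

```rust
use std::env;

/// Environment variable expected for a provider, per the list above.
/// Ok(None) means the provider runs locally and needs no key.
/// The lowercase provider names are assumed for this sketch.
fn api_key_var(provider: &str) -> Result<Option<&'static str>, String> {
    match provider {
        "openai" => Ok(Some("OPENAI_API_KEY")),
        "gemini" => Ok(Some("GEMINI_API_KEY")),
        "voyage" => Ok(Some("VOYAGE_API_KEY")),
        "onnx" | "ollama" => Ok(None),
        other => Err(format!("unknown provider: {other}")),
    }
}

/// Fail early with a readable message if the required key is missing
/// or blank in the current environment.
fn check_api_key(provider: &str) -> Result<(), String> {
    match api_key_var(provider)? {
        None => Ok(()),
        Some(var) => match env::var(var) {
            Ok(value) if !value.trim().is_empty() => Ok(()),
            _ => Err(format!("{var} is not set")),
        },
    }
}
```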

How do I choose an embeddings provider?

For best results (in order):

Voyage - Fast, accurate, designed for code
OpenAI - General purpose, good for code
Gemini - Good semantic understanding
ONNX - Free, local, good enough

For speed:

ONNX - Fastest (local, no network round trip)
Voyage - ~50ms
OpenAI - ~100-200ms
Gemini - ~200-300ms

For cost:

ONNX - $0
Ollama - $0
Voyage - $$ (cheapest cloud)
OpenAI - $$$

Choose ONNX unless you need higher quality.

How do I use it with my project?

Add .context/config.toml to your project root
Add .contextignore (like .gitignore)
Run semantic-code-agents index
Run semantic-code-agents search "your query"

See Getting Started.

Can I use Milvus instead of local index?

Yes! For production deployments, use Milvus:

[backend]
vector_db = "milvus"

[backend.milvus]
uri = "http://milvus:19530"

See Configuration Guide.

Usage

Why are my search results bad?

Common causes:

Query is too vague - Be specific
Code is poorly named - Better names → better results
Wrong embeddings provider - Try Voyage or OpenAI
Small index - Need more code examples

Solutions:

Try a more specific query
Improve code comments and variable names
Try a different embeddings provider
Increase top_k parameter

How accurate is the search?

Accuracy depends on:

Embeddings quality (Voyage best, ONNX okay)
Code quality (well-named variables, good comments)
Query specificity (specific beats vague)
Index size (more examples → better)

Typical: 70-90% relevant results in top-5.
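The "relevant results in top-5" figure is a precision-at-k measurement. If you want to measure it on your own queries, a minimal sketch looks like this; the function name is illustrative, and the relevance judgments are something you supply by hand.

```rust
/// Fraction of the first `k` results that were judged relevant.
/// `relevant[i]` is true when the i-th ranked result is relevant.
/// If fewer than `k` results exist, only the available ones count.
fn precision_at_k(relevant: &[bool], k: usize) -> f64 {
    let considered = &relevant[..k.min(relevant.len())];
    if considered.is_empty() {
        return 0.0;
    }
    let hits = considered.iter().filter(|&&r| r).count();
    hits as f64 / considered.len() as f64
}
```

Four relevant results among the first five gives 0.8, which sits inside the typical range quoted above.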

Can I search multiple codebases?

Yes! Pass a codebase ID with --codebase:

semantic-code-agents search "query" --codebase my-project

See Searching Guide.

How often should I re-index?

Recommendations:

Development: After major changes (new features, refactoring)
CI/CD: Nightly or weekly full re-index
Incremental: After each commit (if supported)

For most teams, nightly re-indexing is sufficient.
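To illustrate what incremental re-indexing involves, the sketch below detects which files changed since the last run by comparing content hashes. It is a generic technique shown under assumptions: the function names are hypothetical, and this FAQ does not confirm how (or whether) the tool implements incremental updates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash file content. DefaultHasher is fine for a demo, but its output
/// is not guaranteed stable across Rust releases, so a persisted index
/// should use a fixed algorithm instead.
fn content_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Paths that are new or whose content differs from the recorded hash,
/// returned in sorted order. `current` holds (path, content) pairs.
fn changed_files(
    previous: &HashMap<String, u64>,
    current: &[(String, String)],
) -> Vec<String> {
    let mut changed: Vec<String> = current
        .iter()
        .filter(|(path, content)| previous.get(path) != Some(&content_hash(content)))
        .map(|(path, _)| path.clone())
        .collect();
    changed.sort();
    changed
}
```

Only the returned paths would need new embeddings; unchanged files keep their existing vectors.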

Performance & Scaling

How fast is it?

Indexing:

~1,000 files/minute (single-threaded)
Faster with parallel processing enabled

Searching:

Embedding: 50ms - 500ms (depends on provider)
Vector search: 1-50ms (depends on index size)
Total: ~100ms - 1 second per query
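For planning purposes, the single-threaded indexing rate above translates into a simple time estimate. This is a rough sketch that assumes the ~1,000 files/minute figure holds for your codebase; actual throughput depends on file size, provider, and hardware.

```rust
/// Single-threaded indexing rate quoted above (approximate).
const FILES_PER_MINUTE: u64 = 1_000;

/// Minutes needed to index `files` files at that rate, rounded up.
fn indexing_minutes(files: u64) -> u64 {
    (files + FILES_PER_MINUTE - 1) / FILES_PER_MINUTE
}
```

At that rate, 100,000 files would take roughly 100 minutes for a full single-threaded index, which is why parallel indexing matters for large codebases.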

Can it handle large codebases (100k+ files)?

Yes, with proper configuration:

Use Milvus vector database (not local)
Use Voyage or OpenAI embeddings
Exclude non-essential files in .contextignore
Use parallel indexing

See Configuration Guide for production setup.

What's the maximum codebase size?

No hard limit, but practical considerations:

Local index: 1-10 GB (depends on RAM)
Milvus: 100+ GB (scales horizontally)

For 1M+ files, use Milvus with cloud.

Development

How do I contribute?

Read Contributing Guide
Fork the repository
Create a feature branch
Make changes and test
Submit a pull request

How can I extend it (add adapters)?

The system uses hexagonal architecture:

Add a new embedding provider:

Implement Embedder trait
Add config option
Add tests

Add a new vector database:

Implement VectorDb trait
Add config option
Add tests

See Contributing Guide for details.
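To give a feel for the adapter pattern, here is a minimal sketch of a port and one adapter. The trait shown is hypothetical: the project's real Embedder trait may have different method names, be async, or use a richer error type, so check the source before implementing. The toy adapter is not a useful embedding model; it only demonstrates the shape.

```rust
/// Hypothetical port. The real `Embedder` trait in the project may
/// differ (async methods, batching, a dedicated error type).
trait Embedder {
    /// Length of every vector this embedder produces.
    fn dimensions(&self) -> usize;
    /// Turn a piece of text into a fixed-length vector.
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

/// Toy adapter: counts bytes into `dims` buckets. It exists only to
/// show how an adapter plugs into the port.
struct ByteCountEmbedder {
    dims: usize,
}

impl Embedder for ByteCountEmbedder {
    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        if self.dims == 0 {
            return Err("dims must be greater than zero".to_string());
        }
        let mut vector = vec![0.0_f32; self.dims];
        for byte in text.bytes() {
            vector[byte as usize % self.dims] += 1.0;
        }
        Ok(vector)
    }
}

/// Core logic depends only on the trait, so adapters can be swapped
/// without touching it. Stops at the first embedding error.
fn embed_all(embedder: &dyn Embedder, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
    texts.iter().map(|text| embedder.embed(text)).collect()
}
```

Because `embed_all` takes `&dyn Embedder`, a test can pass the toy adapter while production code passes a real provider.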

Where should I report security vulnerabilities?

Email security@example.com instead of opening a public issue.

See Security Policy.

Can I use this commercially?

Yes! It's dual-licensed MIT/Apache-2.0, so you can use it in commercial products.

Troubleshooting

Nothing seems to work

Check Getting Started - start fresh
Check Troubleshooting - common issues
Check Configuration Guide - verify config
Open an issue on GitHub with:
- What you tried
- Error message
- Config (without secrets)

Licensing & Legal

What license is this?

Dual-licensed MIT/Apache-2.0. You can choose either.

Can I use this in a commercial product?

Yes, both licenses allow commercial use.

Can I modify the code?

Yes, both licenses allow modifications. Both require you to keep the copyright and license notices; Apache-2.0 additionally asks you to mark modified files and includes an explicit patent grant.

Do I need to open-source my project?

No, neither MIT nor Apache-2.0 requires you to open-source anything.

Still have questions? Check the documentation or open an issue on GitHub.
