Semantic file search for your local filesystem.
Ask questions in plain English - find what you need across code, documents, images, audio, and video.
Powered by Google Gemini Embedding 2 and ChromaDB.
Traditional file search (grep, find, ag) matches exact text. EmbeddedFinder understands meaning. Search for "error handling in payments" and find files about exception catching in billing code, even if those exact words never appear.
It works on everything: source code, config files, PDFs, Word documents, images, audio, and video, all in one index.
- Natural language search - describe what you're looking for, not keywords
- Multimodal indexing - code, text, PDFs, DOCX, images, audio, and video files
- Interactive TUI - rich terminal UI with slash commands, progress bars, and color-coded results
- First-run setup wizard - guided onboarding with API key validation
- Incremental indexing - content-hashed, only re-processes changed files
- Batch embedding - groups chunks into minimal API calls for fast indexing
- File watching - auto-reindex when files change on disk
- One-shot CLI - scriptable commands for CI/automation
- Smart ranking - filename matching, file type relevance, and content-aware scoring
pip install embedded-finder
Or from source:
git clone https://github.com/vladmarian20005/EmbeddedFinder.git
cd EmbeddedFinder
pip install .
efind
On first launch, a setup wizard walks you through:
- Enter your Google AI API key (free tier available)
- The key is validated and saved securely to ~/.config/embeddedfinder/config.json
- Optionally index a directory right away
That's it. Start searching.
# Option A: environment variable
export GOOGLE_API_KEY=your-key-here
# Option B: .env file in your project root
echo "GOOGLE_API_KEY=your-key-here" > .env
# Option C: set it interactively
efind
# then type: /key set
efind
Type natural language queries at the ❯ prompt:
❯ database migration scripts
❯ files that handle image resizing
❯ error handling in the payment module
❯ screenshots of the dashboard
❯ audio files with speech
Results show similarity scores, file types, paths, and content snippets - color-coded by relevance.
| Command | Description |
|---|---|
| /index <path> | Index a directory |
| /reindex <path> | Re-index only changed files |
| /status | Show index statistics |
| /clear | Clear the entire index |
| /watch <path> | Watch a directory and auto-reindex |
| /key | Show current API key info |
| /key set | Set or change your API key |
| /key delete | Remove saved API key |
| /key show | Reveal the full API key |
| /help | Show available commands |
| /quit or Ctrl+C | Exit |
For scripting and one-off use:
# Index a directory
efind index ./src
# Index specific file types only
efind index ./src -e .py -e .ts
# Search
efind search "authentication middleware"
# Search with options
efind search "config parsing" --top 5 --min-score 0.7
# Plain text output (no colors, good for piping)
efind search "database models" --plain
# Re-index changed files only
efind reindex ./src
# Watch for changes
efind watch ./src
# Show index stats
efind status
# Clear the index
efind clear
# Check version
efind --version
| Category | Extensions |
|---|---|
| Code | .py .js .ts .jsx .tsx .java .c .cpp .h .hpp .go .rs .rb .php .swift .kt .scala .sh .bash .zsh .lua .pl .ex .exs .r .m .sql |
| Markup | .html .css .scss .less .xml .svg |
| Config | .json .yaml .yml .toml .ini .cfg .conf |
| Text | .txt .md .rst .csv |
| Documents | .pdf .docx |
| Images | .png .jpg .jpeg .gif .webp .bmp |
| Audio | .mp3 .wav .ogg .flac .m4a |
| Video | .mp4 .mov .avi .mkv .webm |
Images, audio, and video are embedded natively using Gemini's multimodal capabilities - no transcription or OCR needed.
PDFs with 6 or fewer pages are embedded natively; larger PDFs use text extraction for efficiency.
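The PDF rule above amounts to a single threshold check. The sketch below only illustrates that decision; the function and constant names are hypothetical and not taken from the EmbeddedFinder source.

```python
# Threshold stated in the text above; the constant name is an assumption.
NATIVE_PDF_PAGE_LIMIT = 6


def choose_pdf_strategy(page_count: int, native_limit: int = NATIVE_PDF_PAGE_LIMIT) -> str:
    """Return "native" for short PDFs and "text" for longer ones."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # Short PDFs go to the multimodal embedder as-is; longer ones are
    # reduced to extracted text first.
    return "native" if page_count <= native_limit else "text"


if __name__ == "__main__":
    for pages in (1, 6, 7, 120):
        print(pages, choose_pdf_strategy(pages))
```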
Directory        EmbeddedFinder                   ChromaDB
─────────        ─────────────────────────        ─────────────────────
files/     ──→   1. Crawl (skip .git, etc.)
           ──→   2. Extract (text / bytes)
           ──→   3. Chunk (~2000 tokens)
           ──→   4. Hash (SHA-256 dedup)
           ──→   5. Embed (Gemini API)      ──→   Store vectors
query      ──→   6. Embed query             ──→   Nearest-neighbor search
           ──→   7. Deduplicate by file
           ──→   8. Re-rank & boost         ──→   Results
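To make the query side of the pipeline (steps 6 and 7) concrete, here is a minimal sketch that assumes embeddings are plain lists of floats. It uses a brute-force cosine scan in place of ChromaDB's nearest-neighbor search, and it leaves out the re-rank and boost step because the weights are not described here. All names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, chunks, top=5):
    """chunks is a list of (path, vector) pairs, one per indexed chunk.

    Returns up to `top` (path, score) pairs, keeping only the
    best-scoring chunk for each file.
    """
    best = {}
    for path, vec in chunks:
        score = cosine_similarity(query_vec, vec)
        if path not in best or score > best[path]:
            best[path] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Toy vectors standing in for real embeddings.
chunks = [
    ("billing/errors.py", [0.9, 0.1, 0.0]),
    ("billing/errors.py", [0.5, 0.5, 0.0]),
    ("docs/readme.md", [0.0, 0.2, 0.9]),
]
results = search([1.0, 0.0, 0.0], chunks)
```

Here `results` holds one entry per file, with `billing/errors.py` ranked first because one of its chunks points in nearly the same direction as the query vector.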
- Content hashing - files are fingerprinted with SHA-256; re-indexing skips anything unchanged
- Batch embedding - text chunks are grouped into batches (up to 100 per API call) for throughput
- Rate limiting - built-in token bucket limiter respects Gemini API quotas
- Parallel processing - multi-threaded extraction and embedding with up to 4 workers
- Smart ranking - results are boosted by filename match, file type relevance to query, content overlap, and path depth
- Directory filtering - hidden directories (starting with .) and common non-content directories (node_modules, __pycache__, .venv, dist, build, etc.) are automatically skipped during crawling
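The first two points in the list above (content hashing and batch embedding) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, and how the real indexer stores its known hashes is not described here.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def changed_files(files: dict, known_hashes: dict) -> list:
    """Return paths whose content differs from the previous run.

    files maps path -> bytes; known_hashes maps path -> hex digest
    recorded by an earlier indexing pass.
    """
    return [
        path
        for path, data in files.items()
        if known_hashes.get(path) != fingerprint(data)
    ]


def batched(items: list, size: int = 100):
    """Yield consecutive slices of at most `size` items (100 per call above)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


known = {"a.py": fingerprint(b"print('a')")}
current = {"a.py": b"print('a')", "b.py": b"print('b')"}
to_index = changed_files(current, known)  # only b.py is new or changed
batch_sizes = [len(batch) for batch in batched(list(range(250)))]
```

With 250 chunks and a batch size of 100, the embedder would be called three times instead of 250.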
| Variable | Default | Description |
|---|---|---|
| GOOGLE_API_KEY | - | Google AI API key (required) |
| EMBEDDEDFINDER_DB_DIR | .embeddedfinder/db | Path to the ChromaDB database |
The API key can also be stored via the setup wizard or /key set, which saves it to ~/.config/embeddedfinder/config.json with owner-only permissions.
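For readers curious what "owner-only permissions" means in practice, the following sketch writes a JSON config file with mode 0600 on POSIX systems. The JSON field name `api_key` and the lookup order (environment variable first, then the saved file) are assumptions made for illustration; the actual file layout and precedence are not specified here.

```python
import json
import os
from pathlib import Path


def save_api_key(config_path: Path, api_key: str) -> None:
    """Write the key to config_path, readable and writable by the owner only."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 so it is never briefly world-readable.
    fd = os.open(config_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"api_key": api_key}, handle)  # field name is an assumption
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(config_path, 0o600)


def load_api_key(config_path: Path, env=os.environ):
    """Assumed lookup order: GOOGLE_API_KEY first, then the saved config file."""
    key = env.get("GOOGLE_API_KEY")
    if key:
        return key
    if config_path.exists():
        return json.loads(config_path.read_text(encoding="utf-8")).get("api_key")
    return None
```

Passing `env` explicitly keeps the lookup easy to test without touching the real environment.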
embedded_finder/
├── cli.py # Click CLI, subcommands + TUI launcher
├── tui.py # Interactive Rich-based REPL
├── config.py # Settings, supported extensions, env vars
├── config_store.py # Persistent config file management
├── crawler.py # Recursive file discovery
├── extractor.py # Text extraction, chunking, MIME detection
├── embedder.py # Gemini Embedding API client + batching
├── store.py # ChromaDB vector store
├── indexer.py # Orchestrates crawl → extract → embed → store
├── search.py # Query embedding + nearest-neighbor search
├── ranker.py # Result ranking, dedup, and formatting
├── rate_limiter.py # Token bucket rate limiter
└── watcher.py # Filesystem watcher (watchdog)
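The layout above lists rate_limiter.py as a token bucket rate limiter. As background, a generic token bucket can be sketched as below. This is not the module's actual code: the class name, parameters, and the injectable clock are all assumptions chosen to keep the example deterministic.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.updated
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; return False without blocking otherwise."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would check `try_acquire()` before each API request and wait briefly when it returns False.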
# Clone and install with dev dependencies
git clone https://github.com/vladmarian20005/EmbeddedFinder.git
cd EmbeddedFinder
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=embedded_finder
Contributions are welcome! Please open an issue or submit a pull request.
- Fork the repository
- Create your feature branch (git checkout -b feature/my-feature)
- Commit your changes (git commit -m 'Add my feature')
- Push to the branch (git push origin feature/my-feature)
- Open a pull request
MIT