Find the exact page that answers your question.
A lightweight desktop tool for students to search through course PDFs using natural language.
- Hybrid Search — Combines keyword matching (BM25) with semantic understanding
- Two Index Modes — Fast mode for quick startup, Deep mode for comprehensive search
- Multilingual Support — Search Chinese documents with English queries (and vice versa)
- Works Offline — No internet needed after initial setup
- Open PDF at Page — Double-click a result to jump directly to that page
- Adjustable Search Mode — Slider to balance between semantic and literal matching
Download the latest release from the Releases page and run Locus.exe.
# Clone the repo
git clone https://github.com/llk214/semantic-locator.git
cd semantic-locator
# Install dependencies
pip install -r requirements.txt
# Run
python gui.py

- Click Browse and select a folder containing your PDFs
- Click Load Index and choose an index mode:
  - ⚡ Fast Index — Quick startup, good for small collections
  - 🔬 Deep Index — Slower startup, finds all semantically related content
- Type your question and hit Search
- Double-click any result to open the PDF at that page
Choose an embedding model based on your hardware and needs:
| Option | Size | RAM | Best For |
|---|---|---|---|
| ⚡ Fast | ~80MB | 4GB | Any laptop, fastest |
| ⚖️ Balanced | ~130MB | 4GB | Standard laptops |
| 🎯 High Accuracy | ~440MB | 8GB | Better results |
| 🚀 Best | ~1.3GB | 16GB | Performance PCs |
| 🌍 Multilingual | ~2.2GB | 16GB+ | 100+ languages |
| Mode | Startup | Search | Use When |
|---|---|---|---|
| ⚡ Fast | Quick | Good | Small collections, quick lookups |
| 🔬 Deep | Slower | Best | Large collections, thorough research |
Deep mode pre-computes embeddings for all pages, enabling:
- Full semantic search across all documents
- Finding related content even without keyword matches
- Cross-lingual search (with Multilingual model)
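The idea behind Deep mode can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: `build_deep_index`, `deep_search`, and `toy_embed` are hypothetical names, and the hand-written 3-dimensional vectors stand in for a real embedding model such as one loaded through sentence-transformers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_deep_index(pages, embed):
    """Embed every page once, up front (this is the slower startup step)."""
    return [(page_id, embed(text)) for page_id, text in pages]


def deep_search(index, query_vec, top_k=3):
    """Score every indexed page against the query; no keyword filter is applied."""
    scored = [(cosine(vec, query_vec), page_id) for page_id, vec in index]
    scored.sort(reverse=True)
    return [page_id for _, page_id in scored[:top_k]]


# Assumption: toy vectors replace a real embedding model for this sketch.
TOY_VECTORS = {
    "dropout and weight decay": [0.9, 0.1, 0.0],
    "gradient descent step size": [0.1, 0.9, 0.0],
    "course grading policy": [0.0, 0.1, 0.9],
}


def toy_embed(text):
    return TOY_VECTORS[text]


pages = [
    ("ml.pdf:12", "dropout and weight decay"),
    ("opt.pdf:3", "gradient descent step size"),
    ("admin.pdf:1", "course grading policy"),
]
index = build_deep_index(pages, toy_embed)

# Pretend embedding of the query "how to prevent overfitting".
query_vec = [0.8, 0.2, 0.0]
results = deep_search(index, query_vec, top_k=2)
print(results)
```

The query shares no words with the top page, yet that page ranks first because its vector points in a similar direction. That is the sense in which pre-computed embeddings can surface related content without keyword matches.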
With the 🌍 Multilingual model, you can:
- Search Chinese PDFs with English queries
- Search English PDFs with Chinese queries
- Mix languages in your document collection
When cross-lingual search is active, you'll see: 🌍 Cross-lingual: X results (semantic only)
Adjust how search works:
🧠 Semantic ◀━━━━━━━━━━▶ 🔤 Literal
| Slide Left | Slide Right |
|---|---|
| Understands meaning | Matches exact words |
| "How to prevent overfitting?" | "regularization" |
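One plausible way to implement such a slider is a weighted blend of normalized keyword and semantic scores. The sketch below assumes that design; the source does not say how the tool combines scores, and `blend`, `min_max`, and `literal_weight` are hypothetical names chosen for illustration.

```python
def min_max(scores):
    """Rescale scores to the 0..1 range so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def blend(bm25_scores, semantic_scores, literal_weight):
    """Combine per-page scores.

    literal_weight = 0.0 corresponds to the slider fully left (semantic),
    literal_weight = 1.0 to the slider fully right (literal).
    """
    if not 0.0 <= literal_weight <= 1.0:
        raise ValueError("literal_weight must be between 0 and 1")
    keyword = min_max(bm25_scores)
    semantic = min_max(semantic_scores)
    return [
        literal_weight * k + (1.0 - literal_weight) * s
        for k, s in zip(keyword, semantic)
    ]


# Made-up scores for three pages, for illustration only.
bm25 = [4.0, 0.0, 1.0]       # page 0 contains the exact query words
semantic = [0.2, 0.9, 0.5]   # page 1 is closest in meaning

left = blend(bm25, semantic, literal_weight=0.0)
right = blend(bm25, semantic, literal_weight=1.0)
print(left.index(max(left)), right.index(max(right)))
```

With the slider fully left the page closest in meaning wins, and fully right the page with the exact words wins. Intermediate weights trade one signal against the other.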
- ✅ PDF (.pdf)

Tip: Export your .pptx and .docx files to PDF for best results.
- Python 3.8+
- ~500MB to 2.5GB disk space (depending on model)
- PDF reader with command-line support (e.g., SumatraPDF)
PyMuPDF # PDF text extraction
rank-bm25 # Keyword search
sentence-transformers # Semantic matching
customtkinter # Modern GUI
- Use Deep mode for large collections — ensures nothing is missed
- Use specific terms — "Q-learning update rule" works better than "how does it learn"
- Adjust the slider — Literal mode for exact terms, semantic mode for concepts
- Try Multilingual — if you have mixed-language documents
What's the difference between Fast and Deep index?
Fast mode uses BM25 to filter candidates first (may miss semantically related pages). Deep mode searches all pages semantically (slower startup, better results).
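To see why a keyword prefilter can miss related pages, here is a small sketch of that first stage. It is a simplified, self-contained BM25 written for illustration (the tool itself lists rank-bm25 as a dependency); `bm25_scores`, `fast_candidates`, and the `keep` parameter are hypothetical, and the semantic reranking stage is left out.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Plain BM25 over lowercase whitespace tokens."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many pages each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def fast_candidates(query, docs, keep=2):
    """Keep at most `keep` pages that have a non-zero keyword score."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:keep] if scores[i] > 0]


docs = [
    "overfitting happens when a model memorizes training data",
    "dropout and weight decay are common regularizers",
    "office hours are on tuesday",
]
candidates = fast_candidates("prevent overfitting", docs)
print(candidates)
```

Only the first page survives the filter. The second page is about preventing overfitting but shares no words with the query, so a later semantic rerank over the candidates never sees it. Deep mode avoids this by scoring every page semantically.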
Is this an AI/LLM?
No. It uses embedding models for similarity matching, not generative AI. It finds information — it doesn't generate answers.
Can I use this during exams?
If "no LLM" is the rule, this tool is fine — it's just a smart search engine for your own materials.
Why doesn't the page jump work?
Install SumatraPDF — it has the best command-line page navigation support.
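For reference, opening a PDF at a given page from Python might look like the sketch below. It relies on SumatraPDF's `-page` command-line option; the helper names and the default executable name are assumptions for illustration, and the tool's own launcher code may differ.

```python
import subprocess


def sumatra_command(pdf_path, page, exe="SumatraPDF.exe"):
    """Build the argument list that asks SumatraPDF to open pdf_path at page.

    Assumption: `exe` is on PATH; pass a full path otherwise.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [exe, "-page", str(page), pdf_path]


def open_at_page(pdf_path, page, exe="SumatraPDF.exe"):
    """Launch the viewer without blocking the caller."""
    return subprocess.Popen(sumatra_command(pdf_path, page, exe))


cmd = sumatra_command("notes/lecture3.pdf", 12)
print(cmd)
```

Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces.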
MIT — free for personal and educational use.
Made for students, by students 📖