A Streamlit web app that generates vector embeddings for PDF documents and images with Nomic's ColNomic Embed Multimodal 3B model and lets you search across them.
- Multi-PDF and image upload (PNG, JPG, JPEG, WebP) with batch or incremental embedding
- PDF page rendering at configurable DPI (72, 150, 300) via PyMuPDF
- Multi-vector embeddings with ColNomic Embed Multimodal 3B
- Cross-document text search with top-K and score threshold filtering
- Optional per-document search filtering
- Automatic device selection (MPS > CUDA > CPU)
- Per-document and combined JSON downloads with embeddings, DPI, and timing
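PDF page sizes are measured in points (1/72 inch), so rendering at a given DPI scales each page dimension by `dpi / 72`. The sketch below shows the resulting pixel sizes for a US Letter page (612 x 792 pt) at the three supported settings. The helper name `render_size` is hypothetical and not taken from the app. PyMuPDF's `Page.get_pixmap` accepts a `dpi` argument that applies this scaling, and its actual pixmap may differ by a pixel because of rounding.

```python
PDF_POINTS_PER_INCH = 72  # PDF user-space unit is 1/72 inch


def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi` (hypothetical helper)."""
    # Multiply before dividing so common page sizes stay exact in floating point.
    return (
        round(width_pt * dpi / PDF_POINTS_PER_INCH),
        round(height_pt * dpi / PDF_POINTS_PER_INCH),
    )


# US Letter page at the DPI options the app offers.
for dpi in (72, 150, 300):
    print(dpi, render_size(612, 792, dpi))
```

Higher DPI gives the model more detail per page but quadruples the pixel count each time resolution doubles, which increases embedding time and memory.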
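Multi-vector models such as ColNomic produce one vector per query token and one per image patch, rather than a single vector per page. ColBERT- and ColPali-style retrievers commonly score such embeddings with late-interaction MaxSim: for each query vector, take its best dot product against any page vector, then sum those maxima. The sketch below assumes that scoring rule. It is plain Python for clarity, and `maxsim` is an illustrative name, not the app's code.

```python
def maxsim(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Late-interaction (MaxSim) score between a query and one page."""
    total = 0.0
    for q in query_vecs:
        # Best-matching page vector for this query vector.
        total += max(sum(x * y for x, y in zip(q, d)) for d in doc_vecs)
    return total


query = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-d "token" vectors
page = [[0.5, 0.5], [1.0, 0.0]]   # toy 2-d "patch" vectors
print(maxsim(query, page))  # 1.5
```

Real embeddings have many more dimensions and are usually compared as batched tensor operations, but the per-page score is computed the same way.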
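The search options can be pictured as three steps: optionally restrict the search to selected documents, drop pages that score below a threshold, and keep the top K. This is a hypothetical sketch. `PageEmbedding`, `search`, and its parameters are illustrative names rather than the app's API, and the MaxSim scoring rule is an assumption about how pages are ranked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageEmbedding:
    doc_name: str               # source file name (hypothetical field)
    page: int                   # 1-based page number
    vectors: list[list[float]]  # one vector per image patch


def maxsim(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    # Sum over query vectors of the best dot product against any page vector.
    return sum(
        max(sum(x * y for x, y in zip(q, d)) for d in doc_vecs)
        for q in query_vecs
    )


def search(
    query_vecs: list[list[float]],
    pages: list[PageEmbedding],
    top_k: int = 5,
    min_score: float = 0.0,
    docs: Optional[set[str]] = None,
) -> list[tuple[float, str, int]]:
    results = []
    for p in pages:
        # Optional per-document filter: skip pages from unselected documents.
        if docs is not None and p.doc_name not in docs:
            continue
        score = maxsim(query_vecs, p.vectors)
        # Score threshold: keep only pages at or above min_score.
        if score >= min_score:
            results.append((score, p.doc_name, p.page))
    # Highest scores first; sort is stable, so ties keep input order.
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]


pages = [
    PageEmbedding("a.pdf", 1, [[1.0, 0.0]]),
    PageEmbedding("a.pdf", 2, [[0.0, 1.0]]),
    PageEmbedding("b.pdf", 1, [[1.0, 1.0]]),
]
print(search([[1.0, 0.0]], pages, top_k=5, min_score=0.5))
# [(1.0, 'a.pdf', 1), (1.0, 'b.pdf', 1)]
```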
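Device selection in the stated priority order can be expressed with PyTorch's availability checks, `torch.backends.mps.is_available()` and `torch.cuda.is_available()`. This is a minimal sketch that assumes the app runs the model through PyTorch. `pick_device` and `detect_device` are hypothetical names. The pure `pick_device` helper is kept separate so the priority logic can be checked without a GPU.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Apply the MPS > CUDA > CPU priority to the given availability flags."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return pick_device(torch.backends.mps.is_available(), torch.cuda.is_available())


print(detect_device())
```

MPS is checked first because Apple Silicon machines do not have CUDA, so the order only matters for choosing a GPU backend over the CPU.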
uv sync # install dependencies
uv run streamlit run streamlit_app.py # launch the app
uv run ruff check . # lint
uv run ruff format . # format
uv run ty check # typecheck
uv run pytest # test