Built with Streamlit, this app allows you to upload images and PDFs, then query them with natural language to extract insights from charts, diagrams, and document pages.
- PDFs are automatically converted into page images.
- Images are auto-enhanced for better embedding and Q&A results.
- Uses Cohere Embed-4 to compute embeddings for each image/page.
- Finds the most semantically relevant page/image for a given query.
- Google Gemini 2.5 Flash analyzes the retrieved visual content.
- Generates clear, context-aware answers to your question.
- Gradient background, styled buttons, answer bubbles, image cards.
- Dual-column layout → relevant image on the left, Gemini answer on the right.
- Stores uploaded files and embeddings in Streamlit session state.
- Enables multiple queries without re-uploading documents.
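
The session-state caching described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `get_embeddings`, `embed_fn`, and the choice to key the cache by file name are all assumptions. `state` is any dict-like object; inside a Streamlit app that role would be played by `st.session_state`.

```python
def get_embeddings(state, files, embed_fn):
    """Embed only the files that are not already cached in `state`.

    `files` maps a file name to its raw bytes. `embed_fn` is a hypothetical
    callable that turns those bytes into an embedding vector.
    """
    if "embeddings" not in state:
        state["embeddings"] = {}
    cache = state["embeddings"]

    for name, data in files.items():
        if name not in cache:
            # Only reached the first time a file is seen in this session.
            cache[name] = embed_fn(data)

    return [cache[name] for name in files]


# Stand-in embedder that records how often it is called.
calls = []

def fake_embed(data):
    calls.append(data)
    return [float(len(data))]

state = {}  # plays the role of st.session_state in this sketch
files = {"chart.png": b"abc", "report.pdf": b"defgh"}

first = get_embeddings(state, files, fake_embed)
second = get_embeddings(state, files, fake_embed)  # served from the cache
print(first == second, len(calls))  # True 2
```

Because the second call finds both files in the cache, the embedder runs only twice in total, which is what lets follow-up queries skip re-uploading and re-embedding.
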
- Python 3.9+
- Cohere API key
- Google Gemini API key
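Create and activate a virtual environment (the activation command shown is for Windows):

    python -m venv venv
    .\venv\Scripts\activate

Install the dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt

Or install the packages directly:

    pip install streamlit cohere google-genai python-dotenv PyMuPDF pillow numpy

Set your API keys, for example in a `.env` file read by python-dotenv:

    COHERE_API_KEY=your_cohere_api_key
    GEMINI_API_KEY=your_gemini_api_key

Run the app:

    streamlit run app.py

- PDFs → Each page is rendered to an image using PyMuPDF.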
- Images → Automatically enhanced and resized for better processing.
- Generates dense multimodal embeddings for every image or PDF page.
- Your query is embedded using search_query mode.
- Cosine similarity retrieves the most relevant image/page.
- The retrieved image + your question are analyzed using Google Gemini.
- Produces a context-aware AI-generated answer.
- Left panel → Relevant image/page
- Right panel → Gemini’s generated answer
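
The cosine-similarity retrieval step above can be illustrated with a small NumPy sketch. The function name `most_similar` and the toy vectors are assumptions made for illustration. In the app, the vectors would come from the embedding model, which this sketch leaves out.

```python
import numpy as np


def most_similar(query_embedding, doc_embeddings):
    """Return (index, score) of the document embedding closest to the query."""
    query = np.asarray(query_embedding, dtype=float)
    docs = np.asarray(doc_embeddings, dtype=float)

    # Normalize to unit length so the dot product equals cosine similarity.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)

    scores = docs @ query
    best = int(np.argmax(scores))
    return best, float(scores[best])


# Toy 3-dimensional vectors standing in for real page/image embeddings.
pages = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
index, score = most_similar([0.0, 2.0, 0.0], pages)
print(index, round(score, 3))  # 1 1.0
```

The query points in the same direction as the second page vector, so that page wins with a similarity of 1.0 even though the vector lengths differ. Normalizing first is what makes the comparison depend on direction only.
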
- 📊 Extract insights from financial charts
- 📑 Understand tables and diagrams in PDFs
- 🔎 Ask targeted questions about multi-page documents
- 🗂️ Perform visual knowledge retrieval from mixed content (images + PDFs)