askpdf

A full-stack PDF reading assistant with Text-to-Speech (TTS), RAG (Retrieval-Augmented Generation), and AI chat, all designed to run privately and locally on your own machine. Upload a PDF, have it read aloud with synchronized text highlighting, and chat with your document using AI. Everything works for free with open-source models served by Docker Model Runner, Ollama, or LMStudio; no cloud services or subscriptions required.

🌟 Features

📄 Reading & TTS

  • Unified Experience: Seamlessly switch between reading the PDF and listening to chat responses
  • Intelligent Text Processing: Robust sentence segmentation with support for Markdown and non-punctuated text
  • High-Quality TTS: Local speech synthesis using Kokoro-82M
  • Visual Tracking: Synchronized sentence highlighting in PDF and message highlighting in Chat
  • Interactive Navigation: Double-click any sentence in the PDF or any message in the Chat to start playback
  • Centralized Controls: Unified player in the footer manages all audio sources (Speed 0.5x - 2.0x)

💬 RAG-Powered Chat & Internet Search

  • Semantic Search: Ask questions about your PDF content
  • Vector Storage: Document chunks indexed in Qdrant for fast retrieval
  • Conversational AI: Chat with context from your document using local LLMs
  • Internet Search (DuckDuckGo): Optionally augment answers with live web search results for up-to-date or external information
  • Chat History: Maintains conversation context for follow-up questions

🌐 Internet Search (DuckDuckGo)

You can enable Internet Search in the chat panel to let the AI answer questions using both your PDF and live web results (via DuckDuckGo). This is useful for:

  • Getting up-to-date facts, news, or background not present in your PDF
  • Clarifying ambiguous or missing information

How it works:

  • When enabled, the app performs a DuckDuckGo search for your question and injects the top results into the LLM's context window, along with PDF content.
  • The LLM then answers using both sources (see the sketch below).
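
A minimal sketch of this injection step, assuming the duckduckgo_search Python package; the function name, prompt layout, and result count are illustrative, not the app's actual implementation:

```python
from duckduckgo_search import DDGS

def build_context(question: str, pdf_chunks: list[str], use_web: bool) -> str:
    """Assemble the plain-text context handed to the LLM."""
    parts = ["PDF context:\n" + "\n\n".join(pdf_chunks)]
    if use_web:
        # Only the question is sent to DuckDuckGo, never PDF content.
        results = DDGS().text(question, max_results=5)
        web_text = "\n".join(f"- {r['title']}: {r['body']}" for r in results)
        parts.append("Web search results:\n" + web_text)
    return "\n\n".join(parts)
```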

Privacy:

  • All queries are sent to DuckDuckGo only when Internet Search is enabled.
  • No PDF content is sent to DuckDuckGo, only your question.

Rate Limits:

  • DuckDuckGo and other free search APIs may rate limit requests if used too frequently.
  • If rate limited, the app will notify you and fall back to PDF-only answers.

Model Compatibility:

  • Any OpenAI-compatible LLM can use this feature. The search results are injected as plain text context, so no special model/tool-calling support is required.

🎨 Modern UI

  • Unified Navigation: Double-click sentences or chat bubbles to start reading immediately
  • Dynamic Visual Feedback: PDF sentence highlighting and Chat bubble illumination during playback
  • Resizable Chat Panel: Drag to adjust the chat interface width (300-800px)
  • Auto-Scroll: Both PDF and Chat automatically keep the sentence currently being read in view
  • Model Selection: Centralized embedding model selection and dynamic LLM discovery

🖥️ Private & Local Design

All features of this app are designed to run entirely on your own machine or laptop, using only local resources by default. Document processing, AI chat, and TTS all happen locally; no data is sent to external servers unless you explicitly enable Internet Search.

Privacy Note:

  • When Internet Search is enabled, only your question (not your PDF content or chat history) is sent to DuckDuckGo for web search. All other processing, including PDF parsing, vector search, and LLM inference, remains local and private.
  • If Internet Search is disabled, no data ever leaves your machine.

You can use free, open-source models with Docker Model Runner, Ollama, or LMStudio, so there are no required cloud costs or subscriptions.

πŸ—οΈ Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Docker Compose                                 │
├─────────────────┬─────────────────┬─────────────────┬───────────────────────┤
│    Frontend     │    Backend      │   RAG Service   │       Qdrant          │
│   (Next.js)     │    (FastAPI)    │    (FastAPI)    │   (Vector DB)         │
│   Port: 3000    │   Port: 8000    │   Port: 8001    │   Port: 6333          │
└─────────────────┴─────────────────┴─────────────────┴───────────────────────┘
                                          │
                                          ▼
                            ┌──────────────────────────────────────────────┐
                            │         DMR / Ollama / LMStudio / LLM        │
                            │            (OpenAI-compatible)               │
                            │             Port: 12434 (default)            │
                            └──────────────────────────────────────────────┘

Services Overview

| Service | Port | Description |
|---------|------|-------------|
| Frontend | 3000 | Next.js React app with PDF viewer and chat UI |
| Backend | 8000 | FastAPI server for PDF processing and TTS |
| RAG Service | 8001 | FastAPI server for document indexing and AI chat |
| Qdrant | 6333 | Vector database for semantic search |
| DMR/Ollama/LMStudio | 12434 | Local LLM server (external, user-provided) |

📋 Prerequisites

  • Docker and Docker Compose
  • Local LLM Server: The app is configured to use an OpenAI-compatible API by default on port 12434.
    • Option A: DMR (Default) - Built into Docker Desktop.
    • Option B: Ollama - Requires running on port 12434 or updating configuration.
    • Option C: LMStudio - Desktop app, exposes OpenAI-compatible API (default: http://localhost:1234/v1).

Required Models (on your LLM server)

  • LLM Model: e.g., ai/qwen3:latest (DMR), llama3 (Ollama), or any chat model supported by LMStudio
  • Embedding Model: e.g., ai/nomic-embed-text-v1.5:latest (DMR), nomic-embed-text (Ollama), or any embedding model supported by LMStudio

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/raghu13590/askpdf.git
cd askpdf

2. Create Your .env File

At the root of the project directory (the same folder as docker-compose.yml), create a file named .env with the following content:

LLM_API_URL=http://host.docker.internal:12434

This variable configures the LLM server endpoint. If you are using Ollama on its default port, set:

LLM_API_URL=http://host.docker.internal:11434

Note: After editing .env, restart your containers for changes to take effect.

3. Start Your Local LLM Server

The application requires an OpenAI-compatible API for LLM and embeddings. You can use Docker Model Runner (DMR), Ollama, or LMStudio as your local LLM server.

Option A: Docker Model Runner (DMR) (Recommended)

  1. Ensure Docker Desktop is running and the DMR extension is installed.
  2. Set LLM_API_URL in your .env file to:
    LLM_API_URL=http://host.docker.internal:12434
  3. Download the required models listed above (an LLM such as ai/qwen3:latest and an embedding model such as ai/nomic-embed-text-v1.5:latest).
  4. Import models into DMR:
    • Open Docker Desktop, go to the DMR extension, and use the "Import Model" button to add the downloaded models.
    • Or, use the DMR CLI:
      dmr import <path-to-model-directory>
  5. Verify both models are listed as Ready in the DMR UI.

Option B: Ollama

Ollama runs on port 11434 by default. Note that Ollama supports the OpenAI-compatible API used by this app starting from version 0.1.34; ensure your installation is up to date. The easiest way to use Ollama with this app is to update your .env file (Option 1).

Option 1 (Recommended): Change the API endpoint in your .env file

Edit your .env file at the project root and set:

LLM_API_URL=http://host.docker.internal:11434

This directs the app to Ollama's default port. (If running outside Docker, you can use http://localhost:11434.)

Note: After changing .env, restart your containers for the new value to take effect.

Option 2: Change Ollama's port to 12434

If you prefer, you can start Ollama on port 12434 to match the default expected by the app:

# Start Ollama on the expected port
OLLAMA_HOST=0.0.0.0:12434 ollama serve

# In a new terminal, pull the models
ollama pull llama3
ollama pull nomic-embed-text

Option C: LMStudio

  1. Download and install LMStudio on your machine.
  2. Launch LMStudio and load your desired LLM and embedding models.
  3. LMStudio exposes an OpenAI-compatible API at http://localhost:1234/v1 by default.
  4. Edit your .env file and set:
     LLM_API_URL=http://host.docker.internal:1234/v1
     (If running outside Docker, use http://localhost:1234/v1.)
  5. Restart your containers for changes to take effect.

Note: LMStudio supports a wide range of models and provides a user-friendly interface for model management. Ensure the models you want to use are loaded and available in LMStudio.

4. Start the Application

docker-compose up --build

5. Access the Application

Open the frontend at http://localhost:3000 in your browser.

📖 Usage

Reading a PDF

  1. Select Embedding Model: Choose an embedding model from the dropdown
  2. Upload PDF: Click "Upload PDF" and select your file
  3. Wait for Processing: The PDF is parsed, sentences extracted, and indexed for RAG
  4. Play Audio: Click "Play" to start text-to-speech from the beginning
  5. Navigate: Use playback controls or double-click any sentence in the PDF or any chat bubble to jump to it
  6. Adjust Voice: Select different voice styles and adjust playback speed

Chatting with Your PDF (and the Web)

  1. Select LLM Model: Choose an LLM from the chat panel dropdown
  2. (Optional) Enable Internet Search: Toggle the "Use Internet Search" switch above the chat input to allow the AI to use live web results
  3. Ask Questions: Type your question about the PDF content (or anything else)
  4. Get AI Answers: The system retrieves relevant PDF chunks and, if enabled, web search results, then generates an answer
  5. Continue Conversation: Follow-up questions maintain context
  6. Read Out Loud: Double-click any chat bubble to have the assistant's response (or your own question) read aloud

🛠️ Technology Stack

Backend Service

| Technology | Purpose |
|------------|---------|
| FastAPI | Web framework for REST APIs |
| PyMuPDF (fitz) | PDF parsing with character-level coordinates |
| spaCy | NLP for sentence segmentation |
| Kokoro | Neural TTS with 82M parameters |

RAG Service

| Technology | Purpose |
|------------|---------|
| FastAPI | Web framework |
| LangChain | LLM/Embedding integration |
| LangGraph | Stateful RAG workflow |
| Qdrant Client | Vector database operations |

Frontend

| Technology | Purpose |
|------------|---------|
| Next.js | React framework |
| Material-UI (MUI) | UI components |
| react-pdf | PDF rendering |
| react-markdown | Chat message rendering |

πŸ“ Project Structure

askpdf/
├── docker-compose.yml          # Multi-service orchestration
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   └── app/
│       ├── main.py             # FastAPI app, upload & TTS endpoints
│       ├── pdf_parser.py       # PyMuPDF text extraction with coordinates
│       ├── nlp.py              # spaCy sentence segmentation
│       └── tts.py              # Kokoro TTS synthesis
├── rag_service/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py                 # FastAPI app, index & chat endpoints
│   ├── rag.py                  # Document chunking & indexing
│   ├── agent.py                # LangGraph RAG workflow
│   ├── models.py               # LLM/Embedding model clients
│   └── vectordb/
│       ├── base.py             # Abstract vector DB interface
│       └── qdrant.py           # Qdrant adapter implementation
└── frontend/
    ├── Dockerfile
    ├── package.json
    └── src/
        ├── pages/
        │   └── index.tsx       # Main application page
        ├── components/
        │   ├── PdfUploader.tsx     # File upload with model selection
        │   ├── PdfViewer.tsx       # PDF rendering with overlays
        │   ├── PlayerControls.tsx  # Audio playback controls
        │   ├── ChatInterface.tsx   # RAG chat UI
        │   └── TextViewer.tsx      # Alternative text display
        └── lib/
            ├── api.ts          # Backend API client
            └── tts-api.ts      # TTS API client

The application expects an OpenAI-compatible API at the URL specified by LLM_API_URL in your .env file (default: http://host.docker.internal:12434).
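
To sanity-check that endpoint, you can list its models. A minimal sketch using Python's requests; the /v1/models path is the OpenAI-compatible convention (served by Ollama and LMStudio; adjust the base URL and path to match your server):

```python
import requests

# List models from an OpenAI-compatible server (here: Ollama's default port).
# Adjust the base URL to match LLM_API_URL in your .env.
resp = requests.get("http://localhost:11434/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```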

πŸ“ API Reference

Backend Service (Port 8000)

POST /api/upload

Upload a PDF and extract sentences with bounding boxes.

Request: multipart/form-data

  • file: PDF file
  • embedding_model: Model name for RAG indexing

Response:

{
  "sentences": [
    {
      "id": 0,
      "text": "First sentence of the document.",
      "bboxes": [
        {"page": 1, "x": 72, "y": 700, "width": 50, "height": 12, "page_height": 792, "page_width": 612}
      ]
    }
  ],
  "pdfUrl": "/abc123.pdf"
}
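
A minimal upload sketch using Python's requests (the file name and embedding model are placeholders):

```python
import requests

# Upload a PDF and request indexing with a chosen embedding model.
with open("document.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/upload",
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"embedding_model": "ai/nomic-embed-text-v1.5:latest"},
        timeout=120,
    )
resp.raise_for_status()
print(resp.json()["pdfUrl"])
```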

GET /api/voices

List available TTS voice styles.

Response:

{
  "voices": ["M1.json", "F1.json", "M2.json"]
}

POST /api/tts

Synthesize speech for text.

Request:

{
  "text": "Text to synthesize",
  "voice": "M1.json",
  "speed": 1.0
}

Response:

{
  "audioUrl": "/data/audio/tmp_xyz.wav"
}
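
A minimal sketch of calling the endpoint from Python; downloading the WAV assumes the backend serves the returned audioUrl path:

```python
import requests

# Synthesize speech for a sentence and save the resulting WAV locally.
resp = requests.post(
    "http://localhost:8000/api/tts",
    json={"text": "Hello from askpdf.", "voice": "M1.json", "speed": 1.0},
    timeout=60,
)
resp.raise_for_status()
audio_url = resp.json()["audioUrl"]  # e.g. "/data/audio/tmp_xyz.wav"

wav = requests.get(f"http://localhost:8000{audio_url}", timeout=60)
with open("tts_output.wav", "wb") as f:
    f.write(wav.content)
```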

RAG Service (Port 8001)

POST /index

Index document text into vector database.

Request:

{
  "text": "Full document text...",
  "embedding_model": "ai/nomic-embed-text-v1.5:latest",
  "metadata": {"filename": "document.pdf", "file_hash": "abc123def456"}
}

POST /chat

Chat with indexed documents.

Request:

{
  "question": "What is this document about?",
  "llm_model": "ai/qwen3:latest",
  "embedding_model": "ai/nomic-embed-text-v1.5:latest",
  "history": [
    {"role": "user", "content": "Previous question"},
    {"role": "assistant", "content": "Previous answer"}
  ]
}

Response:

{
  "answer": "This document discusses...",
  "context": "Retrieved chunks used for the answer..."
}
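
A minimal end-to-end sketch from Python (model names are examples; /index is normally triggered automatically on upload):

```python
import requests

RAG = "http://localhost:8001"

# Index a document's text (the app normally triggers this after upload).
idx = requests.post(f"{RAG}/index", json={
    "text": "Full document text...",
    "embedding_model": "ai/nomic-embed-text-v1.5:latest",
    "metadata": {"filename": "document.pdf", "file_hash": "abc123def456"},
}, timeout=300)
idx.raise_for_status()

# Ask a question; pass prior turns in "history" for follow-ups.
resp = requests.post(f"{RAG}/chat", json={
    "question": "What is this document about?",
    "llm_model": "ai/qwen3:latest",
    "embedding_model": "ai/nomic-embed-text-v1.5:latest",
    "history": [],
}, timeout=300)
resp.raise_for_status()
print(resp.json()["answer"])
```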

GET /models

Fetch available models from LLM server.

GET /health

Health check endpoint.

🔧 Configuration

Environment Variables

All environment variables, including LLM_API_URL, are managed via a .env file at the project root; the file is loaded by both Docker Compose and the Python services.

| Variable | Service | Default | Description |
|----------|---------|---------|-------------|
| NEXT_PUBLIC_API_URL | Frontend | http://localhost:8000 | Backend API URL |
| NEXT_PUBLIC_RAG_API_URL | Frontend | http://localhost:8001 | RAG API URL |
| RAG_SERVICE_URL | Backend | http://rag-service:8000 | Internal RAG service URL |
| QDRANT_HOST | RAG Service | qdrant | Qdrant hostname |
| QDRANT_PORT | RAG Service | 6333 | Qdrant port |
| LLM_API_URL | RAG Service | http://host.docker.internal:12434 | LLM server URL (change to ...:11434 for default Ollama) |

Voice Styles

Voice styles (voices) are handled by the Kokoro engine. Available options are discovered dynamically from the system and populated in the UI dropdown.

TTS Parameters

In backend/app/tts.py:

  • total_step: Diffusion steps (default: 5) - higher = better quality, slower
  • speed: Playback speed (0.5 - 2.0)

🔄 Data Flow

PDF Upload Flow

User uploads PDF
  ↓
Backend: Save PDF → Extract text + coordinates (PyMuPDF)
  ↓
Backend: Split into sentences (spaCy)
  ↓
Backend: Map sentences to bounding boxes
  ↓
Backend: Trigger async RAG indexing
  ↓
RAG Service: Chunk text → Generate embeddings → Store in Qdrant
  ↓
Frontend: Display PDF with clickable sentence overlays

Chat Flow

User asks question
  ↓
RAG Service: Embed question
  ↓
RAG Service: Search Qdrant for top-5 relevant chunks
  ↓
RAG Service: Build prompt (system + context + history + question)
  ↓
RAG Service: Call LLM via OpenAI-compatible API
  ↓
Frontend: Display markdown-rendered answer

TTS Playback Flow

User clicks Play or double-clicks sentence
  ↓
Frontend: Request /api/tts with sentence text
  ↓
Backend: Kokoro synthesizes audio β†’ WAV file
  ↓
Frontend: Play audio, highlight current sentence
  ↓
On audio end: Auto-advance to next sentence

🐳 Docker Details

The application uses Docker Compose with four services:

  1. frontend: Next.js dev server with hot reload
  2. backend: FastAPI with TTS models mounted (Supertonic cloned from HuggingFace at build)
  3. rag-service: FastAPI with LangChain/LangGraph
  4. qdrant: Official Qdrant image with persistent storage

Volumes

  • `qdrant_data`: Persistent vector storage
  • Source directories mounted for development hot-reload

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project uses the following third-party technologies:

🙏 Acknowledgments

  • hexgrad for the amazing Kokoro-82M model
  • spaCy for robust NLP capabilities
  • LangChain team for the excellent LLM framework
  • Qdrant for the powerful vector database
  • The open-source community for all the amazing tools

📧 Contact

For questions, issues, or suggestions, please open an issue on the GitHub repository.
