Chat with your research papers using AI. Upload a PDF, view it side-by-side, and ask questions with grounded, cited answers. Supports text chat, voice chat, and read-aloud.
- PDF Upload & Viewer — Drag-and-drop PDF upload with full in-browser rendering via PDF.js
- Grounded Q&A — Ask questions and get answers with page-number citations. Click citations to jump to the page.
- Summarize — Summarize the current page, a section, or extract contributions & limitations
- Read Aloud — TTS via ElevenLabs reads page content aloud
- Voice Chat — Hold-to-talk voice input (STT → RAG → TTS) with audio response
- OCR Fallback — Scanned PDFs are automatically sent to Reducto for OCR
- URL Import — Paste an article URL to fetch and process it
- Backend: Python, FastAPI, Uvicorn, SQLAlchemy (SQLite), FAISS
- Frontend: Vanilla HTML/JS/CSS (no Node), PDF.js via CDN
- LLM: OpenRouter (Qwen 2.5 72B or any model)
- Embeddings: sentence-transformers (local, all-MiniLM-L6-v2) with OpenRouter fallback
- Voice: ElevenLabs (STT + TTS)
- OCR: Reducto (automatic fallback for scanned PDFs)
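The retrieval half of this stack (embed chunks, index them, fetch the nearest ones for a question) can be sketched as follows. This is a simplified illustration that substitutes plain cosine similarity for the FAISS index, and the function names are hypothetical, not the project's actual API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=4):
    # Return the indices of the k chunks most similar to the query,
    # mirroring what a FAISS nearest-neighbor search does at scale.
    order = sorted(range(len(chunk_vecs)),
                   key=lambda i: cosine(query_vec, chunk_vecs[i]),
                   reverse=True)
    return order[:k]

# Toy 3-dimensional "embeddings" for three chunks.
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))  # → [0, 2]
```

In the real app the vectors come from sentence-transformers (or the OpenRouter embedding fallback) and the search runs through FAISS; the ranking idea is the same.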
```bash
cd paper_talk
python -m venv venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows
pip install -r requirements.txt
cp .env.example .env
```

Edit `.env` and fill in your API keys:
| Variable | Required | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | Yes | OpenRouter API key for LLM |
| `OPENROUTER_MODEL` | No | Model to use (default: `qwen/qwen-2.5-72b-instruct`) |
| `OPENROUTER_EMBED_MODEL` | No | OpenRouter embedding model (leave empty for local) |
| `ELEVENLABS_API_KEY` | Yes* | ElevenLabs key for TTS/STT (*only for voice features) |
| `ELEVENLABS_VOICE_ID` | No | Voice ID (default: Rachel) |
| `REDUCTO_API_KEY` | No | Reducto key for OCR (only needed for scanned PDFs) |
| `BASE_URL` | No | Base URL (default: `http://localhost:8000`) |
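A minimal sketch of how these variables might be consumed at startup. The real app loads them via Pydantic settings in `backend/app/config.py`; this version uses only the standard library, and the attribute names and defaults are taken from the table above:

```python
import os
from dataclasses import dataclass

@dataclass
class Settings:
    # Required for all LLM calls.
    openrouter_api_key: str
    # Optional, with the defaults listed in the table above.
    openrouter_model: str = "qwen/qwen-2.5-72b-instruct"
    base_url: str = "http://localhost:8000"

def load_settings() -> Settings:
    # Fail fast with the same message the troubleshooting section mentions.
    key = os.environ.get("OPENROUTER_API_KEY", "")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return Settings(
        openrouter_api_key=key,
        openrouter_model=os.environ.get(
            "OPENROUTER_MODEL", "qwen/qwen-2.5-72b-instruct"),
        base_url=os.environ.get("BASE_URL", "http://localhost:8000"),
    )
```

Failing fast on the one required key keeps misconfiguration errors at startup rather than mid-request.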
```bash
cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Then open http://localhost:8000 in your browser.
- Upload a PDF via drag-and-drop or paste an article URL
- Wait for processing (text extraction, chunking, embedding)
- Ask questions in the chat — answers include clickable page citations
- Use action buttons to summarize pages, extract contributions, etc.
- Click Read Aloud to hear the current page via TTS
- Hold to Talk for voice chat (requires microphone access)
```
paper_talk/
├── backend/
│   ├── app/
│   │   ├── main.py                  # FastAPI entry point
│   │   ├── config.py                # Pydantic settings
│   │   ├── db.py                    # SQLAlchemy async setup
│   │   ├── models.py                # DB models
│   │   ├── schemas.py               # Pydantic request/response schemas
│   │   ├── services/
│   │   │   ├── pdf_extract.py       # PDF text extraction (pypdf + pdfplumber)
│   │   │   ├── reducto_ocr.py       # Reducto OCR client
│   │   │   ├── chunking.py          # Text chunking with page metadata
│   │   │   ├── embeddings.py        # Embedding (local + OpenRouter)
│   │   │   ├── faiss_index.py       # FAISS vector index
│   │   │   ├── rag.py               # RAG pipeline + citation parsing
│   │   │   ├── openrouter_client.py # OpenRouter LLM client
│   │   │   ├── elevenlabs_client.py # ElevenLabs STT/TTS
│   │   │   └── audio_utils.py       # Audio file helpers
│   │   ├── routes/
│   │   │   ├── upload.py            # POST /api/upload, /api/upload_url
│   │   │   ├── papers.py            # GET/POST /api/papers/{id}
│   │   │   ├── chat.py              # POST /api/chat
│   │   │   └── speech.py            # POST /api/stt, /api/tts, /api/voice_chat
│   │   ├── templates/
│   │   │   ├── index.html           # Upload page
│   │   │   └── paper.html           # Viewer + composer page
│   │   └── static/
│   │       ├── app.js               # Frontend logic
│   │       └── styles.css           # Styles
│   └── tests/
│       ├── test_chunking.py
│       └── test_rag_citations.py
├── storage/                         # Created at runtime (PDFs, audio, indices, DB)
├── .env.example
├── requirements.txt
└── README.md
```
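The citation handling in `services/rag.py` can be sketched as: the LLM answer comes back with inline page markers, which are parsed out so the frontend can render them as clickable links that jump to the page. A simplified, hypothetical version of the parsing — the `[p. N]` marker format here is an assumption for illustration, not necessarily what the app emits:

```python
import re

# Hypothetical inline marker format: "[p. 3]" meaning "see page 3".
CITATION_RE = re.compile(r"\[p\.\s*(\d+)\]")

def extract_citations(answer: str) -> list[int]:
    # Collect cited page numbers in first-mention order, de-duplicated,
    # so each becomes one clickable chip in the UI.
    seen, pages = set(), []
    for m in CITATION_RE.finditer(answer):
        page = int(m.group(1))
        if page not in seen:
            seen.add(page)
            pages.append(page)
    return pages

text = "The method is defined [p. 3] and evaluated [p. 7]; see also [p. 3]."
print(extract_citations(text))  # → [3, 7]
```

Grounding works because the prompt includes only retrieved chunks tagged with their source pages, so the model can cite pages it actually saw.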
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/upload` | Upload PDF (multipart) |
| POST | `/api/upload_url` | Fetch from URL |
| POST | `/api/papers/{id}/process` | Trigger processing |
| GET | `/api/papers/{id}` | Get paper status/metadata |
| POST | `/api/chat` | Text chat with RAG |
| POST | `/api/stt` | Speech-to-text |
| POST | `/api/tts` | Text-to-speech |
| POST | `/api/voice_chat` | Full voice pipeline |
```bash
cd backend
python -m pytest tests/ -v
```

- "OPENROUTER_API_KEY is not set" — Make sure `.env` is in the `paper_talk/` root directory (next to `requirements.txt`)
- PDF not rendering — Check the browser console; ensure the PDF file exists in `storage/pdfs/`
- Voice features not working — Ensure `ELEVENLABS_API_KEY` is set. The microphone requires HTTPS in production (localhost is fine for dev)
- Slow first query — The local embedding model (`all-MiniLM-L6-v2`) downloads on first use (~80 MB). Subsequent loads are cached.
- OCR not triggering — Set `REDUCTO_API_KEY` in `.env`. OCR only triggers when pages have very little extracted text.
- Port in use — Change the port: `uvicorn app.main:app --port 8001`
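The OCR trigger mentioned above can be sketched as a per-page character-count check. The constant name matches the one in `backend/app/config.py`; its value here and the "most pages are sparse" rule are assumptions for illustration:

```python
MIN_CHARS_PER_PAGE_FOR_OCR = 50  # assumed threshold; see backend/app/config.py

def needs_ocr(page_texts: list[str]) -> bool:
    # A scanned PDF yields little or no extractable text per page,
    # so fall back to Reducto OCR when most pages are near-empty.
    sparse = sum(1 for t in page_texts
                 if len(t.strip()) < MIN_CHARS_PER_PAGE_FOR_OCR)
    return sparse > len(page_texts) / 2

# A mostly text-bearing PDF does not trigger OCR.
print(needs_ocr(["Full page of text. " * 20, "More text. " * 20, ""]))  # → False
```

This is why a born-digital PDF never hits Reducto even with `REDUCTO_API_KEY` set: its pages clear the threshold.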
Edit `backend/app/config.py` to change:

- `CHUNK_SIZE` / `CHUNK_OVERLAP` — Text chunking parameters
- `TOP_K` — Number of chunks retrieved for RAG
- `MIN_CHARS_PER_PAGE_FOR_OCR` — OCR trigger threshold
- `LOCAL_EMBED_MODEL` — Local embedding model name
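The effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` can be illustrated with a minimal sliding-window chunker. This is a character-based sketch, not the project's actual implementation — `services/chunking.py` also tracks page metadata for citations:

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    # Slide a window of chunk_size characters, stepping by
    # chunk_size - chunk_overlap so consecutive chunks share context.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

parts = chunk_text("a" * 500, chunk_size=200, chunk_overlap=50)
print([len(p) for p in parts])  # → [200, 200, 200, 50]
```

Raising `CHUNK_OVERLAP` reduces the chance a fact is split across a chunk boundary, at the cost of more chunks to embed and index.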