PaperTalk

Chat with your research papers using AI. Upload a PDF, view it side-by-side, and ask questions with grounded, cited answers. Supports text chat, voice chat, and read-aloud.

Features

  • PDF Upload & Viewer — Drag-and-drop PDF upload with full in-browser rendering via PDF.js
  • Grounded Q&A — Ask questions and get answers with page-number citations. Click citations to jump to the page.
  • Summarize — Summarize the current page, a section, or extract contributions & limitations
  • Read Aloud — TTS via ElevenLabs reads page content aloud
  • Voice Chat — Hold-to-talk voice input (STT → RAG → TTS) with audio response
  • OCR Fallback — Scanned PDFs are automatically sent to Reducto for OCR
  • URL Import — Paste an article URL to fetch and process it
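Grounded answers with page citations are typically produced by prompting the LLM to emit inline page markers and parsing them out afterwards. A minimal sketch of that parsing step, assuming a hypothetical `[p. N]` marker format (the actual format used by PaperTalk's `rag.py` may differ):

```python
import re

# Hypothetical citation format: the model is asked to append "[p. N]"
# after each claim; we extract the page numbers to render clickable links.
CITATION_RE = re.compile(r"\[p\.\s*(\d+)\]")

def extract_citations(answer: str) -> list[int]:
    """Return the distinct page numbers cited in an answer, in order of first mention."""
    seen, pages = set(), []
    for match in CITATION_RE.finditer(answer):
        page = int(match.group(1))
        if page not in seen:
            seen.add(page)
            pages.append(page)
    return pages

answer = "The method is introduced [p. 2] and evaluated [p. 5]; see also [p. 2]."
print(extract_citations(answer))  # [2, 5]
```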

Tech Stack

  • Backend: Python, FastAPI, Uvicorn, SQLAlchemy (SQLite), FAISS
  • Frontend: Vanilla HTML/JS/CSS (no Node), PDF.js via CDN
  • LLM: OpenRouter (default: Qwen 2.5 72B; any OpenRouter-hosted model works)
  • Embeddings: sentence-transformers (local, all-MiniLM-L6-v2) with OpenRouter fallback
  • Voice: ElevenLabs (STT + TTS)
  • OCR: Reducto (automatic fallback for scanned PDFs)
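The FAISS index holds one embedding per text chunk, so retrieval reduces to a nearest-neighbor search over those vectors. A toy illustration of the idea in plain Python with hand-made 3-dimensional vectors (not the project's actual code, which uses real embedding models and FAISS):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("Intro: motivation",    [1.0, 0.0, 0.0]),
    ("Method: attention",    [0.0, 1.0, 0.1]),
    ("Results: benchmarks",  [0.1, 0.9, 0.4]),
]
print(top_k([0.0, 1.0, 0.0], chunks, k=2))  # ['Method: attention', 'Results: benchmarks']
```

The retrieved chunks (with their page metadata) are then stuffed into the LLM prompt, which is the core of the RAG pipeline.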

Quick Start

1. Clone and set up environment

cd paper_talk
python -m venv venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows

pip install -r requirements.txt

2. Configure API keys

cp .env.example .env

Edit .env and fill in your API keys:

| Variable | Required | Description |
|---|---|---|
| OPENROUTER_API_KEY | Yes | OpenRouter API key for LLM |
| OPENROUTER_MODEL | No | Model to use (default: qwen/qwen-2.5-72b-instruct) |
| OPENROUTER_EMBED_MODEL | No | OpenRouter embedding model (leave empty for local) |
| ELEVENLABS_API_KEY | Yes* | ElevenLabs key for TTS/STT (*only for voice features) |
| ELEVENLABS_VOICE_ID | No | Voice ID (default: Rachel) |
| REDUCTO_API_KEY | No | Reducto key for OCR (only needed for scanned PDFs) |
| BASE_URL | No | Base URL (default: http://localhost:8000) |
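For example, a minimal `.env` for text-only chat (voice and OCR keys omitted; the key value below is a placeholder):

```ini
OPENROUTER_API_KEY=sk-or-your-key-here
OPENROUTER_MODEL=qwen/qwen-2.5-72b-instruct
# Leave empty to use the local sentence-transformers model
OPENROUTER_EMBED_MODEL=
```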

3. Run the server

cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Then open http://localhost:8000 in your browser.

4. Usage

  1. Upload a PDF via drag-and-drop or paste an article URL
  2. Wait for processing (text extraction, chunking, embedding)
  3. Ask questions in the chat — answers include clickable page citations
  4. Use action buttons to summarize pages, extract contributions, etc.
  5. Click Read Aloud to hear the current page via TTS
  6. Hold to Talk for voice chat (requires microphone access)

Project Structure

paper_talk/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry point
│   │   ├── config.py            # Pydantic settings
│   │   ├── db.py                # SQLAlchemy async setup
│   │   ├── models.py            # DB models
│   │   ├── schemas.py           # Pydantic request/response schemas
│   │   ├── services/
│   │   │   ├── pdf_extract.py   # PDF text extraction (pypdf + pdfplumber)
│   │   │   ├── reducto_ocr.py   # Reducto OCR client
│   │   │   ├── chunking.py      # Text chunking with page metadata
│   │   │   ├── embeddings.py    # Embedding (local + OpenRouter)
│   │   │   ├── faiss_index.py   # FAISS vector index
│   │   │   ├── rag.py           # RAG pipeline + citation parsing
│   │   │   ├── openrouter_client.py  # OpenRouter LLM client
│   │   │   ├── elevenlabs_client.py  # ElevenLabs STT/TTS
│   │   │   └── audio_utils.py   # Audio file helpers
│   │   ├── routes/
│   │   │   ├── upload.py        # POST /api/upload, /api/upload_url
│   │   │   ├── papers.py        # GET/POST /api/papers/{id}
│   │   │   ├── chat.py          # POST /api/chat
│   │   │   └── speech.py        # POST /api/stt, /api/tts, /api/voice_chat
│   │   ├── templates/
│   │   │   ├── index.html       # Upload page
│   │   │   └── paper.html       # Viewer + composer page
│   │   └── static/
│   │       ├── app.js           # Frontend logic
│   │       └── styles.css       # Styles
│   └── tests/
│       ├── test_chunking.py
│       └── test_rag_citations.py
├── storage/                     # Created at runtime (PDFs, audio, indices, DB)
├── .env.example
├── requirements.txt
└── README.md

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/upload | Upload PDF (multipart) |
| POST | /api/upload_url | Fetch from URL |
| POST | /api/papers/{id}/process | Trigger processing |
| GET | /api/papers/{id} | Get paper status/metadata |
| POST | /api/chat | Text chat with RAG |
| POST | /api/stt | Speech-to-text |
| POST | /api/tts | Text-to-speech |
| POST | /api/voice_chat | Full voice pipeline |

Running Tests

cd backend
python -m pytest tests/ -v

Troubleshooting

  • "OPENROUTER_API_KEY is not set" — Make sure .env is in the paper_talk/ root directory (next to requirements.txt)
  • PDF not rendering — Check browser console; ensure the PDF file exists in storage/pdfs/
  • Voice features not working — Ensure ELEVENLABS_API_KEY is set. Microphone requires HTTPS in production (localhost is fine for dev)
  • Slow first query — The local embedding model (all-MiniLM-L6-v2) downloads on first use (~80MB); subsequent runs use the cached copy.
  • OCR not triggering — Set REDUCTO_API_KEY in .env. OCR only triggers when pages have very little extracted text.
  • Port in use — Change port: uvicorn app.main:app --port 8001
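The OCR trigger mentioned above is a per-page heuristic: if text extraction yields almost nothing, the PDF is probably scanned. A rough sketch of such a decision (the threshold name mirrors MIN_CHARS_PER_PAGE_FOR_OCR in backend/app/config.py; the value and the majority rule here are illustrative assumptions, not the project's exact logic):

```python
MIN_CHARS_PER_PAGE_FOR_OCR = 50  # illustrative threshold; see backend/app/config.py

def needs_ocr(page_texts: list[str], threshold: int = MIN_CHARS_PER_PAGE_FOR_OCR) -> bool:
    """Heuristic: a PDF is likely scanned if most pages yield almost no extracted text."""
    sparse = sum(1 for text in page_texts if len(text.strip()) < threshold)
    return sparse > len(page_texts) / 2

print(needs_ocr(["", "  ", "x"]))                    # True: every page is nearly empty
print(needs_ocr(["A full page of text. " * 20] * 3)) # False: plenty of extracted text
```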

Configuration

Edit backend/app/config.py to change:

  • CHUNK_SIZE / CHUNK_OVERLAP — Text chunking parameters
  • TOP_K — Number of chunks retrieved for RAG
  • MIN_CHARS_PER_PAGE_FOR_OCR — OCR trigger threshold
  • LOCAL_EMBED_MODEL — Local embedding model name
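CHUNK_SIZE and CHUNK_OVERLAP behave like a standard sliding-window chunker: each chunk shares its tail with the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk. A simplified character-based sketch (the real chunker in services/chunking.py also carries page metadata):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters, each overlapping
    the previous window by `overlap` characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 500)
print(len(chunks), [len(c) for c in chunks])  # 3 [200, 200, 200]
```

A larger overlap improves recall at chunk boundaries at the cost of a bigger index; TOP_K then controls how many of these chunks are retrieved per question.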
