PaperTalk

Chat with your research papers using AI. Upload a PDF, view it side-by-side, and ask questions with grounded, cited answers. Supports text chat, voice chat, and read-aloud.

Features

  • PDF Upload & Viewer — Drag-and-drop PDF upload with full in-browser rendering via PDF.js
  • Grounded Q&A — Ask questions and get answers with page-number citations. Click citations to jump to the page.
  • Summarize — Summarize the current page, a section, or extract contributions & limitations
  • Read Aloud — TTS via ElevenLabs reads page content aloud
  • Voice Chat — Hold-to-talk voice input (STT → RAG → TTS) with audio response
  • OCR Fallback — Scanned PDFs are automatically sent to Reducto for OCR
  • URL Import — Paste an article URL to fetch and process it
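Grounded answers with page citations are typically produced by prompting the LLM to emit inline page markers and parsing them out afterwards. A minimal sketch of that parsing step, assuming a hypothetical `[p. N]` marker format (the actual format used by PaperTalk's `rag.py` may differ):

```python
import re

# Hypothetical citation format: the model is asked to append "[p. N]"
# after each claim; we extract the page numbers to render clickable links.
CITATION_RE = re.compile(r"\[p\.\s*(\d+)\]")

def extract_citations(answer: str) -> list[int]:
    """Return the distinct page numbers cited in an answer, in order of first mention."""
    seen, pages = set(), []
    for match in CITATION_RE.finditer(answer):
        page = int(match.group(1))
        if page not in seen:
            seen.add(page)
            pages.append(page)
    return pages

answer = "The method is introduced [p. 2] and evaluated [p. 5]; see also [p. 2]."
print(extract_citations(answer))  # [2, 5]
```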

Tech Stack

  • Backend: Python, FastAPI, Uvicorn, SQLAlchemy (SQLite), FAISS
  • Frontend: Vanilla HTML/JS/CSS (no Node), PDF.js via CDN
  • LLM: OpenRouter (default: Qwen 2.5 72B; any OpenRouter-hosted model works)
  • Embeddings: sentence-transformers (local, all-MiniLM-L6-v2) with OpenRouter fallback
  • Voice: ElevenLabs (STT + TTS)
  • OCR: Reducto (automatic fallback for scanned PDFs)
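The FAISS index holds one embedding per text chunk, so retrieval reduces to a nearest-neighbor search over those vectors. A toy illustration of the idea in plain Python with hand-made 3-dimensional vectors (not the project's actual code, which uses real embedding models and FAISS):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("Intro: motivation",    [1.0, 0.0, 0.0]),
    ("Method: attention",    [0.0, 1.0, 0.1]),
    ("Results: benchmarks",  [0.1, 0.9, 0.4]),
]
print(top_k([0.0, 1.0, 0.0], chunks, k=2))  # ['Method: attention', 'Results: benchmarks']
```

The retrieved chunks (with their page metadata) are then stuffed into the LLM prompt, which is the core of the RAG pipeline.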

Quick Start

1. Clone and set up environment

cd paper_talk
python -m venv venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows

pip install -r requirements.txt

2. Configure API keys

cp .env.example .env

Edit .env and fill in your API keys:

| Variable | Required | Description |
|---|---|---|
| OPENROUTER_API_KEY | Yes | OpenRouter API key for LLM |
| OPENROUTER_MODEL | No | Model to use (default: qwen/qwen-2.5-72b-instruct) |
| OPENROUTER_EMBED_MODEL | No | OpenRouter embedding model (leave empty for local) |
| ELEVENLABS_API_KEY | Yes* | ElevenLabs key for TTS/STT (*only for voice features) |
| ELEVENLABS_VOICE_ID | No | Voice ID (default: Rachel) |
| REDUCTO_API_KEY | No | Reducto key for OCR (only needed for scanned PDFs) |
| BASE_URL | No | Base URL (default: http://localhost:8000) |
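For example, a minimal `.env` for text-only chat (voice and OCR keys omitted; the key value below is a placeholder):

```ini
OPENROUTER_API_KEY=sk-or-your-key-here
OPENROUTER_MODEL=qwen/qwen-2.5-72b-instruct
# Leave empty to use the local sentence-transformers model
OPENROUTER_EMBED_MODEL=
```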

3. Run the server

cd backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Then open http://localhost:8000 in your browser.

4. Usage

  1. Upload a PDF via drag-and-drop or paste an article URL
  2. Wait for processing (text extraction, chunking, embedding)
  3. Ask questions in the chat — answers include clickable page citations
  4. Use action buttons to summarize pages, extract contributions, etc.
  5. Click Read Aloud to hear the current page via TTS
  6. Hold to Talk for voice chat (requires microphone access)

Project Structure

paper_talk/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI entry point
│   │   ├── config.py            # Pydantic settings
│   │   ├── db.py                # SQLAlchemy async setup
│   │   ├── models.py            # DB models
│   │   ├── schemas.py           # Pydantic request/response schemas
│   │   ├── services/
│   │   │   ├── pdf_extract.py   # PDF text extraction (pypdf + pdfplumber)
│   │   │   ├── reducto_ocr.py   # Reducto OCR client
│   │   │   ├── chunking.py      # Text chunking with page metadata
│   │   │   ├── embeddings.py    # Embedding (local + OpenRouter)
│   │   │   ├── faiss_index.py   # FAISS vector index
│   │   │   ├── rag.py           # RAG pipeline + citation parsing
│   │   │   ├── openrouter_client.py  # OpenRouter LLM client
│   │   │   ├── elevenlabs_client.py  # ElevenLabs STT/TTS
│   │   │   └── audio_utils.py   # Audio file helpers
│   │   ├── routes/
│   │   │   ├── upload.py        # POST /api/upload, /api/upload_url
│   │   │   ├── papers.py        # GET/POST /api/papers/{id}
│   │   │   ├── chat.py          # POST /api/chat
│   │   │   └── speech.py        # POST /api/stt, /api/tts, /api/voice_chat
│   │   ├── templates/
│   │   │   ├── index.html       # Upload page
│   │   │   └── paper.html       # Viewer + composer page
│   │   └── static/
│   │       ├── app.js           # Frontend logic
│   │       └── styles.css       # Styles
│   └── tests/
│       ├── test_chunking.py
│       └── test_rag_citations.py
├── storage/                     # Created at runtime (PDFs, audio, indices, DB)
├── .env.example
├── requirements.txt
└── README.md

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/upload | Upload PDF (multipart) |
| POST | /api/upload_url | Fetch from URL |
| POST | /api/papers/{id}/process | Trigger processing |
| GET | /api/papers/{id} | Get paper status/metadata |
| POST | /api/chat | Text chat with RAG |
| POST | /api/stt | Speech-to-text |
| POST | /api/tts | Text-to-speech |
| POST | /api/voice_chat | Full voice pipeline |

Running Tests

cd backend
python -m pytest tests/ -v

Troubleshooting

  • "OPENROUTER_API_KEY is not set" — Make sure .env is in the paper_talk/ root directory (next to requirements.txt)
  • PDF not rendering — Check browser console; ensure the PDF file exists in storage/pdfs/
  • Voice features not working — Ensure ELEVENLABS_API_KEY is set. Microphone requires HTTPS in production (localhost is fine for dev)
  • Slow first query — The local embedding model (all-MiniLM-L6-v2) downloads on first use (~80MB); subsequent runs use the cached copy.
  • OCR not triggering — Set REDUCTO_API_KEY in .env. OCR only triggers when pages have very little extracted text.
  • Port in use — Change port: uvicorn app.main:app --port 8001
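The OCR trigger mentioned above is a per-page heuristic: if text extraction yields almost nothing, the PDF is probably scanned. A rough sketch of such a decision (the threshold name mirrors MIN_CHARS_PER_PAGE_FOR_OCR in backend/app/config.py; the value and the majority rule here are illustrative assumptions, not the project's exact logic):

```python
MIN_CHARS_PER_PAGE_FOR_OCR = 50  # illustrative threshold; see backend/app/config.py

def needs_ocr(page_texts: list[str], threshold: int = MIN_CHARS_PER_PAGE_FOR_OCR) -> bool:
    """Heuristic: a PDF is likely scanned if most pages yield almost no extracted text."""
    sparse = sum(1 for text in page_texts if len(text.strip()) < threshold)
    return sparse > len(page_texts) / 2

print(needs_ocr(["", "  ", "x"]))                    # True: every page is nearly empty
print(needs_ocr(["A full page of text. " * 20] * 3)) # False: plenty of extracted text
```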

Configuration

Edit backend/app/config.py to change:

  • CHUNK_SIZE / CHUNK_OVERLAP — Text chunking parameters
  • TOP_K — Number of chunks retrieved for RAG
  • MIN_CHARS_PER_PAGE_FOR_OCR — OCR trigger threshold
  • LOCAL_EMBED_MODEL — Local embedding model name
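CHUNK_SIZE and CHUNK_OVERLAP behave like a standard sliding-window chunker: each chunk shares its tail with the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk. A simplified character-based sketch (the real chunker in services/chunking.py also carries page metadata):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters, each overlapping
    the previous window by `overlap` characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 500)
print(len(chunks), [len(c) for c in chunks])  # 3 [200, 200, 200]
```

A larger overlap improves recall at chunk boundaries at the cost of a bigger index; TOP_K then controls how many of these chunks are retrieved per question.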
