A system to fetch scientific paper PDFs and perform semantic search on their contents.
🎮 Papers, Please (whose branding we have absolutely not stolen)
(Todo)
First, configure your secrets (an example is provided at `.env.example`).
Then, just `docker-compose up` and you should be good to go.
The system is split into 4 services:
- Frontend: Built with React with the help of Claude
- Backend:
  - REST API built with FastAPI
  - Fetches new paper metadata using SemanticScholar's API
  - Queries registered papers, enabling users to search inside PDFs
- Worker:
  - Performs slow batch-processing tasks:
    - Downloads paper PDFs automatically
    - Extracts text from PDFs with RapidOCR
    - Chunks text with Docling's HybridChunker
    - Embeds chunks and indexes them into the Pinecone vector DB
- Postgres DB
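The backend's "search inside PDFs" flow boils down to: embed the query, then rank stored chunk vectors by similarity. A minimal sketch of that idea, with a toy character-frequency `embed()` standing in for the real embedding model and a plain dict standing in for Pinecone (in the real system the ranking is a single index query):

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding: character-frequency vector over a-z.
    # NOT the real model -- just enough to illustrate the ranking step.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: dict[str, str], top_k: int = 3) -> list[str]:
    """Return the ids of the top_k chunks most similar to the query."""
    qv = embed(query)
    scored = [(cosine(qv, embed(text)), cid) for cid, text in chunks.items()]
    return [cid for _, cid in sorted(scored, reverse=True)[:top_k]]
```

In production the chunk vectors live in Pinecone, so the scoring loop is replaced by one call to the vector index.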
The worker service is essentially a cron job: it polls the DB for pending PDFs, sequentially processes a batch (Download -> OCR -> Chunk -> Embed), and requeues any failed documents.
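The batch loop above can be sketched as follows; `Paper`, the stage functions, and `run_batch` are illustrative names, not the project's real API:

```python
from dataclasses import dataclass

@dataclass
class Paper:
    id: int
    status: str = "pending"  # what the worker polls the DB for

def process(paper: Paper, stages) -> Paper:
    """Run the pipeline stages (Download -> OCR -> Chunk -> Embed) in order."""
    try:
        for stage in stages:
            stage(paper)
        paper.status = "done"
    except Exception:
        paper.status = "pending"  # leave failed documents queued for retry
    return paper

def run_batch(papers, stages):
    """Sequentially process one polled batch of pending papers."""
    return [process(p, stages) for p in papers if p.status == "pending"]
```

A document that fails at any stage simply stays `pending`, so the next polling cycle picks it up again.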
At scale this is not ideal, because it couples online tasks (the backend service) to batch tasks: both keep hitting the same DB.
In the future I'd like to add something like Redis as a task queue for the worker service, replacing this polling mechanism.
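The shape of that push-based design, with the stdlib's `queue.Queue` standing in for Redis (where a blocking list pop such as `BLPOP` would play the same role): the backend enqueues document ids, and the worker blocks on the queue instead of polling the DB.

```python
import queue
import threading

tasks = queue.Queue()   # stand-in for a Redis list; not the project's real setup
processed = []

def worker():
    while True:
        doc_id = tasks.get()      # blocks until a task arrives -- no polling
        if doc_id is None:        # sentinel value used to shut down cleanly
            break
        processed.append(doc_id)  # real worker would run Download -> OCR -> Chunk -> Embed
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for doc in (1, 2, 3):
    tasks.put(doc)                # backend side: enqueue instead of INSERT + poll
tasks.put(None)
t.join()
```

This removes the shared-DB contention: the backend only writes to the queue, and the worker wakes up exactly when there is work to do.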
If you are on Nix, just `nix develop` to get the development shell from the flake. Optionally, `direnv allow` to automatically activate the shell when you `cd` into the project folder.
If you are a normal, healthy person, use the amazing `uv` package manager to create a virtual environment with everything you need (for the backend).

