Agentic Literature Survey Engine for Researchers
Domain-aware paper retrieval, importance-first ranking, cross-domain exploration, and survey report generation.
Most paper tools help you retrieve literature. PaperSurveyor is built to help you finish the survey workflow:
- understand what a query is really asking
- route it through domain-aware source strategies
- rank papers by importance, not only keyword similarity
- cluster the topic and surface research lineage
- generate an editable survey report
It is not a generic chat wrapper. It is a workflow-first research engine.
| Capability | Typical Paper Search | PaperSurveyor |
|---|---|---|
| Ranking objective | Relevance-first | Importance-first |
| Domain awareness | Weak | Built-in venue/source profiles |
| Cross-domain surveying | Manual | Native |
| Survey workflow | Fragmented | End-to-end |
| Explainability | Minimal | Feature-level ranking breakdown |
| Report generation | External | Built-in async pipeline |
- Real retrieval from OpenAlex and Crossref
- PostgreSQL-backed paper cache, venue registry, search history, ranking features, report tasks, and report outputs
- Built-in domain profiles for:
  - Computer Science
  - Medicine
  - Biology
  - Materials / Physics / Chemistry
  - Economics / Management / Social Science
- Explainable importance ranking
- Search UI wired to the live API
- Redis-backed async report generation queue
- Docker Compose and Render deployment config
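The async report queue separates the HTTP request from the slow generation step: the API records a task, a worker picks it up, and the result is stored separately. The sketch below mimics that flow with the standard library only. `queue.Queue` and a thread stand in for Redis and Dramatiq, and the task fields (`query`, `paper_ids`, `status`) and helper names are hypothetical, not the project's actual `report_tasks` / `report_outputs` schema.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins for the report_tasks / report_outputs tables.
report_tasks: dict[str, dict] = {}
report_outputs: dict[str, str] = {}
jobs: "queue.Queue[str | None]" = queue.Queue()  # stands in for the Redis-backed broker


def enqueue_report(query: str, paper_ids: list[str]) -> str:
    """API side: record a pending task and return its id immediately."""
    task_id = str(uuid.uuid4())
    report_tasks[task_id] = {"query": query, "paper_ids": paper_ids, "status": "pending"}
    jobs.put(task_id)
    return task_id


def worker() -> None:
    """Worker side: consume task ids until a None sentinel arrives."""
    while (task_id := jobs.get()) is not None:
        task = report_tasks[task_id]
        task["status"] = "running"
        # Placeholder for the real pipeline (clustering, insight extraction, writing).
        report_outputs[task_id] = f"Survey of {len(task['paper_ids'])} papers on: {task['query']}"
        task["status"] = "done"
        jobs.task_done()
    jobs.task_done()  # acknowledge the sentinel as well


t = threading.Thread(target=worker)
t.start()
tid = enqueue_report("multimodal clinical decision support", ["p1", "p2"])
jobs.put(None)  # FIFO order: the worker stops only after the task above
t.join()
print(report_tasks[tid]["status"], "|", report_outputs[tid])
```

The API call returns as soon as the task id exists; clients would then poll the task status rather than wait on the request.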
```mermaid
flowchart LR
    A["User Query"] --> B["Query Understanding"]
    B --> C["Domain Router"]
    C --> D["Source Strategy"]
    D --> E["OpenAlex + Crossref Retrieval"]
    E --> F["Importance Ranking"]
    F --> G["Theme Clustering"]
    G --> H["Insight Extraction"]
    H --> I["Survey Report"]
```
```text
apps/
  web/        Next.js frontend
  api/        FastAPI + SQLAlchemy + Dramatiq
packages/
  agents/     Agent prompts and pipeline definitions
  config/     Domain profiles, venue priorities, source strategies
  core/       Shared ranking logic
docs/         Product blueprint and deployment docs
```
Design principles:
- High-end, minimal frontend instead of an admin-dashboard look
- Search results optimized for “what should I read first”
- Importance score, reading level, recommendation reasons, and source provenance shown inline
The MVP ships with editable seed profiles in `packages/config/domains/default.json`.
Examples:
- Computer Science: CCF-style top conferences and journals such as NeurIPS, ICML, ICLR, CVPR, KDD
- Medicine: NEJM, The Lancet, JAMA, BMJ, Nature Medicine, The Lancet Digital Health
- Biology: Cell, Nature, Science, Nature Biotechnology, Nature Genetics, Genome Biology
These profiles are configuration, not hardcoded logic, so the community can keep extending them by domain.
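Because profiles are data, adding or tuning a domain means editing JSON rather than code. The sketch below assumes a hypothetical profile shape with tiered venue lists and maps a paper's venue to a `venue_score` in [0, 1]; the actual keys in `default.json`, the tier names, and the tier-to-score mapping are all assumptions here.

```python
import json

# Hypothetical profile shape; the real default.json schema may differ.
PROFILE_JSON = """
{
  "computer_science": {
    "tiers": {"top": ["NeurIPS", "ICML", "ICLR", "CVPR", "KDD"]}
  },
  "medicine": {
    "tiers": {"top": ["NEJM", "The Lancet", "JAMA", "BMJ", "Nature Medicine"]}
  }
}
"""
TIER_SCORES = {"top": 1.0, "strong": 0.7}  # assumed tier-to-score mapping
profiles = json.loads(PROFILE_JSON)


def venue_score(venue: str, domain: str) -> float:
    """Look up a venue's tier in one domain profile; unknown venues score 0."""
    needle = venue.strip().lower()
    for tier, venues in profiles.get(domain, {}).get("tiers", {}).items():
        if needle in (v.lower() for v in venues):
            return TIER_SCORES.get(tier, 0.0)
    return 0.0


print(venue_score("NeurIPS", "computer_science"))  # 1.0
print(venue_score("NeurIPS", "medicine"))  # 0.0
```

Scoring is per domain: the same venue can carry full weight in one profile and none in another, which is what keeps rankings domain-aware.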
The MVP ranking is explicit and inspectable:
```python
importance_score = 100 * (
    0.30 * relevance_score
    + 0.22 * venue_score
    + 0.16 * citation_score
    + 0.10 * recency_score
    + 0.10 * survey_foundation_score
    + 0.07 * cross_domain_score
    + 0.05 * domain_profile_boost
)
```
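The weights in the formula above sum to 1.00, so if every feature is normalized to [0, 1] the score lands in [0, 100]. The sketch below assumes that normalization and returns a per-feature breakdown alongside the total, which is one way to produce the feature-level explanations shown in the UI. `importance_breakdown` is a hypothetical name, not the function in `packages/core`.

```python
# Weights copied from the formula above; each feature is assumed to lie in [0, 1].
WEIGHTS = {
    "relevance_score": 0.30,
    "venue_score": 0.22,
    "citation_score": 0.16,
    "recency_score": 0.10,
    "survey_foundation_score": 0.10,
    "cross_domain_score": 0.07,
    "domain_profile_boost": 0.05,
}


def importance_breakdown(features: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Return the 0-100 score plus per-feature contributions, largest first."""
    contributions = {}
    for name, weight in WEIGHTS.items():
        value = features.get(name, 0.0)  # missing features contribute nothing
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0, 1]")
        contributions[name] = 100 * weight * value
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return round(sum(contributions.values()), 2), ranked


score, reasons = importance_breakdown(
    {"relevance_score": 0.9, "venue_score": 1.0, "citation_score": 0.5}
)
print(score)  # 57.0  (27 + 22 + 8)
print(reasons[0][0])  # relevance_score
```

Keeping the contributions instead of only the total means the top entries can be rendered directly as recommendation reasons.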
Implementation:
Key tables already wired into the MVP:
`papers`, `authors`, `paper_authors`, `venues`, `domains`, `domain_source_profiles`, `search_history`, `ranking_features`, `report_tasks`, `report_outputs`
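The `paper_authors` table presumably links papers and authors many-to-many. The sketch below illustrates that association-table pattern using the standard-library `sqlite3` module in place of PostgreSQL and SQLAlchemy. All column names, including the `position` column for author order, are hypothetical.

```python
import sqlite3

# Minimal, hypothetical columns; the real PostgreSQL schema has more fields.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE papers (id TEXT PRIMARY KEY, title TEXT NOT NULL, year INTEGER);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Association table: one row per (paper, author) pair, keeping author order.
    CREATE TABLE paper_authors (
        paper_id TEXT REFERENCES papers(id),
        author_id INTEGER REFERENCES authors(id),
        position INTEGER NOT NULL,
        PRIMARY KEY (paper_id, author_id)
    );
    """
)
conn.execute("INSERT INTO papers VALUES ('p1', 'An example paper', 2024)")
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "A. Author"), (2, "B. Author")])
conn.executemany("INSERT INTO paper_authors VALUES ('p1', ?, ?)", [(1, 1), (2, 2)])

# Fetch a paper's authors in byline order through the association table.
rows = conn.execute(
    """
    SELECT a.name FROM paper_authors pa
    JOIN authors a ON a.id = pa.author_id
    WHERE pa.paper_id = ? ORDER BY pa.position
    """,
    ("p1",),
).fetchall()
print([name for (name,) in rows])  # ['A. Author', 'B. Author']
```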
Database seed bootstrap:

```bash
cd apps/api
python -m venv .venv
source .venv/bin/activate
pip install -e .
python -m app.api_cli
```

Run the full stack with Docker Compose:

```bash
docker compose up --build
```

Open:
- Web: http://localhost:3000
- API: http://localhost:8000
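Once the API is up at http://localhost:8000, it can also be called from a script. The sketch below uses only the Python standard library to build a `/search` request (repeating `domains` once per value) and a JSON `POST` to `/report/generate`. The helper names are hypothetical, and because the response shape is not documented here, sending and decoding are left as a comment.

```python
import json
import urllib.parse
import urllib.request

API = "http://localhost:8000"  # default local API address


def build_search_url(q: str, domains: list[str], year_from: int, year_to: int) -> str:
    """Build a /search URL; doseq=True repeats `domains` once per value."""
    params = {"q": q, "domains": domains, "year_from": year_from, "year_to": year_to}
    query = urllib.parse.urlencode(params, doseq=True, quote_via=urllib.parse.quote)
    return f"{API}/search?{query}"


def build_report_request(query: str, paper_ids: list[str]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /report/generate."""
    body = json.dumps({"query": query, "paper_ids": paper_ids}).encode("utf-8")
    return urllib.request.Request(
        f"{API}/report/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


url = build_search_url(
    "multimodal clinical decision support", ["computer_science", "medicine"], 2021, 2026
)
req = build_report_request("multimodal clinical decision support", ["<paper-uuid>"])
print(url)
# With the API running, send either request and decode the JSON response:
# with urllib.request.urlopen(url) as resp:
#     results = json.load(resp)
```

`quote_via=urllib.parse.quote` encodes spaces as `%20` rather than `+`; both are standard query-string encodings.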
Local development, frontend:

```bash
pnpm install
pnpm dev:web
```

Backend API:

```bash
cd apps/api
python -m venv .venv
source .venv/bin/activate
pip install -e .
python -m app.api_cli
uvicorn app.main:app --reload --port 8000
```

Background worker (Dramatiq):

```bash
cd apps/api
source .venv/bin/activate
dramatiq app.worker
```

Example search request:

```bash
curl "http://localhost:8000/search?q=multimodal%20clinical%20decision%20support&domains=computer_science&domains=medicine&year_from=2021&year_to=2026"
```

Example report generation request:

```bash
curl -X POST http://localhost:8000/report/generate \
  -H "Content-Type: application/json" \
  -d '{"query":"multimodal clinical decision support","paper_ids":["<paper-uuid>"]}'
```

This repo includes:
- docker-compose.yml for local full-stack demo
- render.yaml for hosted demo deployment
- docs/DEPLOYMENT.md for setup details
Recommended hosted topology:
- `papersurveyor-web` on Render
- `papersurveyor-api` on Render
- `papersurveyor-worker` on Render
- managed PostgreSQL
- managed Redis
Initial high-authority venue seeds were curated from official or primary sources, including:
- CCF Academic Evaluation
- NEJM
- The Lancet
- JAMA Network
- BMJ
- Nature Medicine
- Cell
- Nature Biotechnology
- Genome Biology
- Physical Review Letters
- ACS Publications
- Nature Materials
- Journal of Finance
- Management Science
Roadmap:
- Replace heuristic query understanding with structured LLM planning
- Add OpenAlex citation graph exploration
- Add PubMed and Semantic Scholar adapters
- Add workspace persistence and collaborative editing
- Add PDF parsing and evidence extraction
- Add benchmark datasets for ranking evaluation
If you want to extend domain profiles, provider adapters, ranking features, or UI workflows, open an issue or PR. Good first contribution areas:
- new domain profiles
- venue authority tuning
- provider adapters
- report templates
- frontend interaction polish
If you care about open-source tooling for serious literature survey work rather than another generic research chatbot, this project is worth watching.