TreeEast1/TopPaper

PaperSurveyor

Agentic Literature Survey Engine for Researchers
Domain-aware paper retrieval, importance-first ranking, cross-domain exploration, and survey report generation.


Why This Project Exists

Most paper tools help you retrieve literature. PaperSurveyor is built to help you finish the survey workflow:

  • understand what a query is really asking
  • route it through domain-aware source strategies
  • rank papers by importance, not only keyword similarity
  • cluster the topic and surface research lineage
  • generate an editable survey report

It is not a generic chat wrapper. It is a workflow-first research engine.

What Makes It Different

| Capability | Typical Paper Search | PaperSurveyor |
|---|---|---|
| Ranking objective | Relevance-first | Importance-first |
| Domain awareness | Weak | Built-in venue/source profiles |
| Cross-domain surveying | Manual | Native |
| Survey workflow | Fragmented | End-to-end |
| Explainability | Minimal | Feature-level ranking breakdown |
| Report generation | External | Built-in async pipeline |

Current MVP

  • Real retrieval from OpenAlex and Crossref
  • PostgreSQL-backed paper cache, venue registry, search history, ranking features, report tasks, and report outputs
  • Built-in domain profiles for:
    • Computer Science
    • Medicine
    • Biology
    • Materials / Physics / Chemistry
    • Economics / Management / Social Science
  • Explainable importance ranking
  • Search UI wired to the live API
  • Redis-backed async report generation queue
  • Docker Compose and Render deployment config

Product Flow

flowchart LR
  A["User Query"] --> B["Query Understanding"]
  B --> C["Domain Router"]
  C --> D["Source Strategy"]
  D --> E["OpenAlex + Crossref Retrieval"]
  E --> F["Importance Ranking"]
  F --> G["Theme Clustering"]
  G --> H["Insight Extraction"]
  H --> I["Survey Report"]

Architecture

apps/
  web/          Next.js frontend
  api/          FastAPI + SQLAlchemy + Dramatiq
packages/
  agents/       Agent prompts and pipeline definitions
  config/       Domain profiles, venue priorities, source strategies
  core/         Shared ranking logic
docs/           Product blueprint and deployment docs

UI Preview

  • A minimal, polished frontend rather than an admin-dashboard look
  • Search results optimized for “what should I read first”
  • Importance score, reading level, recommendation reasons, and source provenance shown inline

Built-In Domain Strategy

The MVP ships with editable seed profiles in packages/config/domains/default.json.

Examples:

  • Computer Science: CCF-style top conferences and journals such as NeurIPS, ICML, ICLR, CVPR, KDD
  • Medicine: NEJM, The Lancet, JAMA, BMJ, Nature Medicine, The Lancet Digital Health
  • Biology: Cell, Nature, Science, Nature Biotechnology, Nature Genetics, Genome Biology

These profiles are configuration, not hardcoded logic, so the community can keep extending them by domain.
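
To illustrate what "configuration, not hardcoded logic" can look like in practice, here is a minimal sketch. The JSON shape, the `venues` key, and the helper `is_priority_venue` are hypothetical and are not taken from packages/config/domains/default.json, whose real schema may differ. Only the venue names come from the examples above.

```python
import json

# Hypothetical profile shape for illustration only; the real schema in
# packages/config/domains/default.json may be different.
EXAMPLE_PROFILES_JSON = """
{
  "computer_science": {"venues": ["NeurIPS", "ICML", "ICLR", "CVPR", "KDD"]},
  "medicine": {"venues": ["NEJM", "The Lancet", "JAMA", "BMJ", "Nature Medicine"]}
}
"""


def is_priority_venue(profiles: dict, domain: str, venue: str) -> bool:
    """Return True if `venue` is listed in the profile for `domain` (case-insensitive)."""
    venues = profiles.get(domain, {}).get("venues", [])
    return venue.casefold() in {v.casefold() for v in venues}


profiles = json.loads(EXAMPLE_PROFILES_JSON)
```

Because the venue list lives in data, adding a domain or tuning a venue list in a setup like this would mean editing the JSON rather than the ranking code.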

Importance Ranking

The MVP ranking is explicit and inspectable:

importance_score =
100 * (
  0.30 * relevance_score +
  0.22 * venue_score +
  0.16 * citation_score +
  0.10 * recency_score +
  0.10 * survey_foundation_score +
  0.07 * cross_domain_score +
  0.05 * domain_profile_boost
)
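
The formula above can be sketched in Python as follows. This is an illustration, not the project's actual implementation. The weights are copied from the formula, while the function names `importance_score` and `score_breakdown`, the assumption that every feature is normalized to the range 0 to 1, and the choice to treat missing features as 0 are all assumptions made for the example.

```python
# Weights copied from the formula above; they sum to 1.0, so the score
# falls in 0-100 when every feature is in [0, 1] (an assumption here).
WEIGHTS = {
    "relevance_score": 0.30,
    "venue_score": 0.22,
    "citation_score": 0.16,
    "recency_score": 0.10,
    "survey_foundation_score": 0.10,
    "cross_domain_score": 0.07,
    "domain_profile_boost": 0.05,
}


def score_breakdown(features: dict[str, float]) -> dict[str, float]:
    """Per-feature contribution to the final score, in points out of 100."""
    # Missing features contribute 0 points (an assumption of this sketch).
    return {name: 100 * w * features.get(name, 0.0) for name, w in WEIGHTS.items()}


def importance_score(features: dict[str, float]) -> float:
    """Weighted sum of the ranking features, scaled to 0-100."""
    return sum(score_breakdown(features).values())
```

Returning the per-feature contributions alongside the total is one way to support a feature-level ranking breakdown: each paper's score can be shown together with how many points each feature added.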

Implementation:

Database Model

Key tables already wired into the MVP:

  • papers
  • authors
  • paper_authors
  • venues
  • domains
  • domain_source_profiles
  • search_history
  • ranking_features
  • report_tasks
  • report_outputs

Database seed bootstrap:

cd apps/api
python -m venv .venv
source .venv/bin/activate
pip install -e .
python -m app.api_cli

Quickstart

Option 1: Docker Demo

docker compose up --build

Open:

Option 2: Manual Development

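# Terminal 1: web frontend
pnpm install
pnpm dev:web

# Terminal 2: API (seed the database, then start the server)
cd apps/api
python -m venv .venv
source .venv/bin/activate
pip install -e .
python -m app.api_cli
uvicorn app.main:app --reload --port 8000

# Terminal 3: report worker
cd apps/api
source .venv/bin/activate
dramatiq app.worker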

Example API Calls

curl "http://localhost:8000/search?q=multimodal%20clinical%20decision%20support&domains=computer_science&domains=medicine&year_from=2021&year_to=2026"
curl -X POST http://localhost:8000/report/generate \
  -H "Content-Type: application/json" \
  -d '{"query":"multimodal clinical decision support","paper_ids":["<paper-uuid>"]}'
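
For scripted use, the search URL from the first curl call can be built with the Python standard library. This is a minimal sketch: the `/search` path and its parameter names come from the example above, while the helper name `build_search_url` and the `BASE_URL` constant are made up for illustration.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # local dev port from the uvicorn command


def build_search_url(query, domains, year_from=None, year_to=None):
    """Build a /search URL, repeating the `domains` parameter once per domain."""
    params = [("q", query)]
    params += [("domains", domain) for domain in domains]
    if year_from is not None:
        params.append(("year_from", year_from))
    if year_to is not None:
        params.append(("year_to", year_to))
    # quote_via=quote encodes spaces as %20, matching the curl example.
    return f"{BASE_URL}/search?{urlencode(params, quote_via=quote)}"


url = build_search_url(
    "multimodal clinical decision support",
    ["computer_science", "medicine"],
    year_from=2021,
    year_to=2026,
)
```

The resulting `url` can be passed to any HTTP client. A multi-domain query is expressed by repeating `domains`, not by a comma-separated value.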

Online Demo Deployment

This repo includes:

Recommended hosted topology:

  • papersurveyor-web on Render
  • papersurveyor-api on Render
  • papersurveyor-worker on Render
  • managed PostgreSQL
  • managed Redis

Source References For Initial Seed Profiles

Initial high-authority venue seeds were curated from official or primary sources, including:

Roadmap

  • Replace heuristic query understanding with structured LLM planning
  • Add OpenAlex citation graph exploration
  • Add PubMed and Semantic Scholar adapters
  • Add workspace persistence and collaborative editing
  • Add PDF parsing and evidence extraction
  • Add benchmark datasets for ranking evaluation

Contributing

If you want to extend domain profiles, provider adapters, ranking features, or UI workflows, open an issue or PR. Good first contribution areas:

  • new domain profiles
  • venue authority tuning
  • provider adapters
  • report templates
  • frontend interaction polish

Star This Repo

If you care about open-source tooling for serious literature survey work rather than another generic research chatbot, this project is worth watching.
