PaperRadar

PaperRadar is a keyword-driven academic paper radar: it automatically fetches the latest papers from arXiv and (optionally) top journals every day, filters and analyzes them with a dual-LLM pipeline, generates daily reports (Markdown + JSON), and serves a built-in Web UI for browsing and searching.

Highlights

- Multi-source fetching: arXiv and bioRxiv/medRxiv RSS + journal RSS (Nature, NEJM, Cell, Science, etc.; configurable in config.yaml)
- Dual-LLM architecture:
  - Light LLM: quickly matches papers to keywords based on title/abstract (outputs matched_keywords)
  - Heavy multimodal LLM: reads the full PDF and outputs TLDR / contributions / methods / experiments / novelty / limitations / data / code + a quality score
- Domain summaries: one Markdown summary per domain with numbered paper references (clickable in the Web UI)
- Daily reports: saved to reports/ (Markdown) and reports/json/ (JSON)
- Web UI: filter by date/domain, search, sort, clickable reference numbers, paginated loading
- Docker deployment: in-container cron scheduling + FastAPI web server (default port 8000)
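
To illustrate the clickable references, here is a minimal sketch of how "Paper N" mentions in a domain summary could be turned into links. The `#paper-N` anchor format and the `link_paper_refs` helper are assumptions made for this example, not the project's actual implementation.

```python
import re

# Assumption: summaries cite papers as "Paper N", and the Web UI exposes
# each paper under a hypothetical "#paper-N" anchor.
PAPER_REF = re.compile(r"\bPaper (\d+)\b")


def link_paper_refs(summary_md: str) -> str:
    """Rewrite each "Paper N" mention as a Markdown link to #paper-N."""
    return PAPER_REF.sub(r"[Paper \1](#paper-\1)", summary_md)


if __name__ == "__main__":
    print(link_paper_refs("Paper 2 extends the baseline from Paper 11."))
```

The sketch assumes it runs once on raw summary text; running it on already-linked text would wrap the link labels a second time.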

Architecture

                          ┌──────────────────┐
                          │   Cron / Manual   │
                          └────────┬─────────┘
                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 0: Fetch                                                  │
│  arXiv RSS · bioRxiv/medRxiv RSS · Journal RSS (Nature, etc.)    │
│  → paper metadata (title, abstract, pdf_url)                     │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 1: Light LLM — keyword matching                           │
│  Input: title + abstract + keyword list                          │
│  Output: matched_keywords, relevance, reason                     │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼  (matched papers only)
┌──────────────────────────────────────────────────────────────────┐
│  Stage 2: Heavy Multimodal LLM — deep PDF analysis               │
│  Input: full PDF (base64) + matched keywords                     │
│  Output: TLDR, contributions, methodology, experiments,          │
│          innovations, limitations, quality_score, …              │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 3: SummaryAgent — per-domain summaries                    │
│  Generates Markdown summaries with "Paper N" references          │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 4: Reporter — output & serve                              │
│  Markdown report → reports/                                      │
│  JSON report    → reports/json/                                  │
│  Web UI (FastAPI) reads JSON for display                         │
└──────────────────────────────────────────────────────────────────┘
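
The filtering step between Stage 1 and Stage 2 can be sketched as follows. This is a simplified illustration under stated assumptions: `Paper`, `run_pipeline`, and the `match` / `analyze` callables are hypothetical stand-ins for the light and heavy LLM calls, not the names used in pipeline.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Paper:
    # Stage 0 metadata, as listed in the diagram above.
    title: str
    abstract: str
    pdf_url: str
    matched_keywords: list[str] = field(default_factory=list)
    analysis: dict = field(default_factory=dict)


def run_pipeline(
    papers: Iterable[Paper],
    match: Callable[[Paper], list[str]],  # Stage 1 stand-in (light LLM)
    analyze: Callable[[Paper], dict],     # Stage 2 stand-in (heavy LLM)
) -> list[Paper]:
    """Keep only papers that match a keyword, then analyze those in depth."""
    kept = []
    for paper in papers:
        paper.matched_keywords = match(paper)
        if not paper.matched_keywords:
            continue  # unmatched papers never reach the expensive stage
        paper.analysis = analyze(paper)
        kept.append(paper)
    return kept
```

The point of the two-stage design is cost control: the cheap title/abstract check runs on every fetched paper, while the full-PDF analysis runs only on the papers that survive it.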

Quick Start (Docker)

1) Set up environment variables

cp .env.example .env
nano .env

2) (Optional) Customize `config.yaml`

- Don't need journals? Set journals.enabled: false (and optionally ezproxy.enabled: false).
- Adjust the schedule with runtime.schedule (a cron expression; the container timezone is set by TZ).

3) Launch

docker compose up -d --build

4) Open the Web UI

http://localhost:8000
Health check: http://localhost:8000/api/health
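
If you want to script the health check (for example in a deployment smoke test), a minimal sketch using only the standard library might look like this. It checks the HTTP status only, because the response body format of /api/health is not documented here; the `is_healthy` helper and its `fetch` parameter are assumptions made for this example.

```python
import urllib.request
from typing import Callable


def is_healthy(
    url: str = "http://localhost:8000/api/health",
    fetch: Callable = urllib.request.urlopen,  # injectable for testing
    timeout: float = 5.0,
) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with fetch(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "not reachable")
```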

5) Run immediately (optional)

docker compose exec paper-radar python main.py --dry-run

Reports will be written to reports/ and reports/json/. The Web UI will automatically pick up any available dates.

Key Configuration

Environment variables (`.env`)

| Variable | Description |
| --- | --- |
| `LIGHT_LLM_API_BASE` / `LIGHT_LLM_API_KEY` / `LIGHT_LLM_MODEL` | Light LLM (OpenAI-compatible endpoint) |
| `HEAVY_LLM_API_BASE` / `HEAVY_LLM_API_KEY` / `HEAVY_LLM_MODEL` | Multimodal LLM (for PDF analysis) |
| `HKU_LIBRARY_UID` / `HKU_LIBRARY_PIN` | (Optional) EZproxy credentials for accessing paywalled journal PDFs |
| `TZ` | Container timezone (default `Asia/Shanghai`) |
| `WEB_PORT` | Web server port (default `8000`) |
| `RUN_ON_START` | Run the pipeline once on container start (default `false`) |
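
A quick way to catch a misconfigured .env before starting the container is to validate the variables up front. The sketch below assumes the six LLM variables are mandatory and applies the defaults listed above; `load_settings` is a hypothetical helper, not part of config_loader.py.

```python
import os

# Assumption: both LLM endpoints must be configured for the pipeline to run.
REQUIRED = [
    "LIGHT_LLM_API_BASE", "LIGHT_LLM_API_KEY", "LIGHT_LLM_MODEL",
    "HEAVY_LLM_API_BASE", "HEAVY_LLM_API_KEY", "HEAVY_LLM_MODEL",
]
# Defaults as documented in the table above.
DEFAULTS = {"TZ": "Asia/Shanghai", "WEB_PORT": "8000", "RUN_ON_START": "false"}


def load_settings(env=None) -> dict:
    """Collect settings from the environment, failing fast on missing keys."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name, default)
    return settings
```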

`config.yaml`

- keywords: list of domains (name / description / examples)
- preprints: preprint source config (arXiv + bioRxiv/medRxiv)
- journals: journal source toggle and list
- llm: model and rate-limit settings for light / heavy / summary LLMs
- runtime: cron schedule, concurrency, timeouts, etc.
- output: Markdown/JSON output paths
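
This README refers to options by dotted path (for example journals.enabled or llm.heavy.rate_limit.requests_per_minute). The sketch below shows how such a path maps onto the nested structure of a parsed config. Only the key paths come from this README; every value is a made-up placeholder, and `get_option` is a hypothetical helper.

```python
from typing import Any

# Key paths are taken from this README; all values are illustrative only.
EXAMPLE_CONFIG = {
    "journals": {"enabled": False, "max_papers_per_journal": 10},
    "ezproxy": {"enabled": False},
    "preprints": {"max_papers_per_source": 50},
    "llm": {"heavy": {"rate_limit": {"requests_per_minute": 6}}},
    "runtime": {"schedule": "0 8 * * *"},
}


def get_option(config: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path such as "journals.enabled"."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


if __name__ == "__main__":
    print(get_option(EXAMPLE_CONFIG, "llm.heavy.rate_limit.requests_per_minute"))
```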

Deployment (VPS / NAS)

See DEPLOY_EN.md for step-by-step instructions (Chinese version: DEPLOY.md).

For production, place port 8000 behind a reverse proxy (Nginx / Caddy) with HTTPS and access control.

FAQ / Troubleshooting

Q: No papers found after running the pipeline?
A: Check that the keywords in config.yaml are broad enough and that the configured arXiv categories cover your field. Run with --debug for verbose logs:

docker compose exec paper-radar python main.py --debug --dry-run

Q: EZproxy authentication fails?
A: Delete the cached cookies and retry:

rm -f cache/ezproxy_cookies.pkl

Make sure HKU_LIBRARY_UID and HKU_LIBRARY_PIN are correct in .env.

Q: Web UI shows no dates?
A: The UI reads from reports/json/. Run the pipeline at least once to generate reports, and check that the reports/ volume is mounted correctly in docker-compose.yml.
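
To check from a script what the UI could find, you can list the JSON reports yourself. The sketch below assumes one file per day named like YYYY-MM-DD.json; that naming scheme and the `available_dates` helper are assumptions for illustration, so adjust the pattern to match the files you actually see in reports/json/.

```python
import re
from pathlib import Path

# Assumption: daily reports are stored as reports/json/YYYY-MM-DD.json.
DATE_STEM = re.compile(r"\d{4}-\d{2}-\d{2}")


def available_dates(json_dir) -> list:
    """Return report dates found in json_dir, newest first."""
    folder = Path(json_dir)
    if not folder.is_dir():
        return []  # missing directory: likely an unmounted volume or no run yet
    dates = [p.stem for p in folder.glob("*.json") if DATE_STEM.fullmatch(p.stem)]
    return sorted(dates, reverse=True)


if __name__ == "__main__":
    print(available_dates("reports/json"))
```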

Q: Heavy LLM analysis is slow or hitting rate limits?
A: Adjust llm.heavy.rate_limit.requests_per_minute in config.yaml. You can also reduce the number of papers analyzed by tuning preprints.max_papers_per_source or journals.max_papers_per_journal.
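
For intuition about what a requests_per_minute setting does, here is a minimal sliding-window limiter. It is a generic sketch of the technique, not the project's actual limiter; the `MinuteRateLimiter` class and its injectable `clock` / `sleep` parameters are assumptions made so the example can be tested without waiting.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Allow at most `requests_per_minute` calls in any 60-second window."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        if requests_per_minute < 1:
            raise ValueError("requests_per_minute must be at least 1")
        self.limit = requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls inside the current window

    def acquire(self) -> float:
        """Block until a call is allowed; return the total seconds waited."""
        waited = 0.0
        while True:
            now = self.clock()
            # Drop timestamps that have left the 60-second window.
            while self.calls and now - self.calls[0] >= 60:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return waited
            # Window is full: wait until the oldest call expires.
            pause = 60 - (now - self.calls[0])
            self.sleep(pause)
            waited += pause
```

Calling `acquire()` before each heavy LLM request keeps the request rate under the limit; lowering the limit trades throughput for fewer rate-limit errors from the provider.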

Documentation

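| Document | Language | Description |
| --- | --- | --- |
| DESIGN_EN.md | English | Architecture, module responsibilities, API specs |
| DESIGN.md | Chinese | Architecture design, module responsibilities, interface specs |
| DEPLOY_EN.md | English | Docker deployment guide (VPS / NAS) |
| DEPLOY.md | Chinese | Docker deployment guide (VPS / NAS) |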

Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/my-feature)
3. Commit your changes (git commit -m 'Add my feature')
4. Push to the branch (git push origin feature/my-feature)
5. Open a Pull Request

License

MIT (see LICENSE)
