PaperRadar is a keyword-driven academic paper radar: it automatically fetches the latest papers from arXiv and (optionally) top journals every day, filters and analyzes them with a dual-LLM pipeline, generates daily reports (Markdown + JSON), and serves a built-in Web UI for browsing and searching.
- Multi-source fetching: arXiv RSS + journal RSS (Nature, NEJM, Cell, Science, etc.; configurable in config.yaml)
- Dual-LLM architecture:
  - Light LLM: quickly matches papers to keywords based on title/abstract (outputs matched_keywords)
  - Heavy multimodal LLM: reads the full PDF and outputs TLDR / contributions / methods / experiments / novelty / limitations / data / code + a quality score
- Domain summaries: a Markdown summary per domain with numbered paper references (clickable in the Web UI)
- Daily reports: saved to reports/ (Markdown) and reports/json/ (JSON)
- Web UI: filter by date/domain, search, sort, clickable reference numbers, paginated loading
- Docker deployment: in-container cron scheduling + FastAPI web server (default port 8000)
┌──────────────────┐
│ Cron / Manual │
└────────┬─────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ Stage 0: Fetch │
│ arXiv RSS · bioRxiv/medRxiv RSS · Journal RSS (Nature, etc.) │
│ → paper metadata (title, abstract, pdf_url) │
└──────────────────────────┬───────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ Stage 1: Light LLM — keyword matching │
│ Input: title + abstract + keyword list │
│ Output: matched_keywords, relevance, reason │
└──────────────────────────┬───────────────────────────────────────┘
▼ (matched papers only)
┌──────────────────────────────────────────────────────────────────┐
│ Stage 2: Heavy Multimodal LLM — deep PDF analysis │
│ Input: full PDF (base64) + matched keywords │
│ Output: TLDR, contributions, methodology, experiments, │
│ innovations, limitations, quality_score, … │
└──────────────────────────┬───────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ Stage 3: SummaryAgent — per-domain summaries │
│ Generates Markdown summaries with "Paper N" references │
└──────────────────────────┬───────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────────┐
│ Stage 4: Reporter — output & serve │
│ Markdown report → reports/ │
│ JSON report → reports/json/ │
│ Web UI (FastAPI) reads JSON for display │
└──────────────────────────────────────────────────────────────────┘
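The staged flow in the diagram can be sketched as plain functions. This is a minimal illustration under stated assumptions, not PaperRadar's actual code: `Paper`, `light_match`, `heavy_analyze`, and `run_pipeline` are hypothetical names, the light-LLM call is replaced by a substring match, and the heavy-LLM call by a stub. Only the control flow (match first, analyze matched papers only, then group for summaries) is taken from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    abstract: str
    pdf_url: str
    matched_keywords: list[str] = field(default_factory=list)
    analysis: dict = field(default_factory=dict)


def light_match(paper: Paper, keywords: list[str]) -> list[str]:
    """Stage 1 stand-in: substring match instead of a light-LLM call."""
    text = f"{paper.title} {paper.abstract}".lower()
    return [kw for kw in keywords if kw.lower() in text]


def heavy_analyze(paper: Paper) -> dict:
    """Stage 2 stand-in: a real run would send the full PDF to a multimodal LLM."""
    return {"tldr": paper.title, "quality_score": None}


def run_pipeline(papers: list[Paper], keywords: list[str]) -> dict[str, list[Paper]]:
    """Run stages 1 and 2, then group analyzed papers by matched keyword."""
    grouped: dict[str, list[Paper]] = {}
    for paper in papers:
        paper.matched_keywords = light_match(paper, keywords)
        if not paper.matched_keywords:
            continue  # only matched papers reach the heavy stage
        paper.analysis = heavy_analyze(paper)
        for kw in paper.matched_keywords:
            grouped.setdefault(kw, []).append(paper)
    return grouped
```

The early `continue` mirrors the "matched papers only" arrow: unmatched papers never incur the cost of the heavy PDF analysis.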
cp .env.example .env
nano .env

- Don't need journals? Set journals.enabled: false and optionally ezproxy.enabled: false
- Adjust the schedule: runtime.schedule (cron expression; container timezone is set by TZ)
docker compose up -d --build

- Web UI: http://localhost:8000
- Health check: http://localhost:8000/api/health
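To check the health endpoint from a script rather than a browser, something like the following could be used. It is a sketch that relies only on the standard library; the function name `is_healthy` is hypothetical, and because the response body of /api/health is not documented here, only the HTTP status code is inspected.

```python
import urllib.request


def is_healthy(url: str = "http://localhost:8000/api/health", timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError/HTTPError, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print("healthy" if is_healthy() else "not reachable")
```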
docker compose exec paper-radar python main.py --dry-run

Reports will be written to reports/ and reports/json/. The Web UI will automatically pick up any available dates.
| Variable | Description |
|---|---|
| LIGHT_LLM_API_BASE / LIGHT_LLM_API_KEY / LIGHT_LLM_MODEL | Light LLM (OpenAI-compatible endpoint) |
| HEAVY_LLM_API_BASE / HEAVY_LLM_API_KEY / HEAVY_LLM_MODEL | Multimodal LLM (for PDF analysis) |
| HKU_LIBRARY_UID / HKU_LIBRARY_PIN | (Optional) EZproxy credentials for accessing paywalled journal PDFs |
| TZ | Container timezone (default Asia/Shanghai) |
| WEB_PORT | Web server port (default 8000) |
| RUN_ON_START | Run the pipeline once on container start (default false) |
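As an illustration of how these variables and their documented defaults might be read, here is a small sketch. The function `load_settings` is hypothetical and not taken from the project. Two assumptions are made: the six LLM variables are treated as required (only the EZproxy credentials are marked optional above), and RUN_ON_START is considered enabled for the values "1", "true", or "yes".

```python
import os
from collections.abc import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the documented variables, applying the documented defaults."""
    settings = {
        "tz": env.get("TZ", "Asia/Shanghai"),
        "web_port": int(env.get("WEB_PORT", "8000")),
        # Assumption: "1"/"true"/"yes" (case-insensitive) enable the flag.
        "run_on_start": env.get("RUN_ON_START", "false").strip().lower()
        in {"1", "true", "yes"},
    }
    # Assumption: all six LLM variables must be present.
    for role in ("LIGHT", "HEAVY"):
        for suffix in ("API_BASE", "API_KEY", "MODEL"):
            name = f"{role}_LLM_{suffix}"
            if name not in env:
                raise KeyError(f"missing required variable: {name}")
            settings[name.lower()] = env[name]
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try without touching the real environment.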
- keywords: list of domains (name/description/examples)
- preprints: preprint source config (arXiv + bioRxiv/medRxiv)
- journals: journal source toggle and list
- llm: model and rate-limit settings for light / heavy / summary LLMs
- runtime: cron schedule, concurrency, timeouts, etc.
- output: Markdown/JSON output paths
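The nested settings referred to throughout this README (for example journals.enabled or llm.heavy.rate_limit.requests_per_minute) can be pictured as a nested mapping addressed by dotted paths. The sketch below uses a Python dictionary in place of the parsed YAML. The key names come from this README, but every value is illustrative, and the helper `get_path` is hypothetical.

```python
# Illustrative values only; key names are the ones mentioned in this README.
example_config = {
    "journals": {"enabled": False, "max_papers_per_journal": 20},
    "ezproxy": {"enabled": False},
    "preprints": {"max_papers_per_source": 50},
    "llm": {"heavy": {"rate_limit": {"requests_per_minute": 10}}},
    "runtime": {"schedule": "0 8 * * *"},
}


def get_path(config: dict, dotted: str, default=None):
    """Look up a dotted key such as 'llm.heavy.rate_limit.requests_per_minute'."""
    node = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return default  # missing key anywhere along the path
        node = node[part]
    return node


print(get_path(example_config, "llm.heavy.rate_limit.requests_per_minute"))
print(get_path(example_config, "journals.enabled"))
```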
See DEPLOY_EN.md for step-by-step instructions (Chinese version: DEPLOY.md).
For production, place port 8000 behind a reverse proxy (Nginx / Caddy) with HTTPS and access control.
Q: No papers found after running the pipeline?
A: Check that your config.yaml keywords are broad enough and that the arXiv categories cover your field. Run with --debug for verbose logs:
docker compose exec paper-radar python main.py --debug --dry-run

Q: EZproxy authentication fails?
A: Delete cached cookies and retry:
rm -f cache/ezproxy_cookies.pkl

Make sure HKU_LIBRARY_UID and HKU_LIBRARY_PIN are correct in .env.
Q: Web UI shows no dates?
A: The UI reads from reports/json/. Run the pipeline at least once to generate reports. Check that the reports/ volume is mounted correctly in docker-compose.yml.
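A quick way to see what the UI would find is to list the JSON reports directly. The sketch below assumes report files are named by date, such as 2025-01-31.json. That naming scheme is a guess rather than something documented here, and `available_dates` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed naming scheme: YYYY-MM-DD.json (not confirmed by this README).
DATE_FILE = re.compile(r"^\d{4}-\d{2}-\d{2}\.json$")


def available_dates(json_dir: str = "reports/json") -> list[str]:
    """Return report dates found in json_dir, newest first; empty if none."""
    root = Path(json_dir)
    if not root.is_dir():
        return []  # directory missing: volume not mounted or pipeline never ran
    stems = [p.stem for p in root.iterdir() if p.is_file() and DATE_FILE.match(p.name)]
    return sorted(stems, reverse=True)


if __name__ == "__main__":
    print(available_dates() or "no reports found")
```

An empty result points to one of the two causes named above: the pipeline has not produced a report yet, or the directory being inspected is not the mounted volume.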
Q: Heavy LLM analysis is slow or hitting rate limits?
A: Adjust llm.heavy.rate_limit.requests_per_minute in config.yaml. You can also reduce the number of papers analyzed by tuning preprints.max_papers_per_source or journals.max_papers_per_journal.
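To see what a requests-per-minute setting means in practice, here is a minimal limiter sketch that spaces calls evenly. It is not PaperRadar's implementation: the class `MinuteRateLimiter` is hypothetical, and the real pipeline may use a different strategy (for example a sliding window or token bucket).

```python
import time


class MinuteRateLimiter:
    """Space calls evenly so at most `requests_per_minute` start per minute."""

    def __init__(self, requests_per_minute: int, clock=time.monotonic, sleep=time.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next slot is free; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Reserve the following slot one interval after this call's start time.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

With requests_per_minute set to 10, each call to `wait()` returns at least 6 seconds after the previous one, so lowering the value trades throughput for fewer rate-limit errors. The `clock` and `sleep` parameters exist only so the behavior can be checked without real waiting.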
| Document | Language | Description |
|---|---|---|
| DESIGN_EN.md | English | Architecture, module responsibilities, API specs |
| DESIGN.md | Chinese | Architecture, module responsibilities, API specs |
| DEPLOY_EN.md | English | Docker deployment guide (VPS / NAS) |
| DEPLOY.md | Chinese | Docker deployment guide (VPS / NAS) |
- Fork the repository
- Create a feature branch (git checkout -b feature/my-feature)
- Commit your changes (git commit -m 'Add my feature')
- Push to the branch (git push origin feature/my-feature)
- Open a Pull Request
MIT (see LICENSE)