Weiqin-Zhao/paper-radar

PaperRadar

Chinese documentation (中文文档)

PaperRadar is a keyword-driven academic paper radar: it automatically fetches the latest papers from arXiv and (optionally) top journals every day, filters and analyzes them with a dual-LLM pipeline, generates daily reports (Markdown + JSON), and serves a built-in Web UI for browsing and searching.

Highlights

  • Multi-source fetching: preprint RSS (arXiv, bioRxiv/medRxiv) + journal RSS (Nature, NEJM, Cell, Science, etc.; configurable in config.yaml)
  • Dual-LLM architecture:
    • Light LLM: quickly matches papers to keywords based on title/abstract (outputs matched_keywords)
    • Heavy multimodal LLM: reads the full PDF and outputs TLDR / contributions / methods / experiments / novelty / limitations / data / code + a quality score
  • Domain summaries: a Markdown summary per domain with numbered paper references (clickable in the Web UI)
  • Daily reports: saved to reports/ (Markdown) and reports/json/ (JSON)
  • Web UI: filter by date/domain, search, sort, clickable reference numbers, paginated loading
  • Docker deployment: in-container cron scheduling + FastAPI web server (default port 8000)
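The clickable "Paper N" references mentioned above could be produced by a small post-processing step over each domain summary. The following is a minimal sketch, not the project's actual code: the function name `link_paper_refs` and the `#paper-N` anchor format are assumptions made for illustration.

```python
import re

# Assumed pattern: the README only says summaries cite papers as "Paper N".
PAPER_REF = re.compile(r"\bPaper (\d+)\b")


def link_paper_refs(summary_md: str, paper_count: int) -> str:
    """Turn in-range "Paper N" mentions into Markdown links to #paper-N anchors.

    The anchor format (#paper-N) is hypothetical; adapt it to the real UI.
    """

    def replace(match: re.Match) -> str:
        n = int(match.group(1))
        if 1 <= n <= paper_count:
            return f"[Paper {n}](#paper-{n})"
        return match.group(0)  # leave out-of-range references untouched

    return PAPER_REF.sub(replace, summary_md)
```

References that point past the end of the paper list are left as plain text, so a model that miscounts does not produce dead links.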

Architecture

                          ┌──────────────────┐
                          │   Cron / Manual  │
                          └────────┬─────────┘
                                   ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 0: Fetch                                                  │
│  arXiv RSS · bioRxiv/medRxiv RSS · Journal RSS (Nature, etc.)    │
│  → paper metadata (title, abstract, pdf_url)                     │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 1: Light LLM — keyword matching                           │
│  Input: title + abstract + keyword list                          │
│  Output: matched_keywords, relevance, reason                     │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼  (matched papers only)
┌──────────────────────────────────────────────────────────────────┐
│  Stage 2: Heavy Multimodal LLM — deep PDF analysis               │
│  Input: full PDF (base64) + matched keywords                     │
│  Output: TLDR, contributions, methodology, experiments,          │
│          innovations, limitations, quality_score, …              │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 3: SummaryAgent — per-domain summaries                    │
│  Generates Markdown summaries with "Paper N" references          │
└──────────────────────────┬───────────────────────────────────────┘
                           ▼
┌──────────────────────────────────────────────────────────────────┐
│  Stage 4: Reporter — output & serve                              │
│  Markdown report → reports/                                      │
│  JSON report    → reports/json/                                  │
│  Web UI (FastAPI) reads JSON for display                         │
└──────────────────────────────────────────────────────────────────┘
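Stages 1 and 2 of the diagram can be sketched as a filter followed by a deep-analysis pass. This is an illustrative outline under stated assumptions, not the repository's implementation: the `Paper` dataclass, `encode_pdf`, `run_pipeline`, and the three callables passed in (`light_match`, `fetch_pdf`, `heavy_analyze`) are hypothetical names standing in for the real LLM and download code.

```python
import base64
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Paper:
    # Fields follow the Stage 0 metadata named in the diagram.
    title: str
    abstract: str
    pdf_url: str
    matched_keywords: list = field(default_factory=list)
    analysis: dict = field(default_factory=dict)


def encode_pdf(pdf_bytes: bytes) -> str:
    """Base64-encode a PDF so it can be embedded in a text request body."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def run_pipeline(
    papers: list,
    keywords: list,
    light_match: Callable,    # hypothetical: (title, abstract, keywords) -> list of keywords
    fetch_pdf: Callable,      # hypothetical: (pdf_url) -> bytes
    heavy_analyze: Callable,  # hypothetical: (pdf_base64, matched_keywords) -> dict
) -> list:
    """Run Stage 1 on every paper, then Stage 2 on the matched ones only."""
    matched = []
    for paper in papers:
        paper.matched_keywords = light_match(paper.title, paper.abstract, keywords)
        if paper.matched_keywords:
            matched.append(paper)

    # Only matched papers reach the expensive multimodal model.
    for paper in matched:
        pdf_b64 = encode_pdf(fetch_pdf(paper.pdf_url))
        paper.analysis = heavy_analyze(pdf_b64, paper.matched_keywords)
    return matched
```

Passing the LLM calls in as plain callables keeps the control flow testable without network access; the two-stage split means PDFs are downloaded and sent to the heavy model only for papers the light model has already matched.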

Quick Start (Docker)

1) Set up environment variables

cp .env.example .env
nano .env

2) (Optional) Customize config.yaml

  • Don't need journals? Set journals.enabled: false and optionally ezproxy.enabled: false
  • Adjust the schedule: runtime.schedule (cron expression; container timezone is set by TZ)

3) Launch

docker compose up -d --build

4) Open the Web UI

  • http://localhost:8000
  • Health check: http://localhost:8000/api/health

5) Run immediately (optional)

docker compose exec paper-radar python main.py --dry-run

Reports will be written to reports/ and reports/json/. The Web UI will automatically pick up any available dates.

Key Configuration

Environment variables (.env)

| Variable | Description |
|---|---|
| LIGHT_LLM_API_BASE / LIGHT_LLM_API_KEY / LIGHT_LLM_MODEL | Light LLM (OpenAI-compatible endpoint) |
| HEAVY_LLM_API_BASE / HEAVY_LLM_API_KEY / HEAVY_LLM_MODEL | Multimodal LLM (for PDF analysis) |
| HKU_LIBRARY_UID / HKU_LIBRARY_PIN | (Optional) EZproxy credentials for accessing paywalled journal PDFs |
| TZ | Container timezone (default Asia/Shanghai) |
| WEB_PORT | Web server port (default 8000) |
| RUN_ON_START | Run the pipeline once on container start (default false) |
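As an illustration of how the documented defaults might be applied, the sketch below reads the three non-secret variables from a mapping. The `ContainerSettings` class, the `load_settings` function, and the set of strings treated as "true" are assumptions; only the variable names and their defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ContainerSettings:
    tz: str
    web_port: int
    run_on_start: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ContainerSettings:
    """Apply the documented defaults to TZ, WEB_PORT and RUN_ON_START."""
    # Assumption: which spellings count as "true" is not specified in the README.
    truthy = {"1", "true", "yes", "on"}
    return ContainerSettings(
        tz=env.get("TZ", "Asia/Shanghai"),
        web_port=int(env.get("WEB_PORT", "8000")),
        run_on_start=env.get("RUN_ON_START", "false").strip().lower() in truthy,
    )
```

Calling `load_settings({})` yields the defaults, which is a quick way to check what the container would do with an empty .env.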

config.yaml

  • keywords: list of domains (name / description / examples)
  • preprints: preprint source config (arXiv + bioRxiv/medRxiv)
  • journals: journal source toggle and list
  • llm: model and rate-limit settings for light / heavy / summary LLMs
  • runtime: cron schedule, concurrency, timeouts, etc.
  • output: Markdown/JSON output paths
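This README refers to settings by dotted path, for example journals.enabled or llm.heavy.rate_limit.requests_per_minute. A minimal sketch of resolving such a path against the parsed config is shown below. The helper `get_setting` is hypothetical, and the values in `example_config` are placeholders rather than the project's defaults.

```python
from typing import Any


def get_setting(config: dict, dotted_key: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path, returning `default` if any part is missing."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Placeholder values for illustration only; the key names appear in this README.
example_config = {
    "journals": {"enabled": False},
    "runtime": {"schedule": "0 8 * * *"},
    "llm": {"heavy": {"rate_limit": {"requests_per_minute": 10}}},
}
```

For instance, `get_setting(example_config, "ezproxy.enabled", True)` falls back to the supplied default because that section is absent from the example.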

Deployment (VPS / NAS)

See DEPLOY_EN.md for step-by-step instructions (Chinese version: DEPLOY.md).

For production, place port 8000 behind a reverse proxy (Nginx / Caddy) with HTTPS and access control.

FAQ / Troubleshooting

Q: No papers found after running the pipeline?
A: Check that your config.yaml keywords are broad enough and that the arXiv categories cover your field. Run with --debug for verbose logs:

docker compose exec paper-radar python main.py --debug --dry-run

Q: EZproxy authentication fails?
A: Delete cached cookies and retry:

rm -f cache/ezproxy_cookies.pkl

Make sure HKU_LIBRARY_UID and HKU_LIBRARY_PIN are correct in .env.

Q: Web UI shows no dates?
A: The UI reads from reports/json/. Run the pipeline at least once to generate reports. Check that the reports/ volume is mounted correctly in docker-compose.yml.
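To check by hand which dates the UI could see, a directory scan like the one below may help. It is a sketch that assumes one JSON file per day named YYYY-MM-DD.json; the README does not state the naming scheme, so adjust the parsing to match the files actually present. The function name `available_dates` is hypothetical.

```python
from datetime import date
from pathlib import Path


def available_dates(json_dir: Path) -> list:
    """Return report dates found in `json_dir`, newest first.

    Assumes files are named like 2024-05-03.json; other files are ignored.
    """
    dates = []
    for path in json_dir.glob("*.json"):
        try:
            dates.append(date.fromisoformat(path.stem))
        except ValueError:
            continue  # not a date-named report file
    return sorted(dates, reverse=True)
```

An empty result from `available_dates(Path("reports/json"))` on the host points to a pipeline or volume-mount problem rather than a UI problem.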

Q: Heavy LLM analysis is slow or hitting rate limits?
A: Adjust llm.heavy.rate_limit.requests_per_minute in config.yaml. You can also reduce the number of papers analyzed by tuning preprints.max_papers_per_source or journals.max_papers_per_journal.
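To make the effect of requests_per_minute concrete: one simple way to honor such a limit is to space calls at least 60 / requests_per_minute seconds apart. The class below is a sketch of that idea only; `MinuteRateLimiter` is a hypothetical name, and the project may enforce its limit differently.

```python
import time
from typing import Callable, Optional


class MinuteRateLimiter:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(
        self,
        requests_per_minute: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

With a limit of 30, for example, the interval is 2 seconds, so a batch of papers takes at least two seconds each regardless of how fast the model responds. Lowering the paper caps shortens a run more directly than raising the limit.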

Documentation

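| Document | Language | Description |
|---|---|---|
| DESIGN_EN.md | English | Architecture, module responsibilities, API specs |
| DESIGN.md | Chinese | Architecture, module responsibilities, API specs |
| DEPLOY_EN.md | English | Docker deployment guide (VPS / NAS) |
| DEPLOY.md | Chinese | Docker deployment guide (VPS / NAS) |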

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Commit your changes (git commit -m 'Add my feature')
  4. Push to the branch (git push origin feature/my-feature)
  5. Open a Pull Request

License

MIT (see LICENSE)
