AIResearch

AIResearch is a Python-first paper discovery and recommendation pipeline for arXiv and Hugging Face Daily Papers. It helps operators collect candidate papers, filter and score them in multiple stages against a configurable research focus, and publish summaries to downstream channels such as email and Obsidian.

Positioning

Research-ops pipeline, not a web app.
The official entrypoint is a cross-platform CLI that works from a source checkout or an installed package.
Extensible through local MCP adapters and YAML/JSON configuration.
Designed for reproducible local operation, not zero-config hosted deployment.

Capability Boundaries

What the project does:

Discover papers from arXiv keyword queries, HF daily papers, and HF trending sources.
Apply stage-1 keyword/topic heuristics and downstream candidate scoring.
Prefer downloading the PDF and extracting Markdown with MinerU, then generate the analysis artifacts in a single pass; fall back to the LaTeX source only when MinerU exhausts its retries.
Optionally send notifications and write notes into an Obsidian vault.
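The MinerU-first, LaTeX-fallback ordering in the list above is a retry-then-fallback pattern. The sketch below illustrates only that control flow; `extract_paper_text`, the two extractor callables, and the default of three retries are hypothetical stand-ins, not AIResearch's actual functions or settings.

```python
from typing import Callable

# Hypothetical extractor signature: paper id in, extracted text out.
Extractor = Callable[[str], str]


def extract_paper_text(
    paper_id: str,
    mineru_extract: Extractor,
    latex_extract: Extractor,
    max_retries: int = 3,
) -> tuple[str, str]:
    """Return (source, text), trying MinerU up to max_retries times before LaTeX."""
    for _ in range(max_retries):
        try:
            return "mineru", mineru_extract(paper_id)
        except Exception:
            # Assumption: every MinerU error is treated as retryable.
            continue
    # MinerU exhausted its retries, so only now fall back to the LaTeX path.
    return "latex", latex_extract(paper_id)
```

Injecting the extractors as callables keeps the ordering logic testable without network access to MinerU or arXiv.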

What the project does not do by itself:

It does not provide a browser UI.
It does not bundle third-party credentials or hosted MCP services.
It does not currently persist a public seen-paper dedupe cache.

Install

Development install:

python -m pip install -r requirements-dev.txt
python -m pip install -e .

If you only need the runtime dependencies, requirements.txt is the smaller install, but the editable install (python -m pip install -e .) is still recommended for local debugging and for GitHub contributors.

Configuration

Create config.local.yaml from config.example.yaml.
Create mcp.local.json from mcp.example.json.
Fill in local-only values such as recipients, vault path, and credentials.
Export any required environment variables before running.
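The first two steps can be scripted. This sketch assumes both templates sit at the repository root and that an existing local file must never be overwritten, since it holds your own recipients, vault path, and credentials; `bootstrap_local_configs` is a hypothetical helper, not part of the package.

```python
import shutil
from pathlib import Path

# Template -> local file pairs named in this README.
TEMPLATES = {
    "config.example.yaml": "config.local.yaml",
    "mcp.example.json": "mcp.local.json",
}


def bootstrap_local_configs(repo_root: Path) -> list[Path]:
    """Copy each public template to its local counterpart unless that file already exists."""
    created: list[Path] = []
    for template, local in TEMPLATES.items():
        src, dst = repo_root / template, repo_root / local
        if dst.exists():
            continue  # never overwrite local edits
        shutil.copyfile(src, dst)
        created.append(dst)
    return created
```

Running `bootstrap_local_configs(Path("."))` from the repository root creates whichever local files are missing; the local-only values still have to be filled in by hand.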

If you are upgrading an older local config, remove deprecated keys such as seen_cache_path, codex_chunk_chars, codex_chunk_overlap, and the legacy codex_prompt_* TeX prompt paths to avoid startup warnings.
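To spot these keys before startup warns about them, the local config can be compared against the template. This sketch assumes the deprecated keys are top-level and that the legacy codex_prompt_* TeX keys no longer appear in config.example.yaml, so a codex_prompt_* key still present in the template is left alone; `find_deprecated_keys` is hypothetical.

```python
from typing import Any, Mapping

# Deprecated names listed in this README.
DEPRECATED_KEYS = {"seen_cache_path", "codex_chunk_chars", "codex_chunk_overlap"}
# Assumption: legacy TeX prompt paths share this prefix.
DEPRECATED_PREFIX = "codex_prompt_"


def find_deprecated_keys(local: Mapping[str, Any], template: Mapping[str, Any]) -> list[str]:
    """Return top-level keys in `local` that look deprecated and are absent from `template`."""
    suspects = (
        key for key in local
        if key in DEPRECATED_KEYS or key.startswith(DEPRECATED_PREFIX)
    )
    return sorted(key for key in suspects if key not in template)
```

Load both files with a YAML parser first, for example `yaml.safe_load` from PyYAML, and pass the resulting dictionaries.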

Recommended environment variables:

SEMANTIC_SCHOLAR_API_KEY
GITHUB_TOKEN
EMAIL_ACCOUNT_NAME when using MCP email delivery
Optional: AIRESEARCH_CONFIG, AIRESEARCH_MCP_CONFIG, OBSIDIAN_VAULT, MINERU_API_KEY
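A quick preflight check can list which recommended variables are unset before a run. Following the list above, this sketch treats SEMANTIC_SCHOLAR_API_KEY and GITHUB_TOKEN as always recommended and EMAIL_ACCOUNT_NAME as needed only with MCP email delivery; the helper name and its warn-only behavior are assumptions, since the README does not say whether the pipeline fails without them.

```python
import os
from typing import Mapping, Optional

# Always recommended by the README.
RECOMMENDED = ("SEMANTIC_SCHOLAR_API_KEY", "GITHUB_TOKEN")
# Only needed when email goes through the MCP email backend.
MCP_EMAIL = ("EMAIL_ACCOUNT_NAME",)


def missing_env(use_mcp_email: bool, environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return recommended variables that are unset or empty (hypothetical warn-only check)."""
    env = os.environ if environ is None else environ
    wanted = RECOMMENDED + (MCP_EMAIL if use_mcp_email else ())
    return [name for name in wanted if not env.get(name)]
```

For example, `for name in missing_env(use_mcp_email=True): print(f"warning: {name} is not set")` reports gaps without aborting.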

The configuration templates are public files; config.local.yaml and mcp.local.json are local files and must not be committed to GitHub.

Run

Module entrypoint from a source checkout:

python -m airesearch --config config.local.yaml --run-once

Installed console script:

airesearch --config config.local.yaml --run-once

Date-range example:

python -m airesearch --config config.local.yaml --run-once --force-run --start-date 2026-03-01 --end-date 2026-03-07
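For recurring backfills, the date-range example above can be generated programmatically. This sketch assumes --end-date is inclusive (the example's 2026-03-01 to 2026-03-07 range reads as a seven-day window) and that dates use the ISO YYYY-MM-DD form shown; it reuses only flags that appear in this README, and `build_run_args` is hypothetical.

```python
from datetime import date, timedelta


def build_run_args(end: date, days: int = 7, config: str = "config.local.yaml") -> list[str]:
    """CLI arguments for a window of `days` calendar days ending on `end` (assumed inclusive)."""
    if days < 1:
        raise ValueError("days must be >= 1")
    start = end - timedelta(days=days - 1)
    return [
        "--config", config,
        "--run-once", "--force-run",
        "--start-date", start.isoformat(),
        "--end-date", end.isoformat(),
    ]
```

To launch it, `subprocess.run([sys.executable, "-m", "airesearch", *build_run_args(date.today())], check=True)` keeps the call on the same interpreter and environment as the calling script.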

Optional log file:

python -m airesearch --config config.local.yaml --run-once --log-file output/logs/latest.log

ops/ still contains optional PowerShell helpers for local operators, but they are no longer the primary public interface.

MCP Adapters

Public example module entrypoints:

python -m airesearch.mcp.arxiv
python -m airesearch.mcp.hf_papers
python -m airesearch.mcp.scholarly
python -m airesearch.mcp.github
python -m airesearch.mcp.obsidian

The bundled mcp.example.json is aligned with those module paths. The email backend still assumes an external mcp-email-server compatible service.
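Before pointing mcp.local.json at the adapters, it can help to confirm that each module path listed above resolves in the current environment. The sketch uses the standard-library `importlib.util.find_spec`, which locates a module without executing it (its parent packages are still imported); `check_modules` is a hypothetical helper.

```python
import importlib.util

# Adapter module paths listed in this README.
ADAPTER_MODULES = [
    "airesearch.mcp.arxiv",
    "airesearch.mcp.hf_papers",
    "airesearch.mcp.scholarly",
    "airesearch.mcp.github",
    "airesearch.mcp.obsidian",
]


def check_modules(names: list[str]) -> dict[str, bool]:
    """Map each dotted module name to whether it can be located in this environment."""
    found: dict[str, bool] = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package makes find_spec raise ModuleNotFoundError.
            found[name] = False
    return found
```

After `python -m pip install -e .`, `check_modules(ADAPTER_MODULES)` should map every adapter to True; a False usually points at a missing install or the wrong interpreter.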

Output Layout

output/out/<run_id>/run_summary.json: per-run summary payload.
output/latest_run.txt: pointer to the latest run directory.
state/last_run.json: daily run guard state.
prompts/: prompt assets used by the pipeline.
configs/: whitelist and related public config assets.
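Downstream scripts can follow the pointer file to the newest summary. This sketch assumes output/latest_run.txt holds a single path to the run directory, either absolute or relative to the project root, and makes no assumption about the fields inside run_summary.json; `load_latest_summary` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def load_latest_summary(project_root: Path) -> Any:
    """Follow output/latest_run.txt to the newest run directory and load its run_summary.json."""
    pointer = project_root / "output" / "latest_run.txt"
    run_dir = Path(pointer.read_text(encoding="utf-8").strip())
    if not run_dir.is_absolute():
        # Assumption: relative pointers are resolved against the project root.
        run_dir = project_root / run_dir
    with (run_dir / "run_summary.json").open(encoding="utf-8") as fh:
        return json.load(fh)
```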

More detailed setup and usage notes live in docs/usage.md and docs/seen_papers.md.

Test

python -m pytest -q

GitHub Actions verifies the package on Python 3.11 and 3.12 across Ubuntu, Windows, and macOS. CI also checks that both python -m airesearch --help and airesearch --help run.

Security

Never commit config.local.yaml, mcp.local.json, or .env.
Keep API keys, SMTP credentials, and email account settings in environment variables.
Review generated logs and output artifacts before sharing them publicly.
Treat output/ and state/ as local runtime data, not source-controlled assets.
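Before sharing logs publicly, a simple text substitution can mask any secret values currently set in the environment. This is a minimal sketch, not a substitute for review: it only masks the variables you name (the defaults come from the recommended variables listed under Configuration), and `redact` is a hypothetical helper.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical default list, taken from the variables named in this README.
SECRET_VARS = ("SEMANTIC_SCHOLAR_API_KEY", "GITHUB_TOKEN", "MINERU_API_KEY")


def redact(
    text: str,
    names: Iterable[str] = SECRET_VARS,
    environ: Optional[Mapping[str, str]] = None,
    mask: str = "***",
) -> str:
    """Replace the current values of the named environment variables with `mask`."""
    env = os.environ if environ is None else environ
    # Mask longer values first so a secret containing another secret is fully hidden.
    secrets = sorted({env[n] for n in names if env.get(n)}, key=len, reverse=True)
    for secret in secrets:
        text = text.replace(secret, mask)
    return text
```

For example, `redact(Path("output/logs/latest.log").read_text(encoding="utf-8"))` masks a log written with --log-file. Extend `names` with whichever variables hold your SMTP or email credentials, since the README does not name them.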

Agent Notes

AGENTS.md records repository conventions and the current release-hardening tracker for coding agents. It is documentation only and does not affect runtime behavior.
