Merged

Dev #11

15 changes: 15 additions & 0 deletions AGENTS.md
@@ -56,6 +56,21 @@ DOC2X_API_KEY=sk-xxx npm run build && npm start

- Zod URL validation should use `z.url()` (for example via `.pipe(z.url())`) instead of deprecated `z.string().url()`.

## Release Checklist

When bumping the package version, always update all three of the following together:

1. **`package.json`** — update `"version"` field.
2. **`CHANGELOG.md`** — add a new section `## [x.y.z] - YYYY-MM-DD` with a summary of changes. Move items from `Unreleased` if applicable.
3. **`README.md` / `README_EN.md`** — if any tool names, parameters, env vars, or workflows changed, sync the relevant sections.
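
The first two checklist items can be checked mechanically. The sketch below is a hypothetical helper (`checkRelease` and `ReleaseCheck` are illustrative names, not part of this repository) that takes the text of `package.json` and `CHANGELOG.md` and reports whether the changelog has a `## [x.y.z] - ...` heading for the current version, and whether that heading carries a `YYYY-MM-DD` date. It assumes headings follow exactly the format given in item 2.

```typescript
// Hypothetical helper for illustration; it is not part of the doc2x-mcp repository.
interface ReleaseCheck {
  version: string;
  hasChangelogSection: boolean;
  isDated: boolean;
}

function checkRelease(packageJsonText: string, changelogText: string): ReleaseCheck {
  const pkg = JSON.parse(packageJsonText) as { version?: unknown };
  if (typeof pkg.version !== "string") {
    throw new Error('package.json has no string "version" field');
  }
  const version: string = pkg.version;
  let hasChangelogSection = false;
  let isDated = false;
  for (const line of changelogText.split(/\r?\n/)) {
    // Assumed heading format: "## [x.y.z] - YYYY-MM-DD" (or a placeholder such as "Unreleased").
    const match = /^## \[([^\]]+)\] - (.+)$/.exec(line);
    if (match !== null && match[1] === version) {
      hasChangelogSection = true;
      isDated = /^\d{4}-\d{2}-\d{2}$/.test(match[2].trim());
    }
  }
  return { version, hasChangelogSection, isDated };
}
```

Under these assumptions, a heading such as `## [0.1.4] - Unreleased` would count as a section for `0.1.4` but not as dated, which is one way to catch a release whose changelog date was never filled in.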

After releasing a new version, remind users to re-run the Skill install command to pick up the latest tool descriptions:

```bash
# One-command update (no clone needed)
curl -fsSL https://raw.githubusercontent.com/NoEdgeAI/doc2x-mcp/main/scripts/install-skill.sh | sh
```

## Commit & Pull Request Guidelines

- Use Conventional Commits style (e.g., `feat: ...`, `fix: ...`, `docs: ...`).
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,39 @@
# Changelog

All notable changes to this project will be documented in this file.

## [0.1.4] - Unreleased

- feat: add project icon (`icon.png`)
- chore: upgrade `@modelcontextprotocol/sdk` to fix vulnerabilities
- feat: add display page support

## [0.1.3] - 2026-02-28

- feat: add v3-2026 parse model support (`doc2x_parse_pdf_submit`, `doc2x_parse_pdf_wait_text`)
- feat: add `doc2x_materialize_pdf_layout_json` tool for v3 layout JSON materialization
- feat: restructure source packages for better maintainability
- fix: support explicit `v2` parse model parameter

## [0.1.2] - 2026-01-19

- feat: add Skill installation scripts (Bash, PowerShell 7+, Windows PowerShell 5.1)
- fix: install skill shell script issues
- fix: update skill installation category from `local` to `public`
- fix: restrict `doc2x_parse_pdf_status` response to status fields only
- chore: streamline CI workflow

## [0.1.1] - 2026-01-17

- feat: cap parse output via `DOC2X_PARSE_PDF_MAX_OUTPUT_CHARS` and `DOC2X_PARSE_PDF_MAX_OUTPUT_PAGES`
- feat: improve developer ergonomics for MCP tools
- ci: set up GitHub Actions publish and build workflows

## [0.1.0] - Initial release

- feat: initial Doc2x MCP server implementation
- feat: PDF parse tools (`submit` / `status` / `wait_text`)
- feat: export tools (`submit` / `result` / `wait`)
- feat: image layout parse tools (sync / async)
- feat: download tools (`download_url_to_file`, `materialize_convert_zip`)
- feat: `doc2x_debug_config` diagnostics tool
43 changes: 40 additions & 3 deletions README.md
@@ -1,5 +1,16 @@
# Doc2x MCP Server

<p align="center">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="120" height="120" viewBox="0 0 120 120">
<defs>
<clipPath id="r">
<rect width="120" height="120" rx="24" ry="24"/>
</clipPath>
</defs>
<image href="./icon.png" width="120" height="120" clip-path="url(#r)"/>
</svg>
</p>

[![CI](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml)
[![Publish](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml)
[![npm version](https://img.shields.io/npm/v/%40noedgeai-org%2Fdoc2x-mcp)](https://www.npmjs.com/package/@noedgeai-org/doc2x-mcp)
@@ -22,6 +33,7 @@
- [Install Repo Skill (Optional)](#安装本仓库-skill可选)
- [Security and Troubleshooting](#安全与排错)
- [Getting Help](#问题反馈)
- [Changelog](./CHANGELOG.md)
- [License](#license)

## Project Scope
@@ -93,7 +105,7 @@ Point MCP client to your local build output:

| Stage | Tools | Purpose |
| --- | --- | --- |
| PDF parse | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` | Submit parse tasks, check status, wait and fetch text |
| PDF parse | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` / `doc2x_materialize_pdf_layout_json` | Submit parse tasks, check status, wait and fetch text, or write the v3 layout result to a local JSON file |
| Export | `doc2x_convert_export_submit` / `doc2x_convert_export_result` / `doc2x_convert_export_wait` | Start export, read export result, wait for export completion |
| Download | `doc2x_download_url_to_file` / `doc2x_materialize_convert_zip` | Download a URL to a local path, unpack the convert zip |
| Image layout parse | `doc2x_parse_image_layout_sync` / `doc2x_parse_image_layout_submit` / `doc2x_parse_image_layout_status` / `doc2x_parse_image_layout_wait_text` | Sync/async image OCR and layout parsing |
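@@ -102,7 +114,7 @@ Point MCP client to your local build output:
### PDF Parse Model (`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`)

- Optional parameter: `model`
- Supported value: `v3-2026` (latest model)
- Supported values: `v2` (default) / `v3-2026` (latest model)
- Default when omitted: `v2`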

```json
@@ -111,6 +123,23 @@ Point MCP client to your local build output:
}
```
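
The `model` rules above (optional, `v2` when omitted, `v3-2026` as the newer option) can be sketched as a small resolver. `ParseModel` and `resolveParseModel` are hypothetical names for illustration, and rejecting unknown values is an assumption rather than documented server behavior.

```typescript
// Hypothetical sketch; names and error behavior are assumptions, not the server's code.
type ParseModel = "v2" | "v3-2026";

const DEFAULT_PARSE_MODEL: ParseModel = "v2";

function resolveParseModel(model?: string): ParseModel {
  // Omitted parameter falls back to the documented default.
  if (model === undefined) return DEFAULT_PARSE_MODEL;
  // Both documented values are accepted explicitly, including "v2".
  if (model === "v2" || model === "v3-2026") return model;
  throw new Error("unsupported parse model: " + model);
}
```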

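### PDF Layout JSON Materialization (`doc2x_materialize_pdf_layout_json`)

- Required parameter: `output_path`
- Provide one of `uid` or `pdf_path`
- `v2` does not support `layout`; use `v3-2026` when `pages[].layout` is needed
- If `pdf_path` is passed without `model`, this tool defaults to `v3-2026`
- On success, the raw `result` JSON is written to a local file

`layout` holds page block structure and coordinates, which suits figure/table cropping, region highlighting, structured extraction, and layout analysis. If you only want the body text, prefer Markdown / DOCX export.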

```json
{
"pdf_path": "/absolute/path/to/input.pdf",
"output_path": "/absolute/path/to/input_v3.layout.json"
}
```

### Export Formula Parameters (`doc2x_convert_export_submit` / `doc2x_convert_export_wait`)

- Required parameter: `formula_mode` (`normal` / `dollar`)
@@ -134,6 +163,12 @@ Point MCP client to your local build output:
1. Call `doc2x_parse_image_layout_sync` for a direct synchronous parse.
2. For steady polling, switch to the submit/status/wait combination.

### Workflow 3: PDF -> local v3 layout JSON file

1. Call `doc2x_materialize_pdf_layout_json` with `pdf_path` and `output_path`.
2. The tool waits for the parse to succeed and writes the raw `result` JSON to a local file.
3. The JSON can be fed directly to downstream figure/table cropping scripts.
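
As a sketch of what a downstream script might do first with the saved file, the function below lists which pages carry layout data. It assumes only what the README states, namely that the result has a `pages` array whose entries may have a `layout` field. The internal shape of `layout` is not specified here, so it is treated as `unknown`, and `pagesWithLayout` is a hypothetical name.

```typescript
// Hypothetical sketch; only `pages[].layout` is taken from the README.
// The structure inside `layout` is not assumed.
function pagesWithLayout(result: unknown): number[] {
  if (typeof result !== "object" || result === null) return [];
  const pages = (result as { pages?: unknown }).pages;
  if (!Array.isArray(pages)) return [];
  const indexes: number[] = [];
  pages.forEach((page: unknown, index: number) => {
    if (
      typeof page === "object" &&
      page !== null &&
      (page as { layout?: unknown }).layout != null
    ) {
      // Zero-based position of the page within the `pages` array.
      indexes.push(index);
    }
  });
  return indexes;
}
```

In a Node script the input would typically come from `JSON.parse` applied to the contents of the file at `output_path`. An empty result suggests the parse ran with a model that does not emit `layout`.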

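## Local Development

### Requirements
@@ -191,7 +226,9 @@ pnpm audit --prod --audit-level high

## Install Repo Skill (Optional)

Adds a Skill to Codex CLI / Claude Code that “teaches the model how to use the doc2x-mcp tools”.
Adds a Skill to Codex CLI / Claude Code that "teaches the model how to use the doc2x-mcp tools".

> **Note:** After each `doc2x-mcp` version upgrade, re-run the install command to update the Skill so the model uses the latest tool descriptions and workflows.

One-command install without cloning the repo (recommended):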

41 changes: 39 additions & 2 deletions README_EN.md
@@ -1,5 +1,16 @@
# Doc2x MCP Server

<p align="center">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="120" height="120" viewBox="0 0 120 120">
<defs>
<clipPath id="r">
<rect width="120" height="120" rx="24" ry="24"/>
</clipPath>
</defs>
<image href="./icon.png" width="120" height="120" clip-path="url(#r)"/>
</svg>
</p>

[![CI](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/ci.yml)
[![Publish](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml/badge.svg)](https://github.com/NoEdgeAI/doc2x-mcp/actions/workflows/publish.yml)
[![npm version](https://img.shields.io/npm/v/%40noedgeai-org%2Fdoc2x-mcp)](https://www.npmjs.com/package/@noedgeai-org/doc2x-mcp)
@@ -22,6 +33,7 @@ A stdio-based MCP Server that wraps Doc2x v2 PDF/image capabilities into stable,
- [Install Repo Skill (Optional)](#install-repo-skill-optional)
- [Security and Troubleshooting](#security-and-troubleshooting)
- [Getting Help](#getting-help)
- [Changelog](./CHANGELOG.md)
- [License](#license)

## Project Scope
@@ -93,7 +105,7 @@ Point MCP client to your local build output:

| Stage | Tools | Purpose |
| --- | --- | --- |
| PDF parse | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` | Submit parse tasks, check status, wait and fetch text |
| PDF parse | `doc2x_parse_pdf_submit` / `doc2x_parse_pdf_status` / `doc2x_parse_pdf_wait_text` / `doc2x_materialize_pdf_layout_json` | Submit parse tasks, check status, wait and fetch text, or materialize v3 layout JSON locally |
| Export | `doc2x_convert_export_submit` / `doc2x_convert_export_result` / `doc2x_convert_export_wait` | Start export, read export result, wait for completion |
| Download | `doc2x_download_url_to_file` / `doc2x_materialize_convert_zip` | Download export URL to local path, materialize convert zip |
| Image layout parse | `doc2x_parse_image_layout_sync` / `doc2x_parse_image_layout_submit` / `doc2x_parse_image_layout_status` / `doc2x_parse_image_layout_wait_text` | Sync/async OCR and layout parse for images |
@@ -102,7 +114,7 @@ Point MCP client to your local build output:
### PDF Parse Model (`doc2x_parse_pdf_submit` / `doc2x_parse_pdf_wait_text`)

- Optional parameter: `model`
- Supported value: `v3-2026` (latest model)
- Supported values: `v2` (default) / `v3-2026` (latest model)
- Default (when omitted): `v2`

```json
@@ -111,6 +123,23 @@ Point MCP client to your local build output:
}
```

### PDF Layout JSON Materialization (`doc2x_materialize_pdf_layout_json`)

- Required: `output_path`
- Provide either `uid` or `pdf_path`
- `v2` does not support `layout`; use `v3-2026` when `pages[].layout` is required
- When `pdf_path` is used and `model` is omitted, this tool defaults to `v3-2026`
- On success it writes the raw parse `result` JSON locally

`layout` contains page block structure and coordinates, which is useful for figure/table crops, region highlighting, structured extraction, and layout analysis. If the goal is readable full text, prefer Markdown / DOCX export.

```json
{
"pdf_path": "/absolute/path/to/input.pdf",
"output_path": "/absolute/path/to/input_v3.layout.json"
}
```
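
The argument rules listed above can be sketched as a normalization step. Everything named here (`MaterializeLayoutArgs`, `NormalizedArgs`, `normalizeMaterializeArgs`) is hypothetical. Reading "either `uid` or `pdf_path`" as "exactly one", and rejecting an explicit `v2` up front, are assumptions made for the sketch. The real tool may behave differently in both cases.

```typescript
// Hypothetical sketch of the documented argument rules; not the server's actual schema.
type ParseModel = "v2" | "v3-2026";

interface MaterializeLayoutArgs {
  output_path: string;
  uid?: string;
  pdf_path?: string;
  model?: ParseModel;
}

type NormalizedArgs =
  | { source: "uid"; uid: string; output_path: string }
  | { source: "pdf_path"; pdf_path: string; model: ParseModel; output_path: string };

function normalizeMaterializeArgs(args: MaterializeLayoutArgs): NormalizedArgs {
  if (!args.output_path) throw new Error("output_path is required");
  const hasUid = typeof args.uid === "string" && args.uid.length > 0;
  const hasPdf = typeof args.pdf_path === "string" && args.pdf_path.length > 0;
  // Assumption: exactly one of the two sources must be given.
  if (hasUid === hasPdf) throw new Error("provide exactly one of uid or pdf_path");
  if (hasUid) {
    return { source: "uid", uid: args.uid as string, output_path: args.output_path };
  }
  // Documented default for this tool when pdf_path is used without a model.
  const model: ParseModel = args.model ?? "v3-2026";
  // Assumption: fail early, since v2 results carry no pages[].layout.
  if (model === "v2") throw new Error("v2 does not support layout; use v3-2026");
  return {
    source: "pdf_path",
    pdf_path: args.pdf_path as string,
    model,
    output_path: args.output_path,
  };
}
```

Returning a tagged union keeps the `uid` path (an existing parse task) separate from the `pdf_path` path (a new parse), so later code cannot read a `model` that was never resolved.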

### Export Formula Parameters (`doc2x_convert_export_submit` / `doc2x_convert_export_wait`)

- Required: `formula_mode` (`normal` / `dollar`)
@@ -134,6 +163,12 @@ Point MCP client to your local build output:
1. Use `doc2x_parse_image_layout_sync` for direct parse.
2. For robust polling behavior, switch to submit/status/wait flow.

### Workflow 3: PDF -> local v3 layout JSON

1. Call `doc2x_materialize_pdf_layout_json` with `pdf_path` and `output_path`.
2. The tool waits for parse success and writes the raw `result` JSON locally.
3. The saved JSON can be consumed directly by downstream figure/table crop scripts.

## Local Development

### Requirements
Expand Down Expand Up @@ -193,6 +228,8 @@ pnpm audit --prod --audit-level high

Installs a reusable skill for Codex CLI / Claude Code to guide tool usage with the standard `submit/status/wait/export/download` workflow.

> **Note:** After upgrading `doc2x-mcp` to a new version, re-run the install command to update the Skill and ensure the model uses the latest tool descriptions and workflows.

One-command install without cloning (recommended):

```bash
Binary file added icon.png
9 changes: 5 additions & 4 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "@noedgeai-org/doc2x-mcp",
"version": "0.1.3",
"version": "0.1.4",
"description": "Doc2x MCP server (stdio, MCP SDK).",
"license": "MIT",
"engines": {
@@ -21,7 +21,8 @@
"./scripts/install-skill.ps1",
"./scripts/install-skill-winps.ps1",
"./skills/doc2x-mcp/SKILL.md",
"./package.json"
"./package.json",
"./icon.png"
],
"scripts": {
"build": "node ./node_modules/typescript/bin/tsc -p tsconfig.json",
@@ -31,13 +32,13 @@
"skill:install:ps": "pwsh -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill.ps1",
"skill:install:winps": "powershell -NoProfile -ExecutionPolicy Bypass -File scripts/install-skill-winps.ps1",
"start": "node dist/index.js",
"test:unit": "npm run build && node --test test/unit/registerToolsShared.test.js",
"test:unit": "npm run build && node --test test/unit/registerToolsShared.test.js test/unit/materialize.test.js",
"test:e2e": "npm run build && node --test test/e2e/mcpServer.e2e.test.js",
"test": "npm run test:unit && npm run test:e2e",
"prepublishOnly": "pnpm run build"
},
"dependencies": {
"@modelcontextprotocol/sdk": "1.26.0",
"@modelcontextprotocol/sdk": "1.27.1",
"@types/lodash": "^4.17.23",
"lodash": "4.17.23",
"lru-cache": "^11.2.6",