doc2ai is a Claude Code plugin for converting office documents into AI-friendly text formats. It focuses on preserving source structure while removing format noise, so downstream AI agents and scripts can inspect requirements, designs, spreadsheets, and other enterprise documents more reliably.
claude plugin marketplace add https://github.com/IronRookieCoder/doc2ai
claude plugin install doc2ai

/doc2ai:docs2md input.docx
/doc2ai:docs2md input.doc -o md/
/doc2ai:docs2md docs/ --report
/doc2ai:docs2md input.docx --config custom.yaml
The docs2md skill converts .doc and .docx files into structured Markdown. It uses a two-stage pipeline:
doc/docx
-> script conversion and cleanup (Pandoc + Lua filter + regex)
-> targeted AI formatting repair (only risk-flagged regions)
-> final Markdown
| Parameter | Description | Default |
|---|---|---|
| `-o <dir>` | Markdown output directory | `md/` |
| `--config <path>` | Path to configuration file | `config.yaml` in the skill directory |
| `--report` | Generate a JSON conversion report | not generated |
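
The script stage of the two-stage pipeline described above can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual script: the function names, the filter file `cleanup.lua`, the choice of `gfm` as the output format, and the empty-anchor regex are all assumptions made for illustration. Only the Pandoc options used (`-f`, `-t`, `--lua-filter`, `-o`) are real. The targeted AI repair stage is not shown.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical noise pattern: anchors with no content, e.g. <a id="x"></a>.
EMPTY_ANCHOR = re.compile(r'<(a|span)\b[^>]*>\s*</\1>')


def build_pandoc_command(src: Path, dst: Path, lua_filter: Path) -> list[str]:
    """Build a Pandoc invocation for a .docx -> Markdown conversion."""
    return [
        "pandoc", str(src),
        "-f", "docx",
        "-t", "gfm",  # assumption: GitHub-flavored Markdown output
        f"--lua-filter={lua_filter}",
        "-o", str(dst),
    ]


def regex_cleanup(markdown: str) -> str:
    """Drop empty anchors, then collapse runs of three or more newlines."""
    text = EMPTY_ANCHOR.sub("", markdown)
    return re.sub(r"\n{3,}", "\n\n", text)


def convert(src: Path, dst: Path, lua_filter: Path) -> None:
    """Run Pandoc, then apply the regex cleanup to the file it produced."""
    subprocess.run(build_pandoc_command(src, dst, lua_filter), check=True)
    cleaned = regex_cleanup(dst.read_text(encoding="utf-8"))
    dst.write_text(cleaned, encoding="utf-8")
```

Keeping the command builder and the cleanup as separate pure functions makes both easy to check without Pandoc installed; only `convert` needs the `pandoc` binary on `PATH`.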
/doc2ai:md2ai input.md
/doc2ai:md2ai md/ -o ai-native/
/doc2ai:md2ai input.md --threshold 800
/doc2ai:md2ai input.md --force
The md2ai skill splits Markdown files longer than the configured threshold (default 500 lines) into a main TOC entry plus focused child documents. Output goes to the current directory by default. During processing it uses a risk index so AI verifies only risky child documents; process JSON files are removed before final delivery.
| Parameter | Description | Default |
|---|---|---|
| `-o <dir>` | Output directory | current directory `.` |
| `--threshold <n>` | Line count above which a file is treated as long | 500 |
| `--max-lines-per-doc <n>` | Target maximum lines per child document | 500 |
| `--force` | Generate TOC + child structure even below the threshold | off |
| `--keep-process-files` | Keep `manifest.json`, `risk-index.json`, and `summary.json` | not kept |
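
To make the roles of `--threshold` and `--force` concrete, here is a minimal sketch of threshold-based splitting. It is not the skill's real algorithm: splitting at level-2 headings, the `split_markdown` name, and the TOC link format are assumptions, and `--max-lines-per-doc` and the risk index are not modeled.

```python
import re

# Assumption: child documents are cut at level-2 headings ("## Title").
HEADING = re.compile(r"^## +(.+?)\s*$")


def split_markdown(text: str, name: str, threshold: int = 500,
                   force: bool = False) -> dict[str, str]:
    """Return {file name: content}; a single entry when the file is short."""
    lines = text.splitlines()
    if len(lines) <= threshold and not force:
        return {f"{name}.md": text}

    preamble: list[str] = []
    sections: list[tuple[str, list[str]]] = []
    in_fence = False
    for line in lines:
        if line.startswith("```"):
            in_fence = not in_fence  # never treat lines inside code fences as headings
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append((match.group(1), [line]))
        elif sections:
            sections[-1][1].append(line)
        else:
            preamble.append(line)

    files = {f"{title}.md": "\n".join(body) + "\n" for title, body in sections}
    # Angle brackets keep link targets valid when a title contains spaces.
    toc = [f"- [{title}](<{title}.md>)" for title, _ in sections]
    files[f"{name}.md"] = "\n".join(preamble + [""] + toc) + "\n"
    return files
```

Calling `split_markdown(text, "document", force=True)` on a short file still yields the TOC plus child layout, mirroring what `--force` is described as doing. Duplicate section titles or titles containing path separators would need extra handling that this sketch omits.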
/doc2ai:xlsx2csv report.xlsx
/doc2ai:xlsx2csv data/ -o csv/
The xlsx2csv skill converts .xlsx files into an index CSV plus one CSV file per worksheet. It preserves the original grid layout and avoids semantic normalization.
| Parameter | Description | Default |
|---|---|---|
| `-o <dir>` | CSV output directory | `csv/` |
| Skill | Command | Description |
|---|---|---|
| `docs2md` | `/doc2ai:docs2md` | Convert `.doc` / `.docx` documents into structured Markdown |
| `md2ai` | `/doc2ai:md2ai` | Split long Markdown into an AI Native TOC and child documents |
| `xlsx2csv` | `/doc2ai:xlsx2csv` | Convert `.xlsx` workbooks into AI-friendly CSV collections |
docs2md:

- Pandoc must be installed and available in `PATH`
- Python 3
- `pyyaml`
- WPS or a compatible local conversion environment is recommended for legacy `.doc` files

xlsx2csv:

- Python 3
- `pandas`
- `python-calamine`
- `pyyaml`
Install missing Python dependencies when needed:
pip install pandas python-calamine pyyaml

md/
└── document.md
For .doc inputs, an intermediate .docx file may be generated and retained beside the original file.
When --report is used, conversion reports are written under:
md/
└── reports/
    └── document.json
By default, md2ai writes to the current directory and creates a subdirectory named after the document:
./
└── document/
    ├── document.md
    ├── User requirements.md
    ├── Functional requirements.md
    └── Non-functional requirements.md
When -o ai-native/ is specified:
ai-native/
└── document/
    ├── document.md
    ├── User requirements.md
    ├── Functional requirements.md
    └── Non-functional requirements.md
Batch processing preserves relative input subdirectories. manifest.json, risk-index.json, and summary.json are process files used during verification and removed before final delivery.
csv/
└── workbook/
    ├── workbook.csv
    ├── Sheet1.csv
    └── Sheet2.csv
The workbook-level CSV is an index file that records worksheet order, worksheet name, exported file name, and used range.
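
The layout above, an index CSV next to one CSV per worksheet, can be sketched with the standard library alone. This is an illustrative sketch only: the function names, the index column headers, and the A1-style used range derived from grid size are assumptions, and reading the `.xlsx` itself (which the skill's dependencies suggest is done with `pandas` and `python-calamine`) is replaced here by in-memory grids.

```python
import csv
from pathlib import Path


def column_letter(index: int) -> str:
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def export_workbook(name: str, sheets: dict[str, list[list[str]]],
                    out_dir: Path) -> Path:
    """Write one CSV per sheet plus an index CSV; return the index path."""
    book_dir = out_dir / name
    book_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical header names for the four fields the index records.
    index_rows = [["order", "sheet_name", "file_name", "used_range"]]
    for order, (sheet_name, grid) in enumerate(sheets.items(), start=1):
        file_name = f"{sheet_name}.csv"
        with open(book_dir / file_name, "w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(grid)  # grid written as-is, blanks included
        width = max((len(row) for row in grid), default=0)
        used = f"A1:{column_letter(width)}{len(grid)}" if width else ""
        index_rows.append([order, sheet_name, file_name, used])
    index_path = book_dir / f"{name}.csv"
    with open(index_path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(index_rows)
    return index_path
```

The grid is passed straight to `csv.writer` with no header inference or row normalization, so blank cells and blank rows survive in the output. A worksheet that shares its name with the workbook would collide with the index file in this sketch.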
- Preserve source content and avoid adding conclusions not present in the original file
- Prefer structural cleanup over visual layout restoration
- Keep original spreadsheet grids, including blank cells, blank rows, and blank columns
- Do not infer spreadsheet headers or normalize rows
- Remove conversion noise such as empty anchors, image remnants, Pandoc annotations, and invalid table formatting when clearly safe
- Keep suspicious content for human review instead of deleting it by default
.claude-plugin/
├── plugin.json
└── marketplace.json
skills/
├── docs2md/
│   ├── SKILL.md
│   ├── config.yaml
│   ├── scripts/
│   └── references/
├── md2ai/
│   ├── SKILL.md
│   └── scripts/
└── xlsx2csv/
    ├── SKILL.md
    ├── config.yaml
    └── scripts/
- Directory input is supported for all three skills.
- Batch conversion preserves relative subdirectories to avoid filename collisions.
- Office temporary files starting with `~$` are skipped.
- Chinese paths and filenames are supported by the bundled scripts.
- `xlsx2csv` skips hidden worksheets by default. Set `sheet.include_hidden_sheets: true` in `config.yaml` to export them.
- When converting `.doc` files, `docs2md` skips the `.doc` if a matching `.docx` already exists in the same directory to avoid duplicate outputs.
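
The batch rules in the notes above (skip `~$` temporary files, skip a `.doc` that has a sibling `.docx`, preserve relative subdirectories) could be combined along these lines. This is a hedged sketch under stated assumptions: `plan_doc_inputs` is a hypothetical helper, not a function from the bundled scripts, and it only plans output paths without converting anything.

```python
from pathlib import Path


def plan_doc_inputs(root: Path, out_dir: Path) -> dict[Path, Path]:
    """Map each .doc/.docx under root to the Markdown path it would produce."""
    plan: dict[Path, Path] = {}
    for src in sorted(root.rglob("*")):
        if not src.is_file() or src.name.startswith("~$"):
            continue  # skip directories and Office temporary files
        suffix = src.suffix.lower()
        if suffix not in {".doc", ".docx"}:
            continue
        if suffix == ".doc" and src.with_suffix(".docx").exists():
            continue  # a matching .docx exists, so the .doc would duplicate it
        relative = src.relative_to(root).with_suffix(".md")
        plan[src] = out_dir / relative  # keep subdirectories to avoid collisions
    return plan
```

Because the output path is derived from the path relative to the input root, two files named `spec.docx` in different subdirectories map to different outputs rather than overwriting each other.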