doc2ai: Document Conversion Plugin for AI Workflows


doc2ai is a Claude Code plugin for converting office documents into AI-friendly text formats. It focuses on preserving source structure while removing format noise, so downstream AI agents and scripts can inspect requirements, designs, spreadsheets, and other enterprise documents more reliably.

Installation

claude plugin marketplace add https://github.com/IronRookieCoder/doc2ai
claude plugin install doc2ai

Usage

Convert Word Documents to Markdown

/doc2ai:docs2md input.docx
/doc2ai:docs2md input.doc -o md/
/doc2ai:docs2md docs/ --report
/doc2ai:docs2md input.docx --config custom.yaml

The docs2md skill converts .doc and .docx files into structured Markdown. It uses a two-stage pipeline:

doc/docx
  -> script conversion and cleanup (Pandoc + Lua filter + regex)
  -> targeted AI formatting repair (only risk-flagged regions)
  -> final Markdown
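
The scripted first stage can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual script: the function names, the `gfm` output format, and the two regex rules are assumptions chosen for illustration, and the Lua filter path is supplied by the caller. The AI repair stage is not shown.

```python
import re
import subprocess
from pathlib import Path


def build_pandoc_command(src: Path, dst: Path, lua_filter: Path) -> list[str]:
    """Build a Pandoc command line for docx -> Markdown (hypothetical settings)."""
    return [
        "pandoc", str(src),
        "-f", "docx",
        "-t", "gfm",  # assumed output flavor; the real config may differ
        f"--lua-filter={lua_filter}",
        "-o", str(dst),
    ]


# Illustrative cleanup rules only; the real rule set is not shown in this README.
EMPTY_ANCHOR = re.compile(r'<a\s+(?:id|name)="[^"]*">\s*</a>')
EXTRA_BLANK_LINES = re.compile(r"\n{3,}")


def clean_markdown(text: str) -> str:
    """Remove empty anchors and collapse runs of blank lines."""
    text = EMPTY_ANCHOR.sub("", text)
    return EXTRA_BLANK_LINES.sub("\n\n", text)


def convert(src: Path, dst: Path, lua_filter: Path) -> None:
    """Run Pandoc, then apply the regex cleanup to the generated file."""
    subprocess.run(build_pandoc_command(src, dst, lua_filter), check=True)
    cleaned = clean_markdown(dst.read_text(encoding="utf-8"))
    dst.write_text(cleaned, encoding="utf-8")
```

Keeping the command builder and the cleanup function separate from `convert` means both can be checked without Pandoc installed.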

Parameter	Description	Default
`-o <dir>`	Markdown output directory	`md/`
`--config <path>`	Path to configuration file	`config.yaml` in the skill directory
`--report`	Generate a JSON conversion report	not generated

Convert Long Markdown to AI Native Structure

/doc2ai:md2ai input.md
/doc2ai:md2ai md/ -o ai-native/
/doc2ai:md2ai input.md --threshold 800
/doc2ai:md2ai input.md --force

The md2ai skill splits Markdown files longer than the configured threshold (default 500 lines) into a main TOC entry plus focused child documents. Output goes to the current directory by default. During processing, a risk index lets the AI verify only the child documents flagged as risky. The process JSON files are removed before final delivery unless --keep-process-files is set.

Parameter	Description	Default
`-o <dir>`	Output directory	current directory `.`
`--threshold <n>`	Line count above which a file is treated as long	`500`
`--max-lines-per-doc <n>`	Target maximum lines per child document	`500`
`--force`	Generate TOC + child structure even below the threshold	off
`--keep-process-files`	Keep `manifest.json`, `risk-index.json`, and `summary.json`	not kept
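
The threshold and `--force` behavior described above could look something like the sketch below. It assumes that child documents are cut at level-2 headings and named after them, which this README does not confirm. `split_markdown` is a hypothetical helper, and `--max-lines-per-doc`, the risk index, and headings inside fenced code are deliberately not handled.

```python
import re
from urllib.parse import quote

# Assumption: child documents start at level-2 headings ("## Title").
HEADING = re.compile(r"^## +(.+?)\s*$")


def split_markdown(name: str, text: str, threshold: int = 500, force: bool = False):
    """Return {filename: content} for a TOC file plus child documents.

    Returns None when the file is not longer than the threshold and
    force is off, mirroring the --threshold / --force description.
    """
    lines = text.splitlines()
    if len(lines) <= threshold and not force:
        return None

    preamble: list[str] = []
    sections: list[tuple[str, list[str]]] = []
    for line in lines:
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1), [line]))
        elif sections:
            sections[-1][1].append(line)
        else:
            preamble.append(line)  # content before the first level-2 heading

    # The TOC keeps the preamble and links to each child (spaces URL-encoded).
    toc = preamble + [""] + [f"- [{t}]({quote(t + '.md')})" for t, _ in sections]
    files = {f"{name}.md": "\n".join(toc) + "\n"}
    for title, body in sections:
        files[f"{title}.md"] = "\n".join(body) + "\n"
    return files
```

For example, `split_markdown("document", text, force=True)` returns a `document.md` TOC entry plus one entry per level-2 heading, even when `text` is short.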

Convert Spreadsheets to CSV

/doc2ai:xlsx2csv report.xlsx
/doc2ai:xlsx2csv data/ -o csv/

The xlsx2csv skill converts .xlsx files into an index CSV plus one CSV file per worksheet. It preserves the original grid layout and avoids semantic normalization.

Parameter	Description	Default
`-o <dir>`	CSV output directory	`csv/`
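
As a rough illustration of "index CSV plus one CSV per worksheet", the sketch below reads each sheet as a raw grid and writes it out unchanged. It is not the bundled script: `read_grids` and `write_workbook` are hypothetical names, the index columns are an assumption (row and column counts stand in for the used range), and hidden-sheet handling is omitted. `read_grids` needs pandas 2.2 or later with python-calamine installed.

```python
import csv
from pathlib import Path


def read_grids(xlsx_path: Path) -> dict[str, list[list[object]]]:
    """Read every worksheet as a raw grid; no row is treated as a header."""
    import pandas as pd  # imported lazily so write_workbook works without it

    frames = pd.read_excel(xlsx_path, sheet_name=None, header=None, engine="calamine")
    return {
        sheet: df.astype(object).where(df.notna(), "").values.tolist()
        for sheet, df in frames.items()
    }


def write_workbook(name: str, grids: dict[str, list[list[object]]], out_dir: Path) -> Path:
    """Write <out_dir>/<name>/<name>.csv (index) plus one CSV per worksheet."""
    target = out_dir / name
    target.mkdir(parents=True, exist_ok=True)

    # Assumed index layout; the real index also records the used range.
    index_rows = [["order", "sheet", "file", "rows", "columns"]]
    for order, (sheet, grid) in enumerate(grids.items(), start=1):
        file_name = f"{sheet}.csv"
        with open(target / file_name, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(grid)  # blank cells stay as empty fields
        width = max((len(row) for row in grid), default=0)
        index_rows.append([order, sheet, file_name, len(grid), width])

    with open(target / f"{name}.csv", "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(index_rows)
    return target
```

A call such as `write_workbook("workbook", read_grids(Path("workbook.xlsx")), Path("csv"))` would produce the layout shown under CSV Output. A worksheet that shares the workbook's name would collide with the index file, which this sketch does not guard against.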

Skills

Skill	Command	Description
`docs2md`	`/doc2ai:docs2md`	Convert `.doc` / `.docx` documents into structured Markdown
`md2ai`	`/doc2ai:md2ai`	Split long Markdown into an AI Native TOC and child documents
`xlsx2csv`	`/doc2ai:xlsx2csv`	Convert `.xlsx` workbooks into AI-friendly CSV collections

Dependencies

docs2md

Pandoc must be installed and available in PATH
Python 3
pyyaml
WPS or a compatible local conversion environment is recommended for legacy .doc files

xlsx2csv

Python 3
pandas
python-calamine
pyyaml

Install missing Python dependencies when needed:

pip install pandas python-calamine pyyaml

Output Structure

Markdown Output

md/
└── document.md

For .doc inputs, an intermediate .docx file may be generated and retained beside the original file.

When --report is used, conversion reports are written under:

md/
└── reports/
    └── document.json

AI Native Markdown Output

Output goes to the current directory by default, creating a subdirectory named after the document:

./
└── document/
    ├── document.md
    ├── User requirements.md
    ├── Functional requirements.md
    └── Non-functional requirements.md

When -o ai-native/ is specified:

ai-native/
└── document/
    ├── document.md
    ├── User requirements.md
    ├── Functional requirements.md
    └── Non-functional requirements.md

Batch processing preserves relative input subdirectories. manifest.json, risk-index.json, and summary.json are process files used during verification. They are removed before final delivery unless --keep-process-files is set.

CSV Output

csv/
└── workbook/
    ├── workbook.csv
    ├── Sheet1.csv
    └── Sheet2.csv

The workbook-level CSV is an index file that records worksheet order, worksheet name, exported file name, and used range.

Conversion Principles

Preserve source content and avoid adding conclusions not present in the original file
Prefer structural cleanup over visual layout restoration
Keep original spreadsheet grids, including blank cells, blank rows, and blank columns
Do not infer spreadsheet headers or normalize rows
Remove conversion noise such as empty anchors, image remnants, Pandoc annotations, and invalid table formatting when clearly safe
Keep suspicious content for human review instead of deleting it by default

Directory Structure

.claude-plugin/
├── plugin.json
└── marketplace.json
skills/
├── docs2md/
│   ├── SKILL.md
│   ├── config.yaml
│   ├── scripts/
│   └── references/
├── md2ai/
│   ├── SKILL.md
│   └── scripts/
└── xlsx2csv/
    ├── SKILL.md
    ├── config.yaml
    └── scripts/

Notes

Directory input is supported for all three skills.
Batch conversion preserves relative subdirectories to avoid filename collisions.
Office temporary files starting with ~$ are skipped.
Chinese paths and filenames are supported by the bundled scripts.
xlsx2csv skips hidden worksheets by default. Set sheet.include_hidden_sheets: true in config.yaml to export them.
When converting .doc files, docs2md skips the .doc if a matching .docx already exists in the same directory to avoid duplicate outputs.
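
Several of these notes describe how batch input is selected. A minimal sketch of that selection logic for docs2md, assuming a hypothetical `plan_conversions` helper rather than the bundled script, might look like this:

```python
from pathlib import Path

WORD_SUFFIXES = {".doc", ".docx"}


def plan_conversions(input_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """Return (source, target) pairs for a batch Word-to-Markdown run."""
    plan = []
    for src in sorted(input_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in WORD_SUFFIXES:
            continue
        if src.name.startswith("~$"):
            continue  # Office temporary file
        if src.suffix.lower() == ".doc" and src.with_suffix(".docx").exists():
            continue  # a matching .docx exists, so skip the .doc
        # Mirror the relative subdirectory so equal filenames do not collide.
        relative = src.relative_to(input_dir).with_suffix(".md")
        plan.append((src, output_dir / relative))
    return plan
```

With `docs/a.docx` and `docs/sub/a.docx` as input, the targets become `md/a.md` and `md/sub/a.md`, which is why mirroring the subdirectories avoids collisions.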
