⚡ Completely built with Pochi
DocFlow is a CLI tool that fetches documents from Lark (Feishu), Notion, or Google Docs, converts them into a structured AST, uploads all embedded images/assets to S3 (or local disk), and outputs production-ready .mdx files — ready to drop into any Next.js, Docusaurus, or MDX-powered docs site.
┌─────────────┐     fetch      ┌─────────┐    upload    ┌──────────┐    write    ┌──────────┐
│ Lark/Notion │ ─────────────► │   AST   │ ───────────► │ S3 / FS  │ ──────────► │   .mdx   │
│ Google Docs │                │ (typed) │              │ (assets) │             │ (output) │
└─────────────┘                └─────────┘              └──────────┘             └──────────┘
- 3 source adapters — Lark (Feishu), Notion, and Google Docs
- 2 asset backends — AWS S3 (and S3-compatible: R2, MinIO) or local disk
- Full MDX output — frontmatter, headings, code blocks, tables, callouts, images
- Concurrent asset uploads — configurable parallelism with retries
- Git integration — optional auto-commit after publish
- Dry-run mode — inspect the parsed AST without writing any files
- Env-variable overrides — no secrets ever need to be hardcoded in config
- Zero runtime deps beyond Node.js — runs on any machine with Node 18+
- Installation
- Quick Start
- Configuration
- CLI Reference
- Environment Variables
- Output Format
- Architecture
- Contributing
- License
# Clone and install
git clone https://github.com/YOUR_USERNAME/docflow.git
cd docflow
npm install
# Build
npm run build
# Link globally (optional)
npm link
Or use directly via npx once published:
npx docflow fetch <docId>
1. Copy the example config:
cp docflow.config.yaml my-project/docflow.config.yaml
2. Fill in your credentials (or set environment variables — see below):
adapter: lark
lark:
  appId: "YOUR_LARK_APP_ID"
  appSecret: "YOUR_LARK_APP_SECRET"
assets:
  backend: s3
  s3:
    bucket: "my-docs-bucket"
    region: "us-east-1"
output:
  dir: "./content/docs"
3. Fetch a document:
docflow fetch <your-lark-doc-id>
That's it — your .mdx file is written to ./content/docs/ with all images uploaded and replaced with permanent S3 URLs.
DocFlow is configured via docflow.config.yaml in your project root. Every value can also be overridden with environment variables (see Environment Variables).
Create a custom app in the Lark Open Platform (open.feishu.cn):
- Go to My Apps → Create App
- Enable the Docs API permissions: `docx:document:readonly`, `drive:drive:readonly`
- Copy your App ID and App Secret
lark:
  appId: "YOUR_LARK_APP_ID"         # or env DOCFLOW_LARK_APP_ID
  appSecret: "YOUR_LARK_APP_SECRET" # or env DOCFLOW_LARK_APP_SECRET
The document must be shared with the app (or the app must have tenant-wide read access).
Create an internal integration at notion.so/my-integrations:
- Click New Integration, pick a name, select your workspace
- Copy the Integration Token (starts with `secret_...`)
- Share each page/database with your integration via the Share menu in Notion
notion:
  apiKey: "secret_..." # or env NOTION_API_KEY
The Google Docs adapter uses a service account (no OAuth, no user login required):
- In Google Cloud Console, create a service account
- Give it no project roles (it just needs Docs API access)
- Download the JSON key file
- Enable the Google Docs API and Google Drive API in your project
- Share your document with the service account's `client_email`
googleDocs:
  serviceAccountPath: "./service-account.json" # or env GOOGLE_APPLICATION_CREDENTIALS
⚠️ Never commit your `service-account.json` to git. It is already in `.gitignore`.
assets:
  backend: s3
  concurrency: 5 # max parallel uploads
  s3:
    bucket: "YOUR_S3_BUCKET_NAME"
    region: "us-east-1"
    keyPrefix: "docs/" # optional prefix for all keys
    publicBaseUrl: "https://cdn.example.com" # optional CDN base URL
    # endpoint: "https://..." # optional: R2, MinIO, etc.
AWS credentials are picked up from environment variables (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or from the standard AWS credential chain (IAM role, ~/.aws/credentials, etc.).
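The `concurrency` setting in the S3 configuration caps how many uploads run at once. As a rough illustration of that idea (not DocFlow's actual `AssetManager` code), the sketch below pushes a list of upload tasks through a fixed-size worker pool and retries each failed task. `UploadTask`, `withRetries`, `uploadAll`, and the `retries` parameter are hypothetical names chosen for this example, and retry backoff is left out for brevity.

```typescript
// Hypothetical sketch: a bounded worker pool with simple retries.
// None of these names come from DocFlow's source.
type UploadTask = () => Promise<string>; // resolves to the asset's public URL

async function withRetries(task: UploadTask, retries: number): Promise<string> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

async function uploadAll(
  tasks: UploadTask[],
  concurrency: number,
  retries = 2,
): Promise<string[]> {
  const results: string[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task

  // Each worker claims tasks until none remain, so at most
  // `concurrency` uploads are in flight at any time.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await withRetries(tasks[index], retries);
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results are stored by task index, so the returned URLs line up with the input order even when uploads finish out of order.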
assets:
  backend: local
  local:
    outputDir: "./public/assets"
    publicBaseUrl: "/assets"
DocFlow can auto-commit the output .mdx files after publishing:
git:
  autoCommit: false
  commitMessage: "docs: publish {{title}} ({{date}})"
Or pass --commit at the CLI to commit a specific run without changing the config.
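The `commitMessage` value uses `{{title}}` and `{{date}}` placeholders. A minimal sketch of how such a template could be expanded is shown here; `renderCommitMessage` is a hypothetical helper, and DocFlow's own substitution logic may behave differently (for example, in how it treats unknown placeholders).

```typescript
// Hypothetical helper: fills {{name}} placeholders in a commit message template.
function renderCommitMessage(
  template: string,
  values: Record<string, string>,
): string {
  // Replace each {{key}} with its value. Unknown keys are left untouched
  // so a typo in the template stays visible in the commit message.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

const message = renderCommitMessage("docs: publish {{title}} ({{date}})", {
  title: "Getting Started",
  date: "2026-01-15",
});
console.log(message); // docs: publish Getting Started (2026-01-15)
```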
Usage: docflow [command] [options]

Commands:
  fetch <docId>      Fetch a document and publish it as MDX
  adapters           List registered source adapters
  backends           List registered asset backends

Options for `fetch`:
  --config <path>    Path to docflow.config.yaml (default: ./docflow.config.yaml)
  --adapter <name>   Source adapter: lark | notion | google-docs
  --backend <name>   Asset backend: s3 | local
  --output <dir>     Output directory for .mdx files
  --overwrite        Overwrite existing .mdx files (default: false)
  --commit           Auto-commit output files via git
  --dry-run          Parse and show summary without writing files or uploading assets
# Fetch a Lark doc
docflow fetch ABC123XYZ
# Dry-run: inspect the parsed AST without any side-effects
docflow fetch ABC123XYZ --dry-run
# Use a different config file
docflow fetch ABC123XYZ --config ./configs/prod.yaml
# Override adapter and backend on the fly
docflow fetch ABC123XYZ --adapter notion --backend local --output ./out
# Fetch and auto-commit the result
docflow fetch ABC123XYZ --commit
# List what adapters are registered
docflow adapters
# List asset backends
docflow backends
All credentials can (and should) be provided via environment variables instead of the config file:
| Variable | Description |
|---|---|
| `DOCFLOW_LARK_APP_ID` | Lark app ID |
| `DOCFLOW_LARK_APP_SECRET` | Lark app secret |
| `NOTION_API_KEY` | Notion internal integration token |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to Google service account JSON |
| `AWS_ACCESS_KEY_ID` | AWS access key ID |
| `AWS_SECRET_ACCESS_KEY` | AWS secret access key |
| `DOCFLOW_S3_BUCKET` | S3 bucket name (overrides config) |
| `DOCFLOW_S3_REGION` | S3 region (overrides config) |
| `DOCFLOW_S3_KEY_PREFIX` | S3 key prefix (overrides config) |
| `DOCFLOW_S3_PUBLIC_BASE_URL` | S3 / CDN public base URL (overrides config) |
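To make the "overrides config" behaviour concrete, here is a small sketch of how the four `DOCFLOW_S3_*` variables could be layered over values loaded from the YAML file. Only the variable names come from the table; the `S3Config` shape, the `applyS3EnvOverrides` function, and the rule that an empty variable counts as unset are assumptions made for illustration.

```typescript
// Hypothetical config shape and override logic; only the variable
// names are taken from the table of environment variables.
interface S3Config {
  bucket?: string;
  region?: string;
  keyPrefix?: string;
  publicBaseUrl?: string;
}

type Env = Record<string, string | undefined>;

function applyS3EnvOverrides(config: S3Config, env: Env): S3Config {
  // A set, non-empty environment variable wins; otherwise keep the config value.
  const pick = (name: string, fallback?: string): string | undefined => {
    const value = env[name];
    return value !== undefined && value !== "" ? value : fallback;
  };

  return {
    bucket: pick("DOCFLOW_S3_BUCKET", config.bucket),
    region: pick("DOCFLOW_S3_REGION", config.region),
    keyPrefix: pick("DOCFLOW_S3_KEY_PREFIX", config.keyPrefix),
    publicBaseUrl: pick("DOCFLOW_S3_PUBLIC_BASE_URL", config.publicBaseUrl),
  };
}

const merged = applyS3EnvOverrides(
  { bucket: "my-docs-bucket", region: "us-east-1" },
  { DOCFLOW_S3_BUCKET: "my-docs-assets" },
);
console.log(merged.bucket, merged.region); // my-docs-assets us-east-1
```

In a Node.js program the second argument would typically be `process.env`; passing it in explicitly keeps the function easy to test.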
Example .env file (never commit this):
DOCFLOW_LARK_APP_ID=cli_xxxxxxxxxxxx
DOCFLOW_LARK_APP_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
DOCFLOW_S3_BUCKET=my-docs-assets
DOCFLOW_S3_REGION=us-east-1
Each fetched document produces a single .mdx file:
---
title: My Document Title
slug: my-document-title
date: '2026-01-15'
---
Content paragraph here...
## Heading 2

| Column A | Column B |
|---|---|
| Value 1 | Value 2 |
- Frontmatter — `title`, `slug`, `date`
- Headings — H1–H6
- Text formatting — bold, italic, strikethrough, inline code
- Code blocks — with language detection
- Images — uploaded to your asset backend; URLs replaced with permanent links
- Tables — GFM markdown tables
- Callouts / quotes — blockquote format
- Lists — ordered and unordered, nested
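The frontmatter fields listed above can be derived from a document title and date. The following sketch reproduces the frontmatter of the sample output; `slugify` and `renderFrontmatter` are hypothetical helpers, and DocFlow's real slug rules (for example, for non-Latin titles) are not documented here and may differ.

```typescript
// Hypothetical helpers; DocFlow's real slug and frontmatter logic may differ.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into "-"
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
}

function renderFrontmatter(title: string, date: string): string {
  return [
    "---",
    `title: ${title}`,
    `slug: ${slugify(title)}`,
    `date: '${date}'`,
    "---",
  ].join("\n");
}

console.log(renderFrontmatter("My Document Title", "2026-01-15"));
```

This sketch writes the title verbatim; a production serializer would need to quote titles containing YAML-significant characters such as `:` or `#`.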
src/
├── cli.ts # CLI entry point (Commander.js)
├── core/
│ ├── config.ts # YAML config loader + env overrides
│ ├── pipeline.ts # Main orchestration: fetch → upload → publish
│ ├── registry.ts # Plugin registry for adapters/backends/publishers
│ └── ast.ts # Shared AST node types
├── adapters/
│ ├── ISourceAdapter.ts # Adapter interface
│ ├── lark/ # Lark (Feishu) adapter
│ ├── notion/ # Notion adapter
│ └── google-docs/ # Google Docs adapter
├── assets/
│ ├── AssetManager.ts # Concurrent upload orchestration
│ ├── IAssetBackend.ts # Backend interface
│ └── backends/
│ ├── S3Backend.ts # AWS S3 / S3-compatible
│ └── LocalDiskBackend.ts # Local filesystem
├── publishers/
│ ├── IPublisher.ts # Publisher interface
│ └── MdxPublisher.ts # MDX serializer
└── git/
└── GitIntegration.ts # simple-git auto-commit
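The layout above reflects a three-stage flow: adapters produce a shared AST, the asset layer rewrites image URLs, and the publisher serializes the result. The sketch below illustrates that flow on a toy scale. The `AstNode` shapes, `rewriteImages`, and `renderNode` are all assumptions for illustration; the real node types live in `src/core/ast.ts` and are not shown in this README.

```typescript
// Hypothetical node shapes; the real definitions live in src/core/ast.ts.
type AstNode =
  | { type: "heading"; level: 1 | 2 | 3 | 4 | 5 | 6; text: string }
  | { type: "paragraph"; text: string }
  | { type: "image"; url: string; alt: string };

// Stand-in for the upload stage: swap each image URL for its permanent one.
function rewriteImages(
  nodes: AstNode[],
  toPublicUrl: (url: string) => string,
): AstNode[] {
  return nodes.map((node) =>
    node.type === "image" ? { ...node, url: toPublicUrl(node.url) } : node,
  );
}

// Stand-in for the publish stage: serialize one node to Markdown/MDX text.
function renderNode(node: AstNode): string {
  switch (node.type) {
    case "heading":
      return `${"#".repeat(node.level)} ${node.text}`;
    case "paragraph":
      return node.text;
    case "image":
      return `![${node.alt}](${node.url})`;
  }
}

const doc: AstNode[] = [
  { type: "heading", level: 2, text: "Heading 2" },
  { type: "paragraph", text: "Content paragraph here..." },
  { type: "image", url: "https://source.example/tmp/a.png", alt: "diagram" },
];

const published = rewriteImages(doc, (url) =>
  url.replace("https://source.example/tmp/", "https://cdn.example.com/docs/"),
);
const mdxBody = published.map(renderNode).join("\n\n");
console.log(mdxBody);
```

Modelling nodes as a discriminated union lets the compiler check that the serializer handles every node type, which is one reason a typed AST is a convenient boundary between adapters and publishers.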
Implement ISourceAdapter and register it in cli.ts:
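import type { ISourceAdapter, DocflowDocument } from "./adapters/ISourceAdapter.js";

class MyAdapter implements ISourceAdapter {
  info = { name: "my-adapter", version: "1.0.0" };

  // Set up credentials for the source API.
  async authenticate() { /* ... */ }

  // Fetch the document and return it in DocFlow's document format.
  async fetchDocument(id: string): Promise<DocflowDocument> {
    throw new Error(`fetchDocument is not implemented yet (requested: ${id})`);
  }

  // Map a raw asset reference to a downloadable URL; here it is returned unchanged.
  async resolveAssetUrl(rawUrl: string): Promise<string> { return rawUrl; }
}
Implement IAssetBackend: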
import type { IAssetBackend } from "./assets/IAssetBackend.js";

class MyBackend implements IAssetBackend {
  async upload(buffer: Buffer, filename: string, mimeType: string): Promise<string> {
    // Upload the buffer to your storage and return its public URL.
    throw new Error(`upload is not implemented yet (${filename}, ${mimeType}, ${buffer.length} bytes)`);
  }
}
Contributions are welcome! Please open an issue first to discuss larger changes.
# Install dependencies
npm install
# Build
npm run build
# Watch mode during development
npm run build:watch
# Run directly from source (no build needed)
npm run dev -- fetch <docId>
ISC License — see LICENSE for details.