English | 中文
✨ Focus on paper multimodal workflows: from paper PDFs/screenshots/text to one-click generation of model diagrams, technical roadmaps, experimental plots, and slide decks ✨
| 📄 Universal File Support | 🎯 AI-Powered Generation | 🎨 Custom Styling | ⚡ Lightning Speed |
- 🔥 News
- ✨ Core Features
- 📸 Showcase
- 🧩 Drawio
- 🚀 Quick Start
- 📂 Project Structure
- 🗺️ Roadmap
- 🤝 Contributing
Tip
🆕 2026-02-02 · Paper2Rebuttal
Added rebuttal drafting support with structured response guidance and image-aware revision prompts.
Tip
🆕 2026-01-28 · Drawio Update
Added Drawio support for visual diagram creation and showcase-ready outputs in the workflow.
KB updates in one line: multi-file PPT generation with doc convert/merge, optional image injection, and embedding-assisted retrieval.
Tip
🆕 2026-01-25 · New Features
Added AI-assisted outline editing, three-layer model configuration system for flexible model selection, and user points management with daily quota allocation.
🌐 Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
🆕 2026-01-20 · Bug Fixes
Fixed bugs in experimental plot generation (image/text) and resolved the missing historical files issue.
🌐 Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
🆕 2026-01-14 · Feature Updates & Backend Architecture Upgrade
- Feature Updates: Added Image2PPT, optimized Paper2Figure interaction, and improved PDF2PPT effects.
- Standardized API: Refactored backend interfaces around RESTful /api/v1/structure, removing obsolete endpoints for better maintainability.
- Dynamic Configuration: Supported dynamic model selection (e.g., GPT-4o, Qwen-VL) via API parameters, eliminating hardcoded model dependencies.
🌐 Online Demo: http://dcai-paper2any.nas.cpolar.cn/
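For illustration, a call to the new endpoint might look like the sketch below. Only the /api/v1/structure path is confirmed above; the HTTP method and form fields (file, model) are assumptions, so check the actual API before use.
# Hypothetical sketch: per-request model selection via API parameters.
# The method and field names are illustrative, not confirmed.
curl -X POST http://localhost:8000/api/v1/structure \
  -F "file=@paper.pdf" \
  -F "model=gpt-4o"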
- 2025-12-12 · Paper2Figure Web public beta is live
- 2025-10-01 · Released the first version 0.1.0
From paper PDFs / images / text to editable scientific figures, slide decks, video scripts, academic posters, and other multimodal content in one click.
Paper2Any currently includes the following sub-capabilities:
- 📊 Paper2Figure - Editable Scientific Figures: Model architecture diagrams, technical roadmaps (PPT + SVG), and experimental plots with editable PPTX output.
- 🧩 Paper2Diagram / Image2Drawio - Editable Diagrams: Generate draw.io diagrams from paper/text or images, with drawio/png/svg export and chat-based edits.
- 🎬 Paper2PPT - Editable Slide Decks: Paper/text/topic to PPT, long-doc support, and built-in table/figure extraction.
- 📝 Paper2Rebuttal: Draft structured rebuttals and revision responses with claims-to-evidence grounding.
- 🖼️ PDF2PPT - Layout-Preserving Conversion: Accurate layout retention for PDF → editable PPTX.
- 🖼️ Image2PPT - Image to Slides: Convert images or screenshots into structured slides.
- 🎨 PPTPolish - Smart Beautification: AI-based layout optimization and style transfer.
- 🎬 Paper2Video: Generate video scripts and narration assets.
- 📝 Paper2Technical: Produce technical reports and method summaries.
- 📚 Knowledge Base (KB): Ingest/embedding, semantic search, and KB-driven PPT/podcast/mindmap generation.
✨ Diagram generation (mindmap / flowchart / ER ...)
✨ Model diagrams from PDF or text (research figure generation)
✨ Model Architecture Diagram Generation
✨ PPT Generation Demo
✨ Paper / Text / Topic → PPT
✨ Long Document Support (40+ Slides)
🐳 Docker (Recommended) — Deployment & Updates
# 1. Clone
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Backend env (required for your API keys/models)
cp fastapi_app/.env.example fastapi_app/.env
# 3. Build + run
docker compose up -d --build
Open:
- Frontend: http://localhost:3000
- Backend health: http://localhost:8000/health
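To confirm the stack is up, you can hit the health endpoint listed above (a minimal check, assuming curl is installed on the host):
# Should return a success response once the backend is ready
curl http://localhost:8000/health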
Modify & update:
- After changing code or .env, rebuild: docker compose up -d --build
- Pull latest code and rebuild: git pull && docker compose up -d --build
Common commands:
- View logs: docker compose logs -f
- Stop services: docker compose down
Notes:
- The first build may take a while (system deps + Python deps).
- Frontend env is baked at build time (compose build args). If you change it, rebuild with docker compose up -d --build.
- Outputs/models are mounted to the host (./outputs, ./models) for persistence.
We recommend using Conda to create an isolated environment (Python 3.11).
# 0. Create and activate a conda environment
conda create -n paper2any python=3.11 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-base.txt
# 3. Install in editable (dev) mode
pip install -e .
Paper2Any involves LaTeX rendering, vector graphics processing, and PPT/PDF conversion, which require extra dependencies:
# 1. Python dependencies
pip install -r requirements-paper.txt || pip install -r requirements-paper-backup.txt
# 2. LaTeX engine (tectonic) - recommended via conda
conda install -c conda-forge tectonic -y
# 3. Resolve doclayout_yolo dependency conflicts (Important)
pip install doclayout_yolo --no-deps
# 4. System dependencies (Ubuntu example)
sudo apt-get update
sudo apt-get install -y inkscape libreoffice poppler-utils wkhtmltopdf
export DF_API_KEY=your_api_key_here
export DF_API_URL=xxx # Optional: if you need a third-party API gateway
export MINERU_DEVICES="0,1,2,3" # Optional: MinerU task GPU resource pool
Tip
📚 For a detailed configuration guide, see Configuration Guide for step-by-step instructions on configuring models, environment variables, and starting services.
📝 Click to expand: Detailed .env Configuration Guide
Paper2Any uses two .env files for configuration. Both are optional - you can run the application without them using default settings.
# Copy backend environment file
cp fastapi_app/.env.example fastapi_app/.env
# Copy frontend environment file
cp frontend-workflow/.env.example frontend-workflow/.env
Supabase (Optional) - Only needed if you want user authentication and cloud storage:
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your_supabase_anon_key
Model Configuration - Customize which models to use for different workflows:
# Default LLM API URL
DEFAULT_LLM_API_URL=http://123.129.219.111:3000/v1/
# Workflow-level defaults
PAPER2PPT_DEFAULT_MODEL=gpt-5.1
PAPER2PPT_DEFAULT_IMAGE_MODEL=gemini-3-pro-image-preview
PDF2PPT_DEFAULT_MODEL=gpt-4o
# ... see .env.example for full list
LLM Provider Configuration - Controls the API endpoint dropdown in the UI:
# Default API URL shown in the UI
VITE_DEFAULT_LLM_API_URL=https://api.apiyi.com/v1
# Available API URLs in the dropdown (comma-separated)
VITE_LLM_API_URLS=https://api.apiyi.com/v1,http://b.apiyi.com:16888/v1,http://123.129.219.111:3000/v1
What happens when you modify VITE_LLM_API_URLS:
- The frontend will display a dropdown menu with all URLs you specify
- Users can select different API endpoints without manually typing URLs
- Useful for switching between OpenAI, local models, or custom API gateways
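For example, to surface a self-hosted OpenAI-compatible gateway in the dropdown, append it to the comma-separated list; the localhost URL below is an illustrative placeholder, not a shipped default:
# frontend-workflow/.env: add a local endpoint to the dropdown (placeholder URL)
VITE_LLM_API_URLS=https://api.apiyi.com/v1,http://localhost:8000/v1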
Supabase (Optional) - Uncomment these lines if you want user authentication:
VITE_SUPABASE_URL=https://your-project.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_JWT_SECRET=your-jwt-secret
If you skip Supabase configuration:
- ✅ All core features work normally
- ✅ CLI scripts work without any configuration
- ❌ No user authentication or quotas
- ❌ No cloud file storage
Note
Quick Start: You can skip the .env configuration entirely and use the CLI scripts directly with the --api-key parameter. See the CLI Scripts section below.
Advanced Configuration: Local Model Service Load Balancing
If you are deploying in a high-concurrency local environment, you can use script/start_model_servers.sh to start a local model service cluster (MinerU / SAM / OCR).
Script location: /DataFlow-Agent/script/start_model_servers.sh
Main configuration items:
- MinerU (PDF Parsing)
  - MINERU_MODEL_PATH: Model path (default models/MinerU2.5-2509-1.2B)
  - MINERU_GPU_UTIL: GPU memory utilization (default 0.2)
  - Instance configuration: By default, 4 instances are started on each of GPU 0 and GPU 4 (8 in total), ports 8011-8018.
  - Load Balancer: Port 8010, automatically dispatches requests.
- SAM (Segment Anything Model)
  - Instance configuration: By default, 1 instance is started on each of GPU 2 and GPU 3, ports 8021-8022.
  - Load Balancer: Port 8020.
- OCR (PaddleOCR)
  - Config: Runs on CPU, using uvicorn's worker mechanism (4 workers by default).
  - Port: 8003.
Before using, please modify gpu_id and the number of instances in the script according to your actual GPU count and memory.
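As a quick sanity check after launching, you can verify the expected ports are listening (a minimal sketch, assuming a Linux host with ss available):
# Ports from the list above: 8003 (OCR), 8010-8018 (MinerU), 8020-8022 (SAM)
ss -ltn | grep -E ':80(03|1[0-8]|2[0-2])'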
Note
We currently recommend trying Paper2Any on Linux / WSL. If you need to deploy on native Windows, please follow the steps below.
# 0. Create and activate a conda environment
conda create -n paper2any python=3.12 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-win-base.txt
# 3. Install in editable (dev) mode
pip install -e .
Paper2Any involves LaTeX rendering and vector graphics processing, which require extra dependencies (see requirements-paper.txt):
# Python dependencies
pip install -r requirements-paper.txt
# tectonic: LaTeX engine (recommended via conda)
conda install -c conda-forge tectonic -y
🎨 Install Inkscape (SVG/Vector Graphics Processing | Recommended/Required)
- Download and install (Windows 64-bit MSI): Inkscape Download
- Add the Inkscape executable directory to the system environment variable Path (example):
C:\Program Files\Inkscape\bin\
Tip
After configuring the Path, it is recommended to reopen the terminal (or restart VS Code / PowerShell) to ensure the environment variables take effect.
Release page: vllm-windows releases
Recommended version: 0.11.0
pip install vllm-0.11.0+cu124-cp312-cp312-win_amd64.whl
Important
Please make sure the .whl matches your current environment:
- Python: cp312 (Python 3.12)
- Platform: win_amd64
- CUDA: cu124 (must match your local CUDA / driver)
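Before installing, you can check that your environment matches the wheel tags (a minimal sketch using standard tooling):
# Wheel tag cp312 requires Python 3.12
python -c "import sys; print(sys.version)"
# Compare the driver's supported CUDA version against cu124
nvidia-smi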
Paper2Any - Paper Workflow Web Frontend (Recommended)
# Start backend API
cd fastapi_app
uvicorn main:app --host 0.0.0.0 --port 8000
# Start frontend (new terminal)
cd frontend-workflow
npm install
npm run dev
Configure Frontend Proxy
Modify server.proxy in frontend-workflow/vite.config.ts:
export default defineConfig({
plugins: [react()],
server: {
port: 3000,
open: true,
allowedHosts: true,
proxy: {
'/api': {
target: 'http://127.0.0.1:8000', // FastAPI backend address
changeOrigin: true,
},
},
},
})
Visit http://localhost:3000.
Windows: Load MinerU Pre-trained Model
# Start in PowerShell
vllm serve opendatalab/MinerU2.5-2509-1.2B `
--host 127.0.0.1 `
--port 8010 `
--logits-processors mineru_vl_utils:MinerULogitsProcessor `
--gpu-memory-utilization 0.6 `
--trust-remote-code `
--enforce-eager
# Start backend API
cd fastapi_app
uvicorn main:app --host 0.0.0.0 --port 8000
# Start frontend (new terminal)
cd frontend-workflow
npm install
npm run dev
Visit http://localhost:3000.
Paper2Any provides standalone CLI scripts that accept command-line parameters for direct workflow execution without requiring the web frontend/backend.
Configure API access via environment variables (optional):
export DF_API_URL=https://api.openai.com/v1 # LLM API URL
export DF_API_KEY=sk-xxx # API key
export DF_MODEL=gpt-4o # Default model
1. Paper2Figure CLI - Generate scientific figures (3 types)
# Generate model architecture diagram from PDF
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type model_arch \
--api-key sk-xxx
# Generate technical roadmap from text
python script/run_paper2figure_cli.py \
--input "Transformer architecture with attention mechanism" \
--input-type TEXT \
--graph-type tech_route
# Generate experimental data visualization
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type exp_data
Graph types: model_arch (model architecture), tech_route (technical roadmap), exp_data (experimental plots)
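To generate all three figure types from the same paper, the documented flags can simply be looped over, e.g.:
# Run all three graph types against one PDF (flags as documented above)
for t in model_arch tech_route exp_data; do
  python script/run_paper2figure_cli.py --input paper.pdf --graph-type "$t" --api-key sk-xxx
done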
2. Paper2PPT CLI - Convert papers to PPT presentations
# Basic usage
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--api-key sk-xxx \
--page-count 15
# With custom style
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--style "Academic style; English; Modern design" \
--language en
3. PDF2PPT CLI - One-click PDF to editable PPT
# Basic conversion (no AI enhancement)
python script/run_pdf2ppt_cli.py --input slides.pdf
# With AI enhancement
python script/run_pdf2ppt_cli.py \
--input slides.pdf \
--use-ai-edit \
--api-key sk-xxx
4. Image2PPT CLI - Convert images to editable PPT
# Basic conversion
python script/run_image2ppt_cli.py --input screenshot.png
# With AI enhancement
python script/run_image2ppt_cli.py \
--input diagram.jpg \
--use-ai-edit \
--api-key sk-xxx
5. PPT2Polish CLI - Beautify existing PPT files
# Basic beautification
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Academic style, clean and elegant" \
--api-key sk-xxx
# With reference image for style consistency
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Modern minimalist style" \
--ref-img reference_style.png \
--api-key sk-xxx
Note
System Requirements for PPT2Polish:
- LibreOffice: sudo apt-get install libreoffice (Ubuntu/Debian)
- pdf2image: pip install pdf2image
- poppler-utils: sudo apt-get install poppler-utils
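On Ubuntu/Debian, the same requirements can be installed in one pass:
# Install all PPT2Polish dependencies at once (same packages as above)
sudo apt-get install -y libreoffice poppler-utils && pip install pdf2image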
All CLI scripts support these common options:
- --api-url URL - LLM API URL (default: from DF_API_URL env var)
- --api-key KEY - API key (default: from DF_API_KEY env var)
- --model NAME - Text model name (default: varies by script)
- --output-dir DIR - Custom output directory (default: outputs/cli/{script_name}/{timestamp})
- --help - Show detailed help message
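These options can be combined on any of the scripts above; for example (placeholder key and illustrative output directory):
# Common options combined (values are placeholders)
python script/run_paper2ppt_cli.py \
  --input paper.pdf \
  --api-url https://api.openai.com/v1 \
  --api-key sk-xxx \
  --model gpt-4o \
  --output-dir outputs/my_run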
For complete parameter documentation, run any script with --help:
python script/run_paper2figure_cli.py --help
Paper2Any/
├── dataflow_agent/ # Core codebase
│ ├── agentroles/ # Agent definitions
│ │ └── paper2any_agents/ # Paper2Any-specific agents
│ ├── workflow/ # Workflow definitions
│ ├── promptstemplates/ # Prompt templates
│ └── toolkits/ # Toolkits (drawing, PPT generation, etc.)
├── fastapi_app/ # Backend API service
├── frontend-workflow/ # Frontend web interface
├── static/ # Static assets
├── script/ # Script tools
└── tests/ # Test cases
We welcome all forms of contribution!
This project is licensed under Apache License 2.0.