English | 中文
✨ Focus on paper multimodal workflows: from paper PDFs/screenshots/text to one-click generation of model diagrams, technical roadmaps, experimental plots, and slide decks ✨
| 📄 Universal File Support | 🎯 AI-Powered Generation | 🎨 Custom Styling | ⚡ Lightning Speed |
- 🔥 News
- ✨ Core Features
- 📸 Showcase
- 🧩 Drawio
- 🚀 Quick Start
- 📂 Project Structure
- 🗺️ Roadmap
- 🤝 Contributing
Tip
🆕 2026-02-02 · Paper2Rebuttal
Added rebuttal drafting support with structured response guidance and image-aware revision prompts.
Tip
🆕 2026-01-28 · Drawio Update
Added Drawio support for visual diagram creation and showcase-ready outputs in the workflow.
KB updates in one line: multi-file PPT generation with doc convert/merge, optional image injection, and embedding-assisted retrieval.
Tip
🆕 2026-01-25 · New Features
Added AI-assisted outline editing, three-layer model configuration system for flexible model selection, and user points management with daily quota allocation.
🌐 Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
🆕 2026-01-20 · Bug Fixes
Fixed bugs in experimental plot generation (image/text) and resolved the missing historical files issue.
🌐 Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
🆕 2026-01-14 · Feature Updates & Backend Architecture Upgrade
- Feature Updates: Added Image2PPT, optimized Paper2Figure interaction, and improved PDF2PPT effects.
- Standardized API: Refactored backend interfaces around RESTful /api/v1/structure, removing obsolete endpoints for better maintainability.
- Dynamic Configuration: Supported dynamic model selection (e.g., GPT-4o, Qwen-VL) via API parameters, eliminating hardcoded model dependencies.
🌐 Online Demo: http://dcai-paper2any.nas.cpolar.cn/
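For illustration, a call to the new endpoint might look like the sketch below. Only the /api/v1/structure path is confirmed above; the HTTP method and form fields (file, model) are assumptions, so check the actual API before use.
# Hypothetical sketch: per-request model selection via API parameters.
# The method and field names are illustrative, not confirmed.
curl -X POST http://localhost:8000/api/v1/structure \
  -F "file=@paper.pdf" \
  -F "model=gpt-4o"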
- 2025-12-12 · Paper2Figure Web public beta is live
- 2025-10-01 · Released the first version 0.1.0
From paper PDFs / images / text to editable scientific figures, slide decks, video scripts, academic posters, and other multimodal content in one click.
Paper2Any currently includes the following sub-capabilities:
- 📊 Paper2Figure - Editable Scientific Figures: Model architecture diagrams, technical roadmaps (PPT + SVG), and experimental plots with editable PPTX output.
- 🧩 Paper2Diagram / Image2Drawio - Editable Diagrams: Generate draw.io diagrams from paper/text or images, with drawio/png/svg export and chat-based edits.
- 🎬 Paper2PPT - Editable Slide Decks: Paper/text/topic to PPT, long-doc support, and built-in table/figure extraction.
- 📝 Paper2Rebuttal: Draft structured rebuttals and revision responses with claims-to-evidence grounding.
- 🖼️ PDF2PPT - Layout-Preserving Conversion: Accurate layout retention for PDF → editable PPTX.
- 🖼️ Image2PPT - Image to Slides: Convert images or screenshots into structured slides.
- 🎨 PPTPolish - Smart Beautification: AI-based layout optimization and style transfer.
- 🎬 Paper2Video: Generate video scripts and narration assets.
- 📝 Paper2Technical: Produce technical reports and method summaries.
- 📚 Knowledge Base (KB): Ingest/embedding, semantic search, and KB-driven PPT/podcast/mindmap generation.
✨ Diagram generation (mindmap / flowchart / ER ...)
✨ Model diagrams from PDF or text (research figure generation)
✨ Model Architecture Diagram Generation
✨ PPT Generation Demo
✨ Paper / Text / Topic → PPT
✨ Long Document Support (40+ Slides)
🐳 Docker (Recommended) — Deployment & Updates
# 1. Clone
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Backend env (required for your API keys/models)
cp fastapi_app/.env.example fastapi_app/.env
# 3. Build + run
docker compose up -d --build
Open:
- Frontend: http://localhost:3000
- Backend health: http://localhost:8000/health
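To confirm the stack is up, you can hit the health endpoint listed above (a minimal check, assuming curl is installed on the host):
# Should return a success response once the backend is ready
curl http://localhost:8000/health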
Modify & update:
- After changing code or .env, rebuild: docker compose up -d --build
- Pull latest code and rebuild: git pull && docker compose up -d --build
Common commands:
- View logs: docker compose logs -f
- Stop services: docker compose down
Notes:
- The first build may take a while (system deps + Python deps).
- Frontend env is baked at build time (compose build args). If you change it, rebuild with docker compose up -d --build.
- Outputs/models are mounted to the host (./outputs, ./models) for persistence.
We recommend using Conda to create an isolated environment (Python 3.11).
# 0. Create and activate a conda environment
conda create -n paper2any python=3.11 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-base.txt
# 3. Install in editable (dev) mode
pip install -e .
Paper2Any involves LaTeX rendering, vector graphics processing, and PPT/PDF conversion, which require extra dependencies:
# 1. Python dependencies
pip install -r requirements-paper.txt || pip install -r requirements-paper-backup.txt
# 2. LaTeX engine (tectonic) - recommended via conda
conda install -c conda-forge tectonic -y
# 3. Resolve doclayout_yolo dependency conflicts (Important)
pip install doclayout_yolo --no-deps
# 4. System dependencies (Ubuntu example)
sudo apt-get update
sudo apt-get install -y inkscape libreoffice poppler-utils wkhtmltopdf
export DF_API_KEY=your_api_key_here
export DF_API_URL=xxx # Optional: if you need a third-party API gateway
export MINERU_DEVICES="0,1,2,3" # Optional: MinerU task GPU resource pool
Tip
📚 For a detailed configuration guide, see Configuration Guide for step-by-step instructions on configuring models, environment variables, and starting services.
📝 Click to expand: Detailed .env Configuration Guide
Paper2Any uses two .env files for configuration. Both are optional - you can run the application without them using default settings.
# Copy backend environment file
cp fastapi_app/.env.example fastapi_app/.env
# Copy frontend environment file
cp frontend-workflow/.env.example frontend-workflow/.env
Supabase (Optional) - Only needed if you want user authentication and cloud storage:
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your_supabase_anon_key
Model Configuration - Customize which models to use for different workflows:
# Default LLM API URL
DEFAULT_LLM_API_URL=http://123.129.219.111:3000/v1/
# Workflow-level defaults
PAPER2PPT_DEFAULT_MODEL=gpt-5.1
PAPER2PPT_DEFAULT_IMAGE_MODEL=gemini-3-pro-image-preview
PDF2PPT_DEFAULT_MODEL=gpt-4o
# ... see .env.example for full list
LLM Provider Configuration - Controls the API endpoint dropdown in the UI:
# Default API URL shown in the UI
VITE_DEFAULT_LLM_API_URL=https://api.apiyi.com/v1
# Available API URLs in the dropdown (comma-separated)
VITE_LLM_API_URLS=https://api.apiyi.com/v1,http://b.apiyi.com:16888/v1,http://123.129.219.111:3000/v1
What happens when you modify VITE_LLM_API_URLS:
- The frontend will display a dropdown menu with all URLs you specify
- Users can select different API endpoints without manually typing URLs
- Useful for switching between OpenAI, local models, or custom API gateways
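For example, to surface a self-hosted OpenAI-compatible gateway in the dropdown, append it to the comma-separated list; the localhost URL below is an illustrative placeholder, not a shipped default:
# frontend-workflow/.env: add a local endpoint to the dropdown (placeholder URL)
VITE_LLM_API_URLS=https://api.apiyi.com/v1,http://localhost:8000/v1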
Supabase (Optional) - Uncomment these lines if you want user authentication:
VITE_SUPABASE_URL=https://your-project.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_JWT_SECRET=your-jwt-secret
If you skip Supabase configuration:
- ✅ All core features work normally
- ✅ CLI scripts work without any configuration
- ❌ No user authentication or quotas
- ❌ No cloud file storage
Note
Quick Start: You can skip the .env configuration entirely and use the CLI scripts directly with the --api-key parameter. See the CLI Scripts section below.
Advanced Configuration: Local Model Service Load Balancing
If you are deploying in a high-concurrency local environment, you can use script/start_model_servers.sh to start a local model service cluster (MinerU / SAM / OCR).
Script location: /DataFlow-Agent/script/start_model_servers.sh
Main configuration items:
- MinerU (PDF Parsing)
  - MINERU_MODEL_PATH: Model path (default models/MinerU2.5-2509-1.2B)
  - MINERU_GPU_UTIL: GPU memory utilization (default 0.2)
  - Instance configuration: By default, 4 instances are started on each of GPU 0 and GPU 4 (8 in total), ports 8011-8018.
  - Load Balancer: Port 8010, automatically dispatches requests.
- SAM (Segment Anything Model)
  - Instance configuration: By default, 1 instance is started on each of GPU 2 and GPU 3, ports 8021-8022.
  - Load Balancer: Port 8020.
- OCR (PaddleOCR)
  - Config: Runs on CPU, using uvicorn's worker mechanism (4 workers by default).
  - Port: 8003.
Before using, please modify gpu_id and the number of instances in the script according to your actual GPU count and memory.
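As a quick sanity check after launching, you can verify the expected ports are listening (a minimal sketch, assuming a Linux host with ss available):
# Ports from the list above: 8003 (OCR), 8010-8018 (MinerU), 8020-8022 (SAM)
ss -ltn | grep -E ':80(03|1[0-8]|2[0-2])'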
Note
We currently recommend trying Paper2Any on Linux / WSL. If you need to deploy on native Windows, please follow the steps below.
# 0. Create and activate a conda environment
conda create -n paper2any python=3.12 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-win-base.txt
# 3. Install in editable (dev) mode
pip install -e .
Paper2Any involves LaTeX rendering and vector graphics processing, which require extra dependencies (see requirements-paper.txt):
# Python dependencies
pip install -r requirements-paper.txt
# tectonic: LaTeX engine (recommended via conda)
conda install -c conda-forge tectonic -y
🎨 Install Inkscape (SVG/Vector Graphics Processing | Recommended/Required)
- Download and install (Windows 64-bit MSI): Inkscape Download
- Add the Inkscape executable directory to the system environment variable Path (example):
C:\Program Files\Inkscape\bin\
Tip
After configuring the Path, it is recommended to reopen the terminal (or restart VS Code / PowerShell) to ensure the environment variables take effect.
Release page: vllm-windows releases
Recommended version: 0.11.0
pip install vllm-0.11.0+cu124-cp312-cp312-win_amd64.whl
Important
Please make sure the .whl matches your current environment:
- Python: cp312 (Python 3.12)
- Platform: win_amd64
- CUDA: cu124 (must match your local CUDA / driver)
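Before installing, you can check that your environment matches the wheel tags (a minimal sketch using standard tooling):
# Wheel tag cp312 requires Python 3.12
python -c "import sys; print(sys.version)"
# Compare the driver's supported CUDA version against cu124
nvidia-smi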
Paper2Any - Paper Workflow Web Frontend (Recommended)
# Start backend API
cd fastapi_app
uvicorn main:app --host 0.0.0.0 --port 8000
# Start frontend (new terminal)
cd frontend-workflow
npm install
npm run dev
Configure Frontend Proxy
Modify server.proxy in frontend-workflow/vite.config.ts:
export default defineConfig({
plugins: [react()],
server: {
port: 3000,
open: true,
allowedHosts: true,
proxy: {
'/api': {
target: 'http://127.0.0.1:8000', // FastAPI backend address
changeOrigin: true,
},
},
},
})
Visit http://localhost:3000.
Windows: Load MinerU Pre-trained Model
# Start in PowerShell
vllm serve opendatalab/MinerU2.5-2509-1.2B `
--host 127.0.0.1 `
--port 8010 `
--logits-processors mineru_vl_utils:MinerULogitsProcessor `
--gpu-memory-utilization 0.6 `
--trust-remote-code `
--enforce-eager
# Start backend API
cd fastapi_app
uvicorn main:app --host 0.0.0.0 --port 8000
# Start frontend (new terminal)
cd frontend-workflow
npm install
npm run dev
Visit http://localhost:3000.
Paper2Any provides standalone CLI scripts that accept command-line parameters for direct workflow execution without requiring the web frontend/backend.
Configure API access via environment variables (optional):
export DF_API_URL=https://api.openai.com/v1 # LLM API URL
export DF_API_KEY=sk-xxx # API key
export DF_MODEL=gpt-4o # Default model
1. Paper2Figure CLI - Generate scientific figures (3 types)
# Generate model architecture diagram from PDF
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type model_arch \
--api-key sk-xxx
# Generate technical roadmap from text
python script/run_paper2figure_cli.py \
--input "Transformer architecture with attention mechanism" \
--input-type TEXT \
--graph-type tech_route
# Generate experimental data visualization
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type exp_data
Graph types: model_arch (model architecture), tech_route (technical roadmap), exp_data (experimental plots)
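To generate all three figure types from the same paper, the documented flags can simply be looped over, e.g.:
# Run all three graph types against one PDF (flags as documented above)
for t in model_arch tech_route exp_data; do
  python script/run_paper2figure_cli.py --input paper.pdf --graph-type "$t" --api-key sk-xxx
done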
2. Paper2PPT CLI - Convert papers to PPT presentations
# Basic usage
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--api-key sk-xxx \
--page-count 15
# With custom style
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--style "Academic style; English; Modern design" \
--language en
3. PDF2PPT CLI - One-click PDF to editable PPT
# Basic conversion (no AI enhancement)
python script/run_pdf2ppt_cli.py --input slides.pdf
# With AI enhancement
python script/run_pdf2ppt_cli.py \
--input slides.pdf \
--use-ai-edit \
--api-key sk-xxx
4. Image2PPT CLI - Convert images to editable PPT
# Basic conversion
python script/run_image2ppt_cli.py --input screenshot.png
# With AI enhancement
python script/run_image2ppt_cli.py \
--input diagram.jpg \
--use-ai-edit \
--api-key sk-xxx
5. PPT2Polish CLI - Beautify existing PPT files
# Basic beautification
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Academic style, clean and elegant" \
--api-key sk-xxx
# With reference image for style consistency
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Modern minimalist style" \
--ref-img reference_style.png \
--api-key sk-xxx
Note
System Requirements for PPT2Polish:
- LibreOffice: sudo apt-get install libreoffice (Ubuntu/Debian)
- pdf2image: pip install pdf2image
- poppler-utils: sudo apt-get install poppler-utils
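On Ubuntu/Debian, the same requirements can be installed in one pass:
# Install all PPT2Polish dependencies at once (same packages as above)
sudo apt-get install -y libreoffice poppler-utils && pip install pdf2image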
All CLI scripts support these common options:
- --api-url URL - LLM API URL (default: from DF_API_URL env var)
- --api-key KEY - API key (default: from DF_API_KEY env var)
- --model NAME - Text model name (default: varies by script)
- --output-dir DIR - Custom output directory (default: outputs/cli/{script_name}/{timestamp})
- --help - Show detailed help message
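These options can be combined on any of the scripts above; for example (placeholder key and illustrative output directory):
# Common options combined (values are placeholders)
python script/run_paper2ppt_cli.py \
  --input paper.pdf \
  --api-url https://api.openai.com/v1 \
  --api-key sk-xxx \
  --model gpt-4o \
  --output-dir outputs/my_run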
For complete parameter documentation, run any script with --help:
python script/run_paper2figure_cli.py --help
Paper2Any/
├── dataflow_agent/ # Core codebase
│ ├── agentroles/ # Agent definitions
│ │ └── paper2any_agents/ # Paper2Any-specific agents
│ ├── workflow/ # Workflow definitions
│ ├── promptstemplates/ # Prompt templates
│ └── toolkits/ # Toolkits (drawing, PPT generation, etc.)
├── fastapi_app/ # Backend API service
├── frontend-workflow/ # Frontend web interface
├── static/ # Static assets
├── script/ # Script tools
└── tests/ # Test cases
We welcome all forms of contribution!
This project is licensed under Apache License 2.0.