A powerful OCR node for ComfyUI that integrates the DeepSeek-OCR model from Hugging Face.
Note: ComfyUI-DeepSeek-OCR is currently in V0.0.1 beta status. Please stay tuned for future releases as we continue to refine and expand the functionality of this node.
- ✅ Models stored directly in `ComfyUI/models/LLM/DeepSeek-OCR/` (no nested folders)
- ✅ Uses `hf_hub_download` for clean, direct file downloads
- ✅ Compatible with transformers 4.46+ (with optional patching for 4.50+)
- ✅ Multiple OCR task types (Free OCR, Markdown, Figure parsing)
- ✅ Multiple resolution presets for speed/quality tradeoff
- ✅ Optional detection box visualization
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-DeepSeek-OCR.git
cd deepseek_ocr
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
If flash-attn fails to install, the node will work with eager attention (slower but functional).
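The fallback described above can be sketched as a small check for whether the `flash_attn` module is importable. This is a minimal illustration, not the node's actual code: the helper name `pick_attn_implementation` is hypothetical, while `"flash_attention_2"` and `"eager"` are values that transformers accepts for its `attn_implementation` argument.

```python
import importlib.util


def pick_attn_implementation(find_spec=importlib.util.find_spec):
    """Return the attention backend name to request from transformers.

    Hypothetical helper: the real node may select its backend differently.
    """
    # "flash_attn" is the import name of the flash-attn package.
    if find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # Eager attention needs no extra package, at the cost of speed.
    return "eager"


print(pick_attn_implementation())
```

The `find_spec` parameter exists only so the choice can be exercised without installing or uninstalling flash-attn.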
If you have transformers 4.50 or newer (like 4.57):
python patch_model_code.py
This patches the model's Python code to work with newer transformers versions.
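To decide whether the patch step applies to your environment, you can compare the installed transformers version against 4.50. The sketch below is an assumption about how such a check could look; `needs_patch` is a hypothetical helper and is not part of `patch_model_code.py`.

```python
import re


def needs_patch(version_string, threshold=(4, 50)):
    """Return True when a version string is at or above the threshold.

    Only the leading digits of the major and minor parts are compared,
    so suffixes such as ".dev0" or "rc1" are ignored.
    """
    major_minor = []
    for piece in version_string.split(".")[:2]:
        match = re.match(r"\d+", piece)
        major_minor.append(int(match.group()) if match else 0)
    while len(major_minor) < 2:
        major_minor.append(0)
    return tuple(major_minor) >= threshold
```

In practice the argument would be `transformers.__version__`, for example `needs_patch(transformers.__version__)` after `import transformers`.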
On first use, the node will automatically download model files (~6.6GB) to:
ComfyUI/models/LLM/DeepSeek-OCR/
├── config.json
├── tokenizer.json
├── model-00001-of-00012.safetensors
... (12 safetensors files total)
No nested folders! All files go directly into DeepSeek-OCR/.
- image (IMAGE): Input image for OCR
- task_type (COMBO): OCR task to perform
  - Free OCR - Simple text extraction (default)
  - Convert to Markdown - Convert document to markdown format
  - Parse Figure - Extract text from charts/figures
- resolution_preset (COMBO): Quality/speed tradeoff
  - Tiny (512x512) - Fastest
  - Small (640x640) - Fast
  - Base (1024x1024) - Balanced (default)
  - Large (1280x1280) - High quality
  - Gundam (1024x640) - Optimized for documents
- draw_boxes (COMBO): Visualization
  - disable - No boxes (default)
  - enable - Draw detection boxes
- eval_mode (COMBO): Performance mode
  - disable - Full output (default)
  - enable - Faster, text-only
- text (STRING): Recognized text or markdown
- image_output (IMAGE): Original or annotated image
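For readers unfamiliar with ComfyUI custom nodes, the inputs and outputs listed above map onto the standard node class attributes (`INPUT_TYPES`, `RETURN_TYPES`, `RETURN_NAMES`, `FUNCTION`, `CATEGORY`). The following is a sketch of the interface only: the class name, method name, and category string are assumptions, and the OCR call itself is replaced by a placeholder.

```python
class DeepSeekOCRSketch:
    """Hypothetical node class showing only the interface, not the OCR call."""

    TASKS = ["Free OCR", "Convert to Markdown", "Parse Figure"]
    PRESETS = [
        "Tiny (512x512)",
        "Small (640x640)",
        "Base (1024x1024)",
        "Large (1280x1280)",
        "Gundam (1024x640)",
    ]
    TOGGLE = ["disable", "enable"]

    @classmethod
    def INPUT_TYPES(cls):
        # A list of strings in ComfyUI becomes a COMBO (dropdown) input.
        return {
            "required": {
                "image": ("IMAGE",),
                "task_type": (cls.TASKS, {"default": "Free OCR"}),
                "resolution_preset": (cls.PRESETS, {"default": "Base (1024x1024)"}),
                "draw_boxes": (cls.TOGGLE, {"default": "disable"}),
                "eval_mode": (cls.TOGGLE, {"default": "disable"}),
            }
        }

    RETURN_TYPES = ("STRING", "IMAGE")
    RETURN_NAMES = ("text", "image_output")
    FUNCTION = "run"            # assumed method name
    CATEGORY = "DeepSeek-OCR"   # assumed menu category

    def run(self, image, task_type, resolution_preset, draw_boxes, eval_mode):
        # Placeholder: a real node would run the model here.
        return ("", image)
```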
Basic text extraction. Best for:
- Simple documents
- Receipts
- Forms
- Signs
Converts structured documents to markdown. Preserves:
- Headings
- Tables
- Lists
- Text formatting
Best for: academic papers, reports, structured documents.
Specialized for extracting text from:
- Charts and graphs
- Diagrams
- Scientific figures
- Data visualizations
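Each task type presumably corresponds to a different prompt sent to the model. The mapping below is an assumption modeled on the example prompts published in the upstream DeepSeek-OCR repository; the strings this node actually sends may differ, and `prompt_for` is a hypothetical helper.

```python
# Assumed prompt strings, modeled on upstream DeepSeek-OCR examples.
TASK_PROMPTS = {
    "Free OCR": "<image>\nFree OCR. ",
    "Convert to Markdown": "<image>\n<|grounding|>Convert the document to markdown. ",
    "Parse Figure": "<image>\nParse the figure. ",
}


def prompt_for(task_type):
    """Return the prompt for a task, falling back to plain OCR."""
    return TASK_PROMPTS.get(task_type, TASK_PROMPTS["Free OCR"])
```

Under this assumption, only the Markdown prompt carries the `<|grounding|>` marker, which would be consistent with detection boxes being tied to Markdown conversion.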
| Preset | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 512 | ⚡⚡⚡ | ⭐⭐ | Quick tests |
| Small | 640 | ⚡⚡ | ⭐⭐⭐ | Simple docs |
| Base | 1024 | ⚡ | ⭐⭐⭐⭐ | Most cases |
| Large | 1280 | 🐢 | ⭐⭐⭐⭐⭐ | Complex docs |
| Gundam | 1024+crop | ⚡⚡ | ⭐⭐⭐⭐ | Documents |
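One way to read the table is as a lookup from preset name to model parameters. The sketch below assumes the parameter names `base_size`, `image_size`, and `crop_mode`, which are not stated in this README; the sizes are taken from the preset list and the table, with Gundam interpreted as a 1024 base view plus 640 crops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    base_size: int    # assumed name: size of the global view
    image_size: int   # assumed name: size of each tile or view
    crop_mode: bool   # assumed name: whether the page is split into crops


PRESETS = {
    "Tiny": Preset(512, 512, False),
    "Small": Preset(640, 640, False),
    "Base": Preset(1024, 1024, False),
    "Large": Preset(1280, 1280, False),
    "Gundam": Preset(1024, 640, True),
}

print(PRESETS["Gundam"])
```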
This happens with transformers 4.50+.
Solution:
cd ComfyUI/custom_nodes/deepseek_ocr
python patch_model_code.py
# Restart ComfyUI
The patch makes the model code compatible with any transformers version.
- Try to use the node in ComfyUI once (it will fail, but it downloads the model code)
- Run `python patch_model_code.py` again
- Restart ComfyUI
All model files are stored directly in:
ComfyUI/models/LLM/DeepSeek-OCR/
No nested `snapshots/` or `models--deepseek-ai--DeepSeek-OCR/` folders!
The node uses `hf_hub_download` with `local_dir` and `local_dir_use_symlinks=False` for clean, direct downloads.
If download is interrupted:
- Delete the `ComfyUI/models/LLM/DeepSeek-OCR/` folder
- Restart ComfyUI
- The node will re-download all files
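Before deleting the folder, it can help to confirm that the download really is incomplete. The check below is a hedged sketch: `missing_files` is a hypothetical helper, and `EXPECTED_FILES` lists only the two non-weight files named in the folder listing earlier in this README, not the full set.

```python
from pathlib import Path

# Hypothetical subset of expected files, taken from the folder listing above.
EXPECTED_FILES = ["config.json", "tokenizer.json"]


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return expected file names that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in expected:
        path = root / name
        # A zero-byte file usually means the transfer was cut off.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_files("ComfyUI/models/LLM/DeepSeek-OCR")` returns an empty list when every listed file is present and non-empty.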
Use smaller presets: Tiny or Small.
- Enable eval_mode for faster processing
- Use GPU if available
- Use smaller resolution presets
- Install flash-attn
Detection boxes only appear when:
- Model generates `<|det|>` tags in output
- Task supports grounding (Markdown conversion)
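To check whether a given output contains any boxes at all, you can scan the text for `<|det|>` spans. This sketch assumes a format that the README does not spell out: each span is closed by `<|/det|>` and holds coordinates in groups of four, such as `[[x1, y1, x2, y2]]`. The helper `extract_boxes` is hypothetical.

```python
import re

# Assumed tag format: <|det|>[[x1, y1, x2, y2]]<|/det|>
DET_PATTERN = re.compile(r"<\|det\|>(.*?)<\|/det\|>", re.DOTALL)
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_boxes(text):
    """Return a list of (x1, y1, x2, y2) tuples found inside det tags."""
    boxes = []
    for payload in DET_PATTERN.findall(text):
        nums = [float(n) for n in NUMBER.findall(payload)]
        # Consume coordinates four at a time; ignore any trailing remainder.
        for i in range(0, len(nums) - len(nums) % 4, 4):
            boxes.append(tuple(nums[i:i + 4]))
    return boxes
```

An empty result means there is nothing for `draw_boxes` to render, which matches the two conditions listed above.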
This node follows the same download pattern as ComfyUI-RMBG:
- Uses `hf_hub_download` for individual file downloads
- Stores files directly in model folder (no nested structure)
- Uses `local_dir` parameter for clean organization
- Uses `local_files_only=True` after download
This makes the model folder clean and easy to manage.
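The pattern above can be sketched as a per-file loop around `hf_hub_download`. This is an illustration under stated assumptions rather than the node's own code: `fetch_files` and its `download` parameter are hypothetical, and only `repo_id`, `filename`, and `local_dir` are passed because recent huggingface_hub releases treat `local_dir_use_symlinks` as deprecated.

```python
from pathlib import Path


def fetch_files(repo_id, filenames, local_dir, download=None):
    """Download each file into local_dir unless it is already there."""
    if download is None:
        # Imported lazily so the sketch loads without huggingface_hub installed.
        from huggingface_hub import hf_hub_download as download
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in filenames:
        existing = target / name
        if existing.is_file():
            # Already present: skip the network entirely.
            paths.append(str(existing))
            continue
        paths.append(download(repo_id=repo_id, filename=name, local_dir=str(target)))
    return paths
```

A call such as `fetch_files("deepseek-ai/DeepSeek-OCR", ["config.json"], "ComfyUI/models/LLM/DeepSeek-OCR")` would place the file directly in that folder. Once everything is present, loading with `local_files_only=True` keeps transformers from contacting the Hub again.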
- Python 3.10+
- CUDA 11.8+ (for GPU)
- torch >= 2.0.0
- transformers >= 4.46.0
- huggingface_hub
- flash-attn (optional, for speed)
- Model: https://huggingface.co/deepseek-ai/DeepSeek-OCR
- GitHub: https://github.com/deepseek-ai/DeepSeek-OCR
- This Node: https://github.com/1038lab/ComfyUI-DeepSeek-OCR
MIT License - follows DeepSeek-OCR model license.