Modern, type-safe implementation of PDFNet (Patch-Depth Fusion Network) for high-precision dichotomous image segmentation. This fork features a complete rewrite with Python 3.12 type hints, unified CLI using tyro, and streamlined codebase architecture.
**Original Paper:** Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior
**Original Authors:** Xianjie Liu, Keren Fu, Qijun Zhao
**Original Repository:** Tennine2077/PDFNet
- ✅ Type-Safe Implementation - Full Python 3.12 type hints throughout
- ✅ Unified CLI - Single `pdfnet` command with subcommands (train, infer, test, evaluate)
- ✅ Modern Architecture - Clean, maintainable codebase with 40% less code
- ✅ MoGe Integration - Uses Microsoft's MoGe for superior depth estimation
- ✅ Easy Installation - Install directly from GitHub with uv or pip
- ✅ TTA Support - Test-time augmentation for improved accuracy
```bash
# Install with uv (recommended - faster)
uv add git+https://github.com/OpsiClear/PDFNet_Moge.git

# Or with pip
pip install git+https://github.com/OpsiClear/PDFNet_Moge.git

# Development installation
git clone https://github.com/OpsiClear/PDFNet_Moge.git
cd PDFNet_Moge
uv pip install -e .
```

```bash
# Download PDFNet and Swin-B weights
pdfnet download --weights

# Show dataset download instructions
pdfnet download --dataset-info
```

```bash
# Run inference on a single image
pdfnet infer --input image.jpg --output result.png

# Run inference on a directory with TTA
pdfnet infer --input images/ --output results/ --use-tta

# Visualize results
pdfnet infer --input image.jpg --visualize
```

```bash
# Train a model
pdfnet train --config-file config.yaml

# Benchmark on DIS datasets
pdfnet benchmark --checkpoint checkpoints/PDFNet_Best.pth --data-path DATA/DIS-DATA

# Evaluate predictions
pdfnet evaluate --pred-dir results/ --gt-dir DATA/DIS-DATA
```

The unified `pdfnet` CLI provides all functionality:
| Command | Description |
|---|---|
| `pdfnet train` | Train PDFNet models with custom configurations |
| `pdfnet infer` | Run inference on images (single/batch, with optional TTA) |
| `pdfnet benchmark` | Benchmark model on standard DIS datasets with metrics |
| `pdfnet evaluate` | Evaluate predictions against ground truth |
| `pdfnet config` | Configuration management (show, create, validate) |
| `pdfnet download` | Download model weights and get dataset info |
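The `evaluate` command compares predicted masks against ground-truth masks. As a rough illustration of what such a comparison involves (not the package's actual `metric_tools` implementation), two common segmentation metrics, MAE and IoU, can be sketched as:

```python
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between two masks scaled to [0, 1]."""
    return float(np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean())

def iou(pred: np.ndarray, gt: np.ndarray, thresh: float = 0.5) -> float:
    """Intersection-over-union after binarizing both masks at `thresh`."""
    p, g = pred >= thresh, gt >= thresh
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else 1.0
```

The real evaluation reports the DIS benchmark metrics from the original paper; this sketch only conveys the mask-vs-mask comparison idea.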
Get help for any command:
```bash
pdfnet --help
pdfnet infer --help
```

Use PDFNet in your Python scripts:
```python
from pdfnet.inference import PDFNetInference
from pdfnet.config import PDFNetConfig

# Create configuration
config = PDFNetConfig()
config.inference.checkpoint_path = "checkpoints/PDFNet_Best.pth"
config.inference.use_tta = True

# Initialize inference engine
engine = PDFNetInference(config)

# Run inference
result = engine.predict("image.jpg")

# Process directory
engine.predict_directory("input_dir/", "output_dir/")
```

```
PDFNet_Moge/
├── pdfnet.py                  # CLI entry point (local dev)
├── src/pdfnet/
│   ├── __main__.py            # Package CLI entry point
│   ├── config.py              # Type-safe configuration
│   ├── inference.py           # Inference engine with TTA
│   ├── train.py               # Training loop
│   ├── models/
│   │   ├── PDFNet.py          # Model architecture
│   │   └── swin_transformer.py
│   ├── data/
│   │   └── transforms.py      # Type-safe data transforms
│   ├── core/
│   │   └── losses.py          # Type-safe loss functions
│   ├── dataloaders/
│   │   └── dis_dataset.py     # DIS dataset loading
│   └── metric_tools/          # Evaluation utilities
├── demo.ipynb                 # Interactive demo
└── CLAUDE.md                  # Development guide
```
Download the DIS-5K dataset and organize as:
```
PDFNet_Moge/
└── DATA/
    └── DIS-DATA/
        ├── DIS-TR/    # Training set
        ├── DIS-VD/    # Validation set
        ├── DIS-TE1/   # Test set 1
        ├── DIS-TE2/   # Test set 2
        ├── DIS-TE3/   # Test set 3
        └── DIS-TE4/   # Test set 4
            ├── images/
            └── masks/
```
| Model | Download |
|---|---|
| PDFNet (DIS-5K) | Google Drive |
| Swin-B Backbone | GitHub Release |
Place weights in the `checkpoints/` directory.
```bash
# Train with default configuration
pdfnet train

# Train with custom config
pdfnet train --config-file my_config.yaml

# Resume training
pdfnet train --resume checkpoints/last.pth
```

Training configuration can be managed via YAML files or the type-safe `PDFNetConfig` dataclass.
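The dataclass approach means every configuration field has a declared type and default, so typos and type mismatches fail early. A minimal sketch of the pattern (with hypothetical fields; the actual schema lives in `src/pdfnet/config.py`):

```python
from dataclasses import dataclass, field

@dataclass
class InferenceConfig:
    checkpoint_path: str = "checkpoints/PDFNet_Best.pth"
    use_tta: bool = False

@dataclass
class TrainConfig:
    epochs: int = 100            # assumed default, not PDFNet's
    learning_rate: float = 1e-4  # assumed default, not PDFNet's

@dataclass
class Config:
    inference: InferenceConfig = field(default_factory=InferenceConfig)
    train: TrainConfig = field(default_factory=TrainConfig)

cfg = Config()
cfg.inference.use_tta = True  # override a single field without touching the rest
```

Because the fields are typed dataclass attributes, tools like `tyro` can derive CLI flags from them automatically, which is how a single schema can back both the YAML files and the command line.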
PDFNet achieves state-of-the-art results on the DIS-5K dataset:
- Memory Efficient: 1024×1024 inference uses ~4.9GB VRAM
- Fast Training: ~2 days on an RTX 4090 for DIS-5K
- Small Model: fewer than 11% of the parameters of diffusion-based methods
- High Accuracy: Matches or exceeds diffusion-based methods
For detailed benchmarks, see the original paper.
- ✅ Type-Safe Codebase - Python 3.12 type hints, with `tyro` for the CLI
- ✅ Unified CLI - Single entry point replacing 3 separate scripts
- ✅ 40% Code Reduction - Removed duplicates and dead code
- ✅ Clean Structure - Organized modules (`core/`, `data/`, `models/`)
- ✅ Better Imports - Proper package structure for pip installation
- ❌ Old argparse CLI (`pdfnet_cli.py`)
- ❌ Standalone inference script (`apply_pdfnet.py`)
- ❌ Redundant demo script (`demo.py`)
- ❌ Unused constants file
- ❌ Duplicate utility functions
- 🎯 Type-safe configuration with dataclasses
- 🎯 Modular inference engine
- 🎯 Test-time augmentation support
- 🎯 Batch processing for directories
- 🎯 Pip installable from GitHub
- 🎯 `python -m pdfnet` execution support
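The test-time augmentation listed above amounts to averaging predictions over transformed copies of the input. A minimal sketch of the idea using a horizontal flip (an illustration only, not the package's actual TTA pipeline):

```python
import numpy as np

def predict_with_tta(model, image: np.ndarray) -> np.ndarray:
    """Average the model's mask over the original and horizontally flipped input.

    `model` maps an HxWxC image to an HxW mask; the flipped prediction is
    flipped back so both masks are aligned before averaging.
    """
    pred = model(image)
    flipped = model(image[:, ::-1])[:, ::-1]
    return (pred + flipped) / 2.0
```

Real TTA setups often add multi-scale inference on top of flips; the cost is one extra forward pass per augmentation.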
```bash
# Show default configuration
pdfnet config --action show

# Create custom config file
pdfnet config --action create --output my_config.yaml

# Validate configuration
pdfnet config --action validate --config-file my_config.yaml
```

- Python 3.12+
- PyTorch 2.0+ with CUDA support (recommended)
- 8GB+ GPU VRAM for inference
- 24GB+ GPU VRAM for training (RTX 4090 or equivalent)
All dependencies are automatically installed via `uv add` or `pip install`.
Try PDFNet with the Jupyter notebook:
```bash
jupyter notebook demo.ipynb
```

The demo includes:
- Single image inference
- Batch processing
- Test-time augmentation examples
- Visualization tools
If you use PDFNet in your research, please cite the original paper:
```bibtex
@misc{liu2025patchdepthfusiondichotomousimage,
      title={Patch-Depth Fusion: Dichotomous Image Segmentation via Fine-Grained Patch Strategy and Depth Integrity-Prior},
      author={Xianjie Liu and Keren Fu and Qijun Zhao},
      year={2025},
      eprint={2503.06100},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.06100}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Original PDFNet paper and implementation by Xianjie Liu, Keren Fu, and Qijun Zhao
- MoGe by Microsoft for depth estimation
- DIS-5K dataset by Xuebin Qin et al.
- Swin Transformer by Microsoft
- 📖 Original Paper
- 🤗 Hugging Face Space
- 📚 Awesome Dichotomous Image Segmentation
- 🔧 Development Guide
- 📦 Installation Guide
Contributions are welcome! This fork focuses on:
- Type safety and code quality
- CLI/API improvements
- Documentation
- Bug fixes
Please open an issue or pull request on GitHub.
- Issues: GitHub Issues
- Original Repository: Tennine2077/PDFNet
Note: This is a modernized fork focusing on code quality and developer experience. For the original implementation, please visit Tennine2077/PDFNet.