
Model Conversion Package


Convert PyTorch models to optimized formats (ONNX, TensorRT, TorchScript) for high-performance inference.

Built on: NVIDIA Model Navigator | NVIDIA ModelOpt | Hydra

🚀 What It Does

  1. Convert your PyTorch model to ONNX, TensorRT, or TorchScript
  2. Profile each format to find the fastest configuration
  3. Validate that converted models match the original output
  4. Export to Triton Inference Server (optional)


⚙️ Environment Setup

Before using this package, you must properly configure your CUDA and cuDNN environment. This package does not modify environment variables at runtime.

Linux

  1. Install cuDNN via your package manager or from the NVIDIA cuDNN Archive:

    # Ubuntu/Debian (recommended)
    sudo apt-get install libcudnn9-cuda-12
    
    # Or download from NVIDIA and install manually
  2. Set environment variables in your shell profile (~/.bashrc or ~/.zshrc):

    # CUDA
    export CUDA_HOME=/usr/local/cuda
    export PATH=$CUDA_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
    
    # cuDNN (if installed to a custom location)
    export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH
  3. Apply changes:

    source ~/.bashrc

Windows

  1. Install cuDNN from the NVIDIA cuDNN Archive:

    • Download the ZIP archive for your CUDA version
    • Extract to a location (e.g., C:\Program Files\NVIDIA\cudnn)
  2. Set environment variables via System Properties → Environment Variables:

    CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
    
    # Add to PATH:
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin
    C:\Program Files\NVIDIA\cudnn\bin
    

    Or via PowerShell (run as Administrator):

    [Environment]::SetEnvironmentVariable("CUDA_PATH", "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4", "Machine")
    $path = [Environment]::GetEnvironmentVariable("PATH", "Machine")
    $cudaPaths = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin;C:\Program Files\NVIDIA\cudnn\bin"
    [Environment]::SetEnvironmentVariable("PATH", "$cudaPaths;$path", "Machine")
  3. Restart your terminal for changes to take effect.

Verify Installation

# Check CUDA
nvcc --version

# Check cuDNN (Linux)
ldconfig -p | grep cudnn

# Check cuDNN (Windows PowerShell)
where.exe cudnn*.dll
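
You can also confirm the setup from Python. A quick check, assuming PyTorch is already installed:

import torch

print(torch.version.cuda)              # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())  # cuDNN version as an integer, e.g. 90100
print(torch.cuda.is_available())       # True if a CUDA device is usable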

⚙️ Installation

git clone https://github.com/HuynhNguyenPhuc/Model-Conversion.git
cd Model-Conversion
pip install -e .

🛠️ Quick Start

Minimal Example

import torch
import torch.nn as nn
from model_conversion import convert_model

# Your model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 10)
    
    def forward(self, x):
        return self.fc(x)

model = MyModel().eval().cuda()

# Convert to optimized formats
package_path = convert_model(
    model=model,
    dataloader=None,  # Uses dummy data
    config_path="configs/my_model.yaml"
)

print(f"Saved to: {package_path}")

With Triton Export

from model_conversion import convert_model, export_to_triton

# Step 1: Convert
package_path = convert_model(
    model=model,
    config_path="configs/my_model.yaml"
)

# Step 2: Export to Triton
export_to_triton(
    package_path=package_path,
    repository_path="triton_models",
    config_path="configs/my_model.yaml"
)
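
Once Triton is serving the exported repository, you can send a smoke-test request with the official client. A sketch, assuming tritonclient[http] is installed, Triton listens on localhost:8000, and the tensor names match the io_spec shown below:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a request matching the io_spec (input_0: [-1, 3, 224, 224] float32)
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input_0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

result = client.infer(model_name="my_classifier", inputs=[infer_input])
print(result.as_numpy("output_0").shape)  # (1, 1000)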

📁 Configuration

The package uses Hydra configuration. Create a YAML file for your model:

Basic Config (configs/my_model.yaml)

model:
  name: "my_classifier"
  device: "cuda"
  
  io_spec:
    inputs:
      - name: "input_0"
        shape: [-1, 3, 224, 224]    # -1 = dynamic batch
        dtype: "float32"
        dynamic_axes: {0: "batch_size"}
        
    outputs:
      - name: "output_0"
        shape: [-1, 1000]
        dtype: "float32"
        dynamic_axes: {0: "batch_size"}

    batch_size: 8
    num_samples: 100

That's it! Everything else uses sensible defaults.
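
Since Hydra is backed by OmegaConf, you can sanity-check a config file from Python before running a conversion. A small sketch, assuming omegaconf is installed:

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/my_model.yaml")
print(cfg.model.name)                     # my_classifier
print(cfg.model.io_spec.inputs[0].shape)  # [-1, 3, 224, 224]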

Config Priority

  1. Package defaults → src/model_conversion/cfg/config.yaml
  2. Your config file → overrides defaults
  3. Code overrides → highest priority

# Override at runtime
convert_model(
    model=model,
    config_path="configs/my_model.yaml",
    config_overrides=["model.name=new_name", "profiler.batch_sizes=[1,4,8]"]
)

🎯 Target Formats

Format        Description                    Use Case
onnx          Open Neural Network Exchange   Portable, works anywhere
trt           TensorRT (FP16 by default)     Maximum NVIDIA GPU performance
torchscript   PyTorch JIT compiled           PyTorch ecosystem only

Default: ONNX + TensorRT FP16
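
To sanity-check the ONNX export on its own, ONNX Runtime can load and run it directly. A sketch, assuming onnxruntime-gpu is installed and using the model.onnx path from the Output Structure section below:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "output_workspace/navigator_workspace/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {"input_0": x})
print(outputs[0].shape)  # e.g. (1, 1000)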

🔧 Optional Features

Real Calibration Data

For better profiling accuracy, provide a real dataloader:

from torch.utils.data import DataLoader

dataloader = DataLoader(
    your_dataset,
    batch_size=8,
    num_workers=4
)

convert_model(
    model=model,
    dataloader=dataloader,  # Real data instead of dummy
    config_path="configs/my_model.yaml"
)
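
If you don't have a dataset at hand, torchvision's FakeData (also used for INT8 calibration below) is enough for a quick smoke test:

import torchvision
from torch.utils.data import DataLoader

dataset = torchvision.datasets.FakeData(
    size=100,
    image_size=(3, 224, 224),
    transform=torchvision.transforms.ToTensor(),
)
dataloader = DataLoader(dataset, batch_size=8, num_workers=4)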

INT8 Quantization

Add an optimization section to your config:

# configs/my_model_int8.yaml

model:
  name: "my_classifier_int8"
  io_spec:
    # ... same as above

optimization:
  strategy:
    _target_: model_conversion.optimization.QuantizationStrategy
    quant_cfg_name: "INT8_DEFAULT_CFG"  # ModelOpt quantization config
  
  # Calibration data (required for INT8)
  calibration_dataloader:
    _target_: torch.utils.data.DataLoader
    batch_size: 8
    num_workers: 4
    dataset:
      _target_: torchvision.datasets.FakeData
      size: 100
      image_size: [3, 224, 224]
      transform:
        _target_: torchvision.transforms.ToTensor

INT8 quantization uses NVIDIA ModelOpt for PTQ calibration.
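
For reference, the strategy boils down to ModelOpt's PTQ API, roughly the sketch below; you drive it through the YAML config rather than calling it yourself. Here model and calibration_dataloader stand for the objects instantiated from the config above:

import modelopt.torch.quantization as mtq

def forward_loop(model):
    # Feed calibration batches through the model to collect activation ranges
    for images, _ in calibration_dataloader:
        model(images.cuda())

# INT8_DEFAULT_CFG corresponds to quant_cfg_name in the config above
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)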

NVIDIA DALI Integration

For GPU-accelerated data loading, use DALI instead of a PyTorch DataLoader:

optimization:
  strategy:
    _target_: model_conversion.optimization.QuantizationStrategy
  
  # DALI pipeline (GPU-accelerated)
  dali_pipeline:
    _target_: model_conversion.data.dali.CalibrationPipeline
    batch_size: 8
    image_root: "/path/to/images"
    file_list: "/path/to/image_list.txt"
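
A calibration pipeline of this shape corresponds to a standard DALI pipeline definition. A sketch using DALI's public API; the preprocessing steps are illustrative, not the package's exact pipeline:

from nvidia.dali import fn, pipeline_def, types

@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def calibration_pipeline(image_root, file_list):
    # Read paths from the file list and decode JPEGs on the GPU
    jpegs, _labels = fn.readers.file(file_root=image_root, file_list=file_list)
    images = fn.decoders.image(jpegs, device="mixed")
    images = fn.resize(images, resize_x=224, resize_y=224)
    # Float NCHW output, matching the model's io_spec
    return fn.crop_mirror_normalize(images, dtype=types.FLOAT, output_layout="CHW")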

📊 Output Structure

output_workspace/
├── my_classifier.nav           # Model Navigator package
└── navigator_workspace/
    ├── model.onnx              # ONNX export
    ├── model.plan              # TensorRT engine
    └── status.yaml             # Conversion results

triton_models/                  # After export_to_triton()
└── my_classifier/
    ├── config.pbtxt            # Triton config
    └── 1/
        └── model.plan          # Best performing model
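
The .nav package can be loaded back with Model Navigator for inspection. A sketch, assuming model-navigator is installed; the exact status fields depend on your Navigator version:

import model_navigator as nav

# Load the package produced by convert_model()
package = nav.package.load("output_workspace/my_classifier.nav")
print(package.status)  # per-format conversion and profiling results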

🐛 Troubleshooting

cuDNN Library Not Found

Ensure cuDNN is installed and your environment variables are correctly set. See Environment Setup.

Common error messages:

  • Could not load library libcudnn_... (Linux)
  • cudnn64_*.dll not found (Windows)

Memory Issues

Reduce batch size in your config:

model:
  io_spec:
    batch_size: 4  # Lower this

Conversion Failures

Enable debug mode:

debug: true
verbose: true
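
These flags can also be flipped from code via the override mechanism shown earlier, without editing the YAML:

convert_model(
    model=model,
    config_path="configs/my_model.yaml",
    config_overrides=["debug=true", "verbose=true"]
)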

📚 Full Example

See examples/dinov2.py for a complete DINOv2 conversion workflow.

# Run the example
python examples/dinov2.py

License

MIT License - see LICENSE.md
