Convert PyTorch models to optimized formats (ONNX, TensorRT, TorchScript) for high-performance inference.
Built on: NVIDIA Model Navigator | NVIDIA ModelOpt | Hydra
## Features

- Convert your PyTorch model to ONNX, TensorRT, or TorchScript
- Profile each format to find the fastest configuration
- Validate that converted models match the original output
- Export to Triton Inference Server (optional)
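The validation step amounts to an element-wise tolerance comparison between the original and converted outputs. A torch-free sketch of that idea (`outputs_match` is an illustrative helper mirroring `torch.allclose` semantics, not part of this package's API):

```python
def outputs_match(a, b, rtol=1e-3, atol=1e-5):
    """True if every pair of elements agrees within relative + absolute tolerance."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

original = [0.1234, -2.5, 7.0]
converted = [0.12341, -2.5001, 7.0002]  # e.g. small FP16 round-trip noise

print(outputs_match(original, converted))         # True
print(outputs_match(original, [0.2, -2.5, 7.0]))  # False
```

In practice the comparison runs over flattened output tensors, and looser tolerances are usually needed for FP16 or INT8 engines.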
## Requirements

- Python 3.9+
- CUDA 12.4
- cuDNN 9.x (see Environment Setup)
- NVIDIA GPU (for TensorRT)
## Environment Setup

Before using this package, you must properly configure your CUDA and cuDNN environment. This package does not modify environment variables at runtime.
### Linux

1. Install cuDNN via your package manager or from the NVIDIA cuDNN Archive:

   ```bash
   # Ubuntu/Debian (recommended)
   sudo apt-get install libcudnn9-cuda-12

   # Or download from NVIDIA and install manually
   ```

2. Set environment variables in your shell profile (`~/.bashrc` or `~/.zshrc`):

   ```bash
   # CUDA
   export CUDA_HOME=/usr/local/cuda
   export PATH=$CUDA_HOME/bin:$PATH
   export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

   # cuDNN (if installed to a custom location)
   export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH
   ```

3. Apply changes:

   ```bash
   source ~/.bashrc
   ```
### Windows

1. Install cuDNN from the NVIDIA cuDNN Archive:
   - Download the ZIP archive for your CUDA version
   - Extract to a location (e.g., `C:\Program Files\NVIDIA\cudnn`)

2. Set environment variables via System Properties → Environment Variables:

   ```
   CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4

   # Add to PATH:
   C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin
   C:\Program Files\NVIDIA\cudnn\bin
   ```

   Or via PowerShell (run as Administrator):

   ```powershell
   [Environment]::SetEnvironmentVariable("CUDA_PATH", "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4", "Machine")
   $path = [Environment]::GetEnvironmentVariable("PATH", "Machine")
   $cudaPaths = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin;C:\Program Files\NVIDIA\cudnn\bin"
   [Environment]::SetEnvironmentVariable("PATH", "$cudaPaths;$path", "Machine")
   ```

3. Restart your terminal for changes to take effect.
### Verify your setup

```bash
# Check CUDA
nvcc --version

# Check cuDNN (Linux)
ldconfig -p | grep cudnn

# Check cuDNN (Windows PowerShell)
where.exe cudnn*.dll
```

## Installation

```bash
git clone <your-repo-url>
cd <your-repo-directory>
pip install -e .
```

## Quick Start

```python
import torch
import torch.nn as nn

from model_conversion import convert_model

# Your model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 10)

    def forward(self, x):
        return self.fc(x)

model = MyModel().eval().cuda()

# Convert to optimized formats
package_path = convert_model(
    model=model,
    dataloader=None,  # Uses dummy data
    config_path="configs/my_model.yaml"
)
print(f"Saved to: {package_path}")
```

## Export to Triton (optional)

```python
from model_conversion import convert_model, export_to_triton

# Step 1: Convert
package_path = convert_model(
    model=model,
    config_path="configs/my_model.yaml"
)

# Step 2: Export to Triton
export_to_triton(
    package_path=package_path,
    repository_path="triton_models",
    config_path="configs/my_model.yaml"
)
```

## Configuration

The package uses Hydra configuration. Create a YAML file for your model:

```yaml
model:
  name: "my_classifier"
  device: "cuda"
  io_spec:
    inputs:
      - name: "input_0"
        shape: [-1, 3, 224, 224]  # -1 = dynamic batch
        dtype: "float32"
        dynamic_axes: {0: "batch_size"}
    outputs:
      - name: "output_0"
        shape: [-1, 1000]
        dtype: "float32"
        dynamic_axes: {0: "batch_size"}
    batch_size: 8
    num_samples: 100
```

That's it! Everything else uses sensible defaults.
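When no dataloader is supplied, dummy inputs are generated from `io_spec`; conceptually, each dynamic `-1` axis is resolved against the configured `batch_size`. A minimal sketch of that substitution (`resolve_shape` is illustrative, not a package function):

```python
def resolve_shape(spec_shape, batch_size):
    """Replace each dynamic (-1) axis with the concrete batch size."""
    return [batch_size if dim == -1 else dim for dim in spec_shape]

# Matches the io_spec above: shape [-1, 3, 224, 224] with batch_size: 8
print(resolve_shape([-1, 3, 224, 224], batch_size=8))  # [8, 3, 224, 224]
```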
Configuration precedence (lowest to highest priority):

- Package defaults → `src/model_conversion/cfg/config.yaml`
- Your config file → overrides defaults
- Code overrides → highest priority
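The layering behaves like a recursive dictionary merge in which later layers win key-by-key. A toy illustration of the precedence (Hydra's actual composition is more involved, and the values here are made up):

```python
def merge(base, override):
    """Recursively merge override into base; override wins on conflicts."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

defaults = {"model": {"name": "model", "device": "cuda"}}  # package defaults
user_file = {"model": {"name": "my_classifier"}}           # your config file
code_overrides = {"model": {"name": "new_name"}}           # highest priority

final = merge(merge(defaults, user_file), code_overrides)
print(final["model"]["name"])    # new_name
print(final["model"]["device"])  # cuda (untouched default survives)
```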
```python
# Override at runtime
convert_model(
    model=model,
    config_path="configs/my_model.yaml",
    config_overrides=["model.name=new_name", "profiler.batch_sizes=[1,4,8]"]
)
```

## Supported Formats

| Format | Description | Use Case |
|---|---|---|
| `onnx` | Open Neural Network Exchange | Portable, works anywhere |
| `trt` | TensorRT (FP16 by default) | Maximum NVIDIA GPU performance |
| `torchscript` | PyTorch JIT compiled | PyTorch ecosystem only |

Default: ONNX + TensorRT FP16.
## Profiling with Real Data

For better profiling accuracy, provide a real dataloader:

```python
from torch.utils.data import DataLoader

dataloader = DataLoader(
    your_dataset,
    batch_size=8,
    num_workers=4
)

convert_model(
    model=model,
    dataloader=dataloader,  # Real data instead of dummy
    config_path="configs/my_model.yaml"
)
```

## INT8 Quantization

Add an `optimization` section to your config:
```yaml
# configs/my_model_int8.yaml
model:
  name: "my_classifier_int8"
  io_spec:
    # ... same as above

optimization:
  strategy:
    _target_: model_conversion.optimization.QuantizationStrategy
    quant_cfg_name: "INT8_DEFAULT_CFG"  # ModelOpt quantization config

  # Calibration data (required for INT8)
  calibration_dataloader:
    _target_: torch.utils.data.DataLoader
    batch_size: 8
    num_workers: 4
    dataset:
      _target_: torchvision.datasets.FakeData
      size: 100
      image_size: [3, 224, 224]
      transform:
        _target_: torchvision.transforms.ToTensor
```

INT8 quantization uses NVIDIA ModelOpt for PTQ calibration.
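Conceptually, PTQ calibration runs representative batches through the model, records activation ranges, and derives quantization scales from them. A toy, framework-free sketch of the max-calibration idea (ModelOpt handles the real calibration; this symmetric `amax / 127` scheme is a deliberate simplification):

```python
# Calibration batches stand in for activations observed on real data.
calibration_batches = [
    [0.5, -1.2, 3.1],
    [2.4, -0.7, 1.9],
    [-3.8, 0.2, 1.1],
]

# Track the largest absolute value seen, then derive a symmetric int8 scale.
amax = max(abs(x) for batch in calibration_batches for x in batch)
scale = amax / 127.0  # maps [-amax, amax] onto [-127, 127]

def quantize(x):
    return max(-127, min(127, round(x / scale)))

def dequantize(q):
    return q * scale

# Round-tripping a value loses at most about half a scale step.
print(abs(dequantize(quantize(1.9)) - 1.9) <= scale)  # True
```

Good calibration data matters because `amax` (and therefore every scale) is derived entirely from the batches the model sees during calibration.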
### DALI pipelines

For GPU-accelerated data loading, use DALI instead of a PyTorch DataLoader:

```yaml
optimization:
  strategy:
    _target_: model_conversion.optimization.QuantizationStrategy

    # DALI pipeline (GPU-accelerated)
    dali_pipeline:
      _target_: model_conversion.data.dali.CalibrationPipeline
      batch_size: 8
      image_root: "/path/to/images"
      file_list: "/path/to/image_list.txt"
```

## Output Structure

```
output_workspace/
├── my_classifier.nav        # Model Navigator package
└── navigator_workspace/
    ├── model.onnx           # ONNX export
    ├── model.plan           # TensorRT engine
    └── status.yaml          # Conversion results

triton_models/               # After export_to_triton()
└── my_classifier/
    ├── config.pbtxt         # Triton config
    └── 1/
        └── model.plan       # Best performing model
```
## Troubleshooting

### cuDNN errors

Ensure cuDNN is installed and your environment variables are correctly set. See Environment Setup.

Common error messages:

- `Could not load library libcudnn_...` (Linux)
- `cudnn64_*.dll not found` (Windows)
### Out-of-memory errors

Reduce the batch size in your config:

```yaml
model:
  io_spec:
    batch_size: 4  # Lower this
```

### Debugging

Enable debug mode:
```yaml
debug: true
verbose: true
```

## Example

See `examples/dinov2.py` for a complete DINOv2 conversion workflow:

```bash
# Run the example
python examples/dinov2.py
```

## License

MIT License - see LICENSE.md