- Overview
- Features
- Installation
- Quick Start
- Supported Models
- Examples
- API Reference
- Deployment
- Extending the Library
- Contributing
- License
## Overview

UniViT is a unified, production-ready Python library that provides a clean, extensible interface to NVIDIA Vision Transformer models, including FasterViT, RADIO, and C-RADIO. The library is built on modern software engineering principles and offers a consistent API, inspired by Ultralytics YOLO, for model loading, inference, and export.
The library abstracts away the complexity of working with different model architectures while maintaining full access to underlying model capabilities. Whether you need feature extraction for foundation models, classification with FasterViT, or high-performance inference with TensorRT, UniViT provides a unified approach to all these tasks.
Key design goals include clean architecture following SOLID principles, comprehensive error handling with meaningful exceptions, production-ready export capabilities, and easy extensibility for custom models and use cases.
## Features

| Feature | Description |
|---|---|
| Unified API | Consistent interface across all model types, similar to Ultralytics YOLO |
| Auto-detection | Automatic model recognition from registry without manual type specification |
| Multiple Precision Support | FP32, FP16, and INT8 quantization with calibration |
| TensorRT Export | High-performance inference engines with dynamic shapes |
| ONNX Export | Standard ONNX format with optimization and validation |
| Multi-format Loading | Support for .pt, .pth, .onnx, and .tensorrt formats |
| Production Ready | Cloud, edge, and desktop deployment optimization |
| Model Optimization | Pruning, mixed precision, and quantization support |
| Comprehensive Utils | Memory management, distributed inference, and utilities |
## Installation

The uv package manager is recommended for faster and more reliable installations.
```bash
# Create a new virtual environment
uv venv univit-env
source univit-env/bin/activate   # Linux/Mac
# or
.\univit-env\Scripts\activate    # Windows

# Install UniViT from source
cd univit_library
uv pip install -e .
```

Installation with plain pip works the same way:

```bash
# Install from source
cd univit_library
pip install -e .

# Install with all dependencies
pip install univit[all]

# Install with ONNX support
pip install univit[onnx]

# Install with TensorRT support
pip install univit[tensorrt]
```

Requirements:

- Python 3.10 or higher
- PyTorch 2.0 or higher
- torchvision
- timm
- NumPy
- Pillow
- requests
Optional dependencies:

- `tensorrt`: For TensorRT engine export and inference
- `onnx`, `onnxruntime`: For ONNX export and inference
- `pycuda`: For TensorRT CUDA integration
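To verify the installation, a quick check along these lines can help (a sketch; only `torch` and the optional extras named above are assumed):

```python
# Sanity-check the core install and the optional extras.
import torch

import univit  # raises ImportError if the core install is broken

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

# Extras are optional; a missing one only disables the matching export path.
for extra in ("onnx", "onnxruntime", "tensorrt", "pycuda"):
    try:
        __import__(extra)
        print(f"{extra}: available")
    except ImportError:
        print(f"{extra}: not installed")
```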
## Quick Start

```python
from univit import FasterViTAdapter

# Initialize and load model
model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Predict on a single image
result = model("path/to/image.jpg")
print(f"Predicted class: {result['predictions'][0][0]}")
print(f"Confidence: {result['probabilities'][0][0]:.4f}")
```

```python
from univit import RadioAdapter

# Initialize RADIO foundation model
model = RadioAdapter("radio_v2_5_b")
model.load()

# Extract features
result = model("path/to/image.jpg")
features = result["features"]
print(f"Feature shape: {features.shape}")  # (1, 768)
```

```python
from univit import FasterViTAdapter

# Automatic resource cleanup
with FasterViTAdapter("faster_vit_0_224") as model:
    model.load()
    result = model("image.jpg")
# Memory is automatically freed when exiting the context
```

```python
from univit import FasterViTAdapter

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Process multiple images efficiently
images = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
results = model.predict_batch(images, batch_size=2)

for i, result in enumerate(results):
    print(f"Image {i}: class {result['predictions'][0][0]}")
```

## Supported Models

### FasterViT

FasterViT provides efficient hierarchical vision transformers optimized for real-time inference.
| Model | Input Size | Classes | Parameters | Speed |
|---|---|---|---|---|
| faster_vit_0_224 | 224×224 | 1000 | ~5M | Fastest |
| faster_vit_1_224 | 224×224 | 1000 | ~8M | Fast |
| faster_vit_2_224 | 224×224 | 1000 | ~12M | Medium |
| faster_vit_3_224 | 224×224 | 1000 | ~20M | Balanced |
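Any registry name from the table can be passed straight to the adapter; for example, trading some speed for accuracy with the largest variant (same load-and-call pattern as the Quick Start):

```python
from univit import FasterViTAdapter

# faster_vit_3_224: ~20M parameters, "Balanced" speed tier in the table above
model = FasterViTAdapter("faster_vit_3_224")
model.load()
result = model("image.jpg")
```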
### RADIO

RADIO is a foundation model trained with multiple teacher models for versatile feature extraction.
| Model | Input Size | Output Dim | Teachers |
|---|---|---|---|
| radio_v2_1 | 512×512 | 512 | DFN CLIP, SigLIP, DINOv2 |
| radio_v2_5_b | 768×768 | 768 | DFN CLIP, SigLIP, DINOv2, SAM |
| radio_v2_5_h | 1024×1024 | 1024 | DFN CLIP, SigLIP, DINOv2, SAM, Florence2 |
| eradio_v2 | 512×512 | 768 | Efficient variant |
### C-RADIO

C-RADIO is a commercial variant with enhanced capabilities, released under the NVIDIA Open Model License.
| Model | Input Size | Output Dim | License |
|---|---|---|---|
| cradio_v3_b | 1024×1024 | 1024 | NVIDIA Open Model License |
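C-RADIO is served by the same `RadioAdapter` used for RADIO (see API Reference), so loading it should follow the familiar pattern; a sketch:

```python
from univit import RadioAdapter

# C-RADIO goes through the unified RadioAdapter interface
model = RadioAdapter("cradio_v3_b")
model.load()

result = model("path/to/image.jpg")
print(result["features"].shape)  # expected (1, 1024), per the table above
```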
## Examples

The examples directory contains runnable scripts:

```bash
# Basic inference examples
python examples/basic_examples.py

# Model export examples
python examples/export_examples.py

# Model optimization examples
python examples/optimization_examples.py

# Deployment examples
python examples/deployment_examples.py

# Advanced examples (video streams, API servers)
python examples/advanced_examples.py

# Custom model registration examples
python examples/custom_model_examples.py
```

### ONNX Export

```python
from univit import FasterViTAdapter, ONNXExporter, Precision

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Export to ONNX with FP32 precision
ONNXExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp32.onnx",
    precision=Precision.FP32,
)

# Export with FP16 for smaller file size
ONNXExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp16.onnx",
    precision=Precision.FP16,
)
```

### TensorRT Export

```python
from univit import FasterViTAdapter, TensorRTExporter, Precision

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Export with FP16 precision
TensorRTExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp16.engine",
    precision=Precision.FP16,
)

# Export with INT8 precision and calibration
calibration_images = ["calib1.jpg", "calib2.jpg", ...]  # 16+ images recommended
TensorRTExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_int8.engine",
    precision=Precision.INT8,
    calibration_images=calibration_images,
)
```

### Model Optimization

```python
from univit import FasterViTAdapter

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Apply pruning (30% sparsity)
model.prune_model(amount=0.3, method="l1")

# Enable mixed precision (FP16)
model.enable_mixed_precision()

# Prepare for Quantization Aware Training
model.apply_qat()

# Run inference
result = model("image.jpg")
```

### Deployment Configuration

```python
from univit import DeploymentTarget

# Cloud deployment (A100, FP32, Batch=64)
cloud_config = {
    "precision": DeploymentTarget.CLOUD.get_recommended_precision(),  # fp32
    "batch_size": DeploymentTarget.CLOUD.get_max_batch_size(),  # 64
}

# Desktop deployment (RTX 4090, FP16, Batch=8)
desktop_config = {
    "precision": DeploymentTarget.DESKTOP.get_recommended_precision(),  # fp16
    "batch_size": DeploymentTarget.DESKTOP.get_max_batch_size(),  # 8
}

# Edge deployment (Jetson, INT8, Batch=1)
edge_config = {
    "precision": DeploymentTarget.EDGE.get_recommended_precision(),  # int8
    "batch_size": DeploymentTarget.EDGE.get_max_batch_size(),  # 1
}
```

### Custom Model Registration

```python
from univit import ModelConfig, register_model, FasterViTAdapter

# Create custom model configuration (e.g., fire detection)
fire_config = ModelConfig(
    name="fire_fastervit",
    model_type="fastervit",
    num_classes=2,  # fire / non-fire
    input_size=(224, 224),
    default_params={
        "depths": [2, 3, 6, 5],
        "dim": 64,
        "num_heads": [2, 4, 8, 16],
    },
    metadata={
        "task": "fire_detection",
        "classes": ["fire", "normal"],
    },
)

# Register the model
register_model(fire_config)

# Use the custom model
model = FasterViTAdapter(fire_config)
model.load()
result = model("image.jpg")
```

## API Reference

Configuration:

- `ModelConfig`: Base configuration class for all models
- `DeploymentTarget`: Enum for deployment targets (CLOUD, DESKTOP, EDGE)
- `Precision`: Enum for model precision (FP32, FP16, INT8)

Adapters:

- `BaseUniViTWrapper`: Abstract base class defining the common interface
- `FasterViTAdapter`: Adapter for FasterViT classification models
- `RadioAdapter`: Unified adapter for RADIO, C-RADIO, and E-RADIO models

Exporters:

- `ONNXExporter`: Export models to ONNX format with optimization and validation
- `TensorRTExporter`: Export models to TensorRT engine format

Utilities:

- `setup_logger()`: Configure structured logging
- `cleanup_memory()`: Free GPU memory and Python garbage
- `download_weights()`: Download model weights from a URL
- `count_parameters()`: Count model parameters
- `get_model_size()`: Calculate model memory size
- `image_to_tensor()`: Convert an image to a tensor
- `tensor_to_image()`: Convert a tensor to an image
- `validate_image()`: Validate that an image is usable
- `Timer()`: Context manager for timing operations
- `DistributedManager()`: Manager for distributed inference
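A sketch of how a few of these utilities can be combined around an inference call; the exact signatures are assumptions based on the descriptions above:

```python
from univit import FasterViTAdapter, Timer, cleanup_memory, count_parameters

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Report model size before running anything (signature assumed)
print(f"Parameters: {count_parameters(model.model):,}")

# Time a single inference with the Timer context manager
with Timer():
    result = model("image.jpg")

# Free GPU memory and Python garbage when done
cleanup_memory()
```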
## Deployment

### Cloud

Optimized for high-performance GPU servers with large batch sizes.
| Parameter | Value |
|---|---|
| Precision | FP32 |
| Batch Size | Up to 64 |
| Device | A100, V100, H100 |
| Use Case | Server inference, training pipelines |
### Desktop

Optimized for consumer GPUs with balanced speed and accuracy.
| Parameter | Value |
|---|---|
| Precision | FP16 |
| Batch Size | Up to 8 |
| Device | RTX 3090, RTX 4090 |
| Use Case | Development, prototyping, local inference |
### Edge

Optimized for embedded devices with strict resource constraints.
| Parameter | Value |
|---|---|
| Precision | INT8 |
| Batch Size | 1 |
| Device | Jetson, TensorRT-LLM |
| Use Case | Robotics, IoT devices, autonomous vehicles |
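The target presets can drive export settings directly. Below is a sketch tying DeploymentTarget to the TensorRT exporter from the Examples section (edge target, hence INT8 with calibration):

```python
from univit import DeploymentTarget, FasterViTAdapter, Precision, TensorRTExporter

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Read the recommended settings for the edge preset (int8, batch=1)
target = DeploymentTarget.EDGE
batch_size = target.get_max_batch_size()  # 1
print(f"Recommended precision: {target.get_recommended_precision()}")

# Build an engine sized for the edge target; INT8 requires calibration images
TensorRTExporter.export(
    model.model,
    (batch_size, 3, 224, 224),
    "fastervit_edge_int8.engine",
    precision=Precision.INT8,
    calibration_images=["calib1.jpg", "calib2.jpg"],  # 16+ recommended
)
```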
## Extending the Library

To add support for a new model type, create an adapter class that inherits from `BaseUniViTWrapper`:
```python
from typing import Dict, Optional, Union

import torch
import torch.nn as nn

from univit import BaseUniViTWrapper, ModelConfig


class CustomModelAdapter(BaseUniViTWrapper):
    """Adapter for a custom model."""

    def __init__(self, config: Union[ModelConfig, str], device: str = "auto"):
        super().__init__(config, device)

    def _build_model(self) -> nn.Module:
        """Build the model architecture."""
        # Implement model construction
        pass

    def _load_weights(self, weights_path: Optional[str] = None):
        """Load pretrained weights."""
        # Implement weight loading
        pass

    def _preprocess(self, image, target_size: int) -> torch.Tensor:
        """Preprocess input image."""
        # Implement preprocessing
        pass

    def _postprocess(self, output: torch.Tensor) -> Dict:
        """Postprocess model output."""
        # Implement postprocessing
        pass
```

Then register a configuration for the new model:

```python
from univit import ModelConfig, register_model

# Define configuration
custom_config = ModelConfig(
    name="custom_model",
    model_type="custom",
    version="1.0.0",
    num_classes=10,
    input_size=(224, 224),
    # Add other parameters as needed
)

# Register the model
register_model(custom_config)
```

## Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

This project is licensed under the MIT License; see the LICENSE file for details.
The models supported by this library have their own licenses:
- FasterViT: Licensed under the FasterViT license
- RADIO: Licensed under the RADIO license
- C-RADIO: Licensed under the NVIDIA Open Model License
Please refer to the respective model repositories for detailed licensing information.
Built with care by the UniViT Team