
UniViT - Unified Vision Transformer Library

A comprehensive Python library for NVIDIA Vision Transformer models


Table of Contents

  • Overview
  • Features
  • Installation
  • Quick Start
  • Supported Models
  • Examples
  • API Reference
  • Deployment
  • Extending the Library
  • Contributing
  • License
  • Acknowledgments

Overview

UniViT is a unified, production-ready Python library that provides a clean and extensible interface for NVIDIA Vision Transformer models including FasterViT, RADIO, and C-RADIO. The library is designed with modern software engineering principles, offering a consistent API inspired by Ultralytics YOLO for seamless model loading, inference, and export.

The library abstracts away the complexity of working with different model architectures while maintaining full access to underlying model capabilities. Whether you need feature extraction for foundation models, classification with FasterViT, or high-performance inference with TensorRT, UniViT provides a unified approach to all these tasks.

Key design goals include clean architecture following SOLID principles, comprehensive error handling with meaningful exceptions, production-ready export capabilities, and easy extensibility for custom models and use cases.

Features

Feature                     Description
Unified API                 Consistent interface across all model types, similar to Ultralytics YOLO
Auto-detection              Automatic model recognition from the registry, with no manual type specification
Multiple Precision Support  FP32, FP16, and INT8 quantization with calibration
TensorRT Export             High-performance inference engines with dynamic shapes
ONNX Export                 Standard ONNX format with optimization and validation
Multi-format Loading        Support for .pt, .pth, .onnx, and .tensorrt formats
Production Ready            Deployment optimization for cloud, edge, and desktop
Model Optimization          Pruning, mixed precision, and quantization support
Comprehensive Utils         Memory management, distributed inference, and general utilities

Installation

Using uv (Recommended)

The uv package manager is recommended for faster and more reliable installations.

# Create a new virtual environment
uv venv univit-env
source univit-env/bin/activate  # Linux/Mac
# or
.\univit-env\Scripts\activate  # Windows

# Install UniViT
cd univit_library
uv pip install -e .

Using pip

# Install from source
cd univit_library
pip install -e .

# Install with all optional dependencies
pip install -e ".[all]"

# Install with ONNX support
pip install -e ".[onnx]"

# Install with TensorRT support
pip install -e ".[tensorrt]"

Core Dependencies

  • Python 3.10 or higher
  • PyTorch 2.0 or higher
  • torchvision
  • timm
  • NumPy
  • Pillow
  • requests

Optional Dependencies

  • tensorrt: For TensorRT engine export and inference
  • onnx, onnxruntime: For ONNX export and inference
  • pycuda: For TensorRT CUDA integration
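
Since these backends are optional, it can be useful to check which ones are present before attempting an export. A minimal sketch using only the standard library:

import importlib.util

# Probe each optional backend; find_spec returns None when a
# package is not installed in the current environment.
for pkg in ("tensorrt", "onnx", "onnxruntime", "pycuda"):
    status = "available" if importlib.util.find_spec(pkg) else "missing"
    print(f"{pkg}: {status}")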

Quick Start

Basic Inference with FasterViT

from univit import FasterViTAdapter

# Initialize and load model
model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Predict on a single image
result = model("path/to/image.jpg")
print(f"Predicted class: {result['predictions'][0][0]}")
print(f"Confidence: {result['probabilities'][0][0]:.4f}")

Feature Extraction with RADIO

from univit import RadioAdapter

# Initialize RADIO foundation model
model = RadioAdapter("radio_v2_5_b")
model.load()

# Extract features
result = model("path/to/image.jpg")
features = result["features"]
print(f"Feature shape: {features.shape}")  # (1, 768)

Context Manager Usage

from univit import FasterViTAdapter

# Automatic resource cleanup
with FasterViTAdapter("faster_vit_0_224") as model:
    model.load()
    result = model("image.jpg")
    # Memory is automatically freed when exiting context

Batch Processing

from univit import FasterViTAdapter

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Process multiple images efficiently
images = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
results = model.predict_batch(images, batch_size=2)

for i, result in enumerate(results):
    print(f"Image {i}: Class {result['predictions'][0][0]}")

Supported Models

FasterViT Models

FasterViT provides efficient hierarchical vision transformers optimized for real-time inference.

Model             Input Size  Classes  Parameters  Speed
faster_vit_0_224  224×224     1000     ~5M         Fastest
faster_vit_1_224  224×224     1000     ~8M         Fast
faster_vit_2_224  224×224     1000     ~12M        Medium
faster_vit_3_224  224×224     1000     ~20M        Balanced
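
To choose a variant empirically, one quick approach is to time a forward pass per model. A sketch assuming the adapter and Timer APIs shown elsewhere in this README:

from univit import FasterViTAdapter, Timer  # Timer import path is an assumption

for name in ["faster_vit_0_224", "faster_vit_1_224",
             "faster_vit_2_224", "faster_vit_3_224"]:
    # The context manager frees GPU memory between models
    with FasterViTAdapter(name) as model:
        model.load()
        with Timer():  # times the forward pass
            model("image.jpg")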

RADIO Models

RADIO is a foundation model trained with multiple teacher models for versatile feature extraction.

Model         Input Size  Output Dim  Teachers
radio_v2_1    512×512     512         DFN CLIP, SigLIP, DINOv2
radio_v2_5_b  768×768     768         DFN CLIP, SigLIP, DINOv2, SAM
radio_v2_5_h  1024×1024   1024        DFN CLIP, SigLIP, DINOv2, SAM, Florence2
eradio_v2    512×512     768         Efficient variant

C-RADIO Models

C-RADIO is a commercial variant with enhanced capabilities, released under the NVIDIA Open Model License.

Model        Input Size  Output Dim  License
cradio_v3_b  1024×1024   1024        NVIDIA Open Model License
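
C-RADIO models load through the same RadioAdapter used for RADIO (see the API Reference below); for example:

from univit import RadioAdapter

# C-RADIO shares the unified RADIO adapter interface
model = RadioAdapter("cradio_v3_b")
model.load()

features = model("image.jpg")["features"]
print(features.shape)  # expected: (1, 1024) per the table above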

Examples

Running Examples

# Basic inference examples
python examples/basic_examples.py

# Model export examples
python examples/export_examples.py

# Model optimization examples
python examples/optimization_examples.py

# Deployment examples
python examples/deployment_examples.py

# Advanced examples (video streams, API servers)
python examples/advanced_examples.py

# Custom model registration examples
python examples/custom_model_examples.py

Export to ONNX

from univit import FasterViTAdapter, ONNXExporter, Precision

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Export to ONNX with FP32 precision
ONNXExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp32.onnx",
    precision=Precision.FP32,
)

# Export with FP16 for smaller file size
ONNXExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp16.onnx",
    precision=Precision.FP16,
)
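
An exported model can be sanity-checked with onnxruntime before deployment. A minimal validation sketch (the input name is read from the session rather than assumed):

import numpy as np
import onnxruntime as ort

# Load the exported graph and run a dummy forward pass
session = ort.InferenceSession("fastervit_fp32.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)  # expected: (1, 1000) for the ImageNet head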

Export to TensorRT

from univit import FasterViTAdapter, TensorRTExporter, Precision

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Export with FP16 precision
TensorRTExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp16.engine",
    precision=Precision.FP16,
)

# Export with INT8 precision and calibration
calibration_images = ["calib1.jpg", "calib2.jpg", ...]  # 16+ images recommended
TensorRTExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_int8.engine",
    precision=Precision.INT8,
    calibration_images=calibration_images,
)
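
A serialized engine can be reloaded with the TensorRT runtime API. A minimal loading sketch (running inference additionally requires allocating and binding CUDA buffers, e.g. via pycuda):

import tensorrt as trt

# Deserialize the engine file built above
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("fastervit_fp16.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()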

Model Optimization

from univit import FasterViTAdapter

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Apply pruning (30% sparsity)
model.prune_model(amount=0.3, method="l1")

# Enable mixed precision (FP16)
model.enable_mixed_precision()

# Prepare for Quantization Aware Training
model.apply_qat()

# Run inference
result = model("image.jpg")
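
For reference, the "l1" pruning method named above maps naturally onto PyTorch's built-in pruning utilities. A standalone sketch of what prune_model(amount=0.3, method="l1") plausibly does internally (an illustration, not the library's actual implementation):

import torch.nn as nn
import torch.nn.utils.prune as prune

def l1_prune(model: nn.Module, amount: float = 0.3) -> None:
    """Zero out the lowest-magnitude weights in prunable layers."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the sparsity into the weights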

Deployment Target Configuration

from univit import DeploymentTarget

# Cloud deployment (A100, FP32, Batch=64)
cloud_config = {
    "precision": DeploymentTarget.CLOUD.get_recommended_precision(),  # fp32
    "batch_size": DeploymentTarget.CLOUD.get_max_batch_size(),  # 64
}

# Desktop deployment (RTX 4090, FP16, Batch=8)
desktop_config = {
    "precision": DeploymentTarget.DESKTOP.get_recommended_precision(),  # fp16
    "batch_size": DeploymentTarget.DESKTOP.get_max_batch_size(),  # 8
}

# Edge deployment (Jetson, INT8, Batch=1)
edge_config = {
    "precision": DeploymentTarget.EDGE.get_recommended_precision(),  # int8
    "batch_size": DeploymentTarget.EDGE.get_max_batch_size(),  # 1
}

Custom Model Registration

from univit import ModelConfig, register_model, FasterViTAdapter

# Create custom model configuration (e.g., fire detection)
fire_config = ModelConfig(
    name="fire_fastervit",
    model_type="fastervit",
    num_classes=2,  # Fire / Non-Fire
    input_size=(224, 224),
    default_params={
        "depths": [2, 3, 6, 5],
        "dim": 64,
        "num_heads": [2, 4, 8, 16],
    },
    metadata={
        "task": "fire_detection",
        "classes": ["fire", "normal"],
    },
)

# Register the model
register_model(fire_config)

# Use the custom model
model = FasterViTAdapter(fire_config)
model.load()

result = model("image.jpg")

API Reference

Config Classes

  • ModelConfig: Base configuration class for all models
  • DeploymentTarget: Enum for deployment target (CLOUD, DESKTOP, EDGE)
  • Precision: Enum for model precision (FP32, FP16, INT8)

Adapter Classes

  • BaseUniViTWrapper: Abstract base class defining the common interface
  • FasterViTAdapter: Adapter for FasterViT classification models
  • RadioAdapter: Unified adapter for RADIO, C-RADIO, and E-RADIO models

Exporter Classes

  • ONNXExporter: Export models to ONNX format with optimization
  • TensorRTExporter: Export models to TensorRT engine format

Utilities

  • setup_logger(): Configure structured logging
  • cleanup_memory(): Free GPU memory and Python garbage
  • download_weights(): Download model weights from URL
  • count_parameters(): Count model parameters
  • get_model_size(): Calculate model memory size
  • image_to_tensor(): Convert image to tensor
  • tensor_to_image(): Convert tensor to image
  • validate_image(): Validate image is usable
  • Timer(): Context manager for timing operations
  • DistributedManager(): Manager for distributed inference
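
A short sketch of how several of these utilities might compose around a single inference call (the exact signatures are assumptions based on the descriptions above):

from univit import FasterViTAdapter, Timer, cleanup_memory, count_parameters

model = FasterViTAdapter("faster_vit_0_224")
model.load()
print(f"Parameters: {count_parameters(model.model):,}")

with Timer():  # time one forward pass
    result = model("image.jpg")

cleanup_memory()  # free GPU memory and collect Python garbage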

Deployment

Cloud Deployment

Optimized for high-performance GPU servers with large batch sizes.

Parameter   Value
Precision   FP32
Batch Size  Up to 64
Device      A100, V100, H100
Use Case    Server inference, training pipelines

Desktop Deployment

Optimized for consumer GPUs with balanced speed and accuracy.

Parameter   Value
Precision   FP16
Batch Size  Up to 8
Device      RTX 3090, RTX 4090
Use Case    Development, prototyping, local inference

Edge Deployment

Optimized for embedded devices with strict resource constraints.

Parameter   Value
Precision   INT8
Batch Size  1
Device      Jetson, TensorRT-LLM
Use Case    Robotics, IoT devices, autonomous vehicles
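
The three targets can drive export settings directly. A sketch combining DeploymentTarget with the TensorRT exporter (it assumes get_recommended_precision() returns a value the exporter accepts):

from univit import DeploymentTarget, FasterViTAdapter, TensorRTExporter

target = DeploymentTarget.DESKTOP
model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Size the engine's batch dimension and precision for the target
TensorRTExporter.export(
    model.model,
    (target.get_max_batch_size(), 3, 224, 224),
    "fastervit_desktop.engine",
    precision=target.get_recommended_precision(),
)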

Extending the Library

Creating a Custom Adapter

To add support for a new model type, create a new adapter class that inherits from BaseUniViTWrapper:

from typing import Dict, Optional, Union

import torch
import torch.nn as nn

from univit import BaseUniViTWrapper, ModelConfig

class CustomModelAdapter(BaseUniViTWrapper):
    """Adapter for custom model"""
    
    def __init__(self, config: Union[ModelConfig, str], device: str = "auto"):
        super().__init__(config, device)
    
    def _build_model(self) -> nn.Module:
        """Build the model architecture"""
        # Implement model construction
        pass
    
    def _load_weights(self, weights_path: Optional[str] = None):
        """Load pretrained weights"""
        # Implement weight loading
        pass
    
    def _preprocess(self, image, target_size: int) -> torch.Tensor:
        """Preprocess input image"""
        # Implement preprocessing
        pass
    
    def _postprocess(self, output: torch.Tensor) -> Dict:
        """Postprocess model output"""
        # Implement postprocessing
        pass

Registering Custom Models

from univit import ModelConfig, register_model

# Define configuration
custom_config = ModelConfig(
    name="custom_model",
    model_type="custom",
    version="1.0.0",
    num_classes=10,
    input_size=(224, 224),
    # Add other parameters as needed
)

# Register the model
register_model(custom_config)
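
Once registered, the configuration plugs into the adapter sketched in the previous section:

# Use the registered configuration with the custom adapter from above
model = CustomModelAdapter(custom_config)
model.load()
result = model("image.jpg")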

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

The models supported by this library have their own licenses:

  • FasterViT: Licensed under the terms in the FasterViT repository
  • RADIO: Licensed under the terms in the RADIO repository
  • C-RADIO: Licensed under the NVIDIA Open Model License

Please refer to the respective model repositories for detailed licensing information.

Acknowledgments


Built with care by the UniViT Team
