
UniViT - Unified Vision Transformer Library

A comprehensive Python library for NVIDIA Vision Transformer models


Table of Contents

  • Overview
  • Features
  • Installation
  • Quick Start
  • Supported Models
  • Examples
  • API Reference
  • Deployment
  • Extending the Library
  • Contributing
  • License
  • Acknowledgments

Overview

UniViT is a unified, production-ready Python library that provides a clean and extensible interface for NVIDIA Vision Transformer models including FasterViT, RADIO, and C-RADIO. The library is designed with modern software engineering principles, offering a consistent API inspired by Ultralytics YOLO for seamless model loading, inference, and export.

The library abstracts away the complexity of working with different model architectures while maintaining full access to underlying model capabilities. Whether you need feature extraction for foundation models, classification with FasterViT, or high-performance inference with TensorRT, UniViT provides a unified approach to all these tasks.

Key design goals include clean architecture following SOLID principles, comprehensive error handling with meaningful exceptions, production-ready export capabilities, and easy extensibility for custom models and use cases.

Features

Feature                     Description
Unified API                 Consistent interface across all model types, similar to Ultralytics YOLO
Auto-detection              Automatic model recognition from the registry, with no manual type specification
Multiple Precision Support  FP32, FP16, and INT8 quantization with calibration
TensorRT Export             High-performance inference engines with dynamic shapes
ONNX Export                 Standard ONNX format with optimization and validation
Multi-format Loading        Support for .pt, .pth, .onnx, and .tensorrt formats
Production Ready            Deployment optimization for cloud, edge, and desktop
Model Optimization          Pruning, mixed precision, and quantization support
Comprehensive Utils         Memory management, distributed inference, and general utilities

Installation

Using uv (Recommended)

The uv package manager is recommended for faster and more reliable installations.

# Create a new virtual environment
uv venv univit-env
source univit-env/bin/activate  # Linux/Mac
# or
.\univit-env\Scripts\activate  # Windows

# Install UniViT
cd univit_library
uv pip install -e .

Using pip

# Install from source
cd univit_library
pip install -e .

# Install with all optional dependencies
pip install -e ".[all]"

# Install with ONNX support
pip install -e ".[onnx]"

# Install with TensorRT support
pip install -e ".[tensorrt]"

Core Dependencies

  • Python 3.10 or higher
  • PyTorch 2.0 or higher
  • torchvision
  • timm
  • NumPy
  • Pillow
  • requests

Optional Dependencies

  • tensorrt: For TensorRT engine export and inference
  • onnx, onnxruntime: For ONNX export and inference
  • pycuda: For TensorRT CUDA integration
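
Since these backends are optional, it can be useful to check which ones are present before attempting an export. A minimal sketch using only the standard library:

import importlib.util

# Probe each optional backend; find_spec returns None when a
# package is not installed in the current environment.
for pkg in ("tensorrt", "onnx", "onnxruntime", "pycuda"):
    status = "available" if importlib.util.find_spec(pkg) else "missing"
    print(f"{pkg}: {status}")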

Quick Start

Basic Inference with FasterViT

from univit import FasterViTAdapter

# Initialize and load model
model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Predict on a single image
result = model("path/to/image.jpg")
print(f"Predicted class: {result['predictions'][0][0]}")
print(f"Confidence: {result['probabilities'][0][0]:.4f}")

Feature Extraction with RADIO

from univit import RadioAdapter

# Initialize RADIO foundation model
model = RadioAdapter("radio_v2_5_b")
model.load()

# Extract features
result = model("path/to/image.jpg")
features = result["features"]
print(f"Feature shape: {features.shape}")  # (1, 768)

Context Manager Usage

from univit import FasterViTAdapter

# Automatic resource cleanup
with FasterViTAdapter("faster_vit_0_224") as model:
    model.load()
    result = model("image.jpg")
    # Memory is automatically freed when exiting context

Batch Processing

from univit import FasterViTAdapter

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Process multiple images efficiently
images = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
results = model.predict_batch(images, batch_size=2)

for i, result in enumerate(results):
    print(f"Image {i}: Class {result['predictions'][0][0]}")

Supported Models

FasterViT Models

FasterViT provides efficient hierarchical vision transformers optimized for real-time inference.

Model             Input Size  Classes  Parameters  Speed
faster_vit_0_224  224×224     1000     ~5M         Fastest
faster_vit_1_224  224×224     1000     ~8M         Fast
faster_vit_2_224  224×224     1000     ~12M        Medium
faster_vit_3_224  224×224     1000     ~20M        Balanced
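
To choose a variant empirically, one quick approach is to time a forward pass per model. A sketch assuming the adapter and Timer APIs shown elsewhere in this README:

from univit import FasterViTAdapter, Timer  # Timer import path is an assumption

for name in ["faster_vit_0_224", "faster_vit_1_224",
             "faster_vit_2_224", "faster_vit_3_224"]:
    # The context manager frees GPU memory between models
    with FasterViTAdapter(name) as model:
        model.load()
        with Timer():  # times the forward pass
            model("image.jpg")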

RADIO Models

RADIO is a foundation model trained with multiple teacher models for versatile feature extraction.

Model         Input Size  Output Dim  Teachers
radio_v2_1    512×512     512         DFN CLIP, SigLIP, DINOv2
radio_v2_5_b  768×768     768         DFN CLIP, SigLIP, DINOv2, SAM
radio_v2_5_h  1024×1024   1024        DFN CLIP, SigLIP, DINOv2, SAM, Florence2
eradio_v2    512×512     768         Efficient variant

C-RADIO Models

C-RADIO is a commercial variant with enhanced capabilities, released under the NVIDIA Open Model License.

Model        Input Size  Output Dim  License
cradio_v3_b  1024×1024   1024        NVIDIA Open Model License
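
C-RADIO models load through the same RadioAdapter used for RADIO (see the API Reference below); for example:

from univit import RadioAdapter

# C-RADIO shares the unified RADIO adapter interface
model = RadioAdapter("cradio_v3_b")
model.load()

features = model("image.jpg")["features"]
print(features.shape)  # expected: (1, 1024) per the table above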

Examples

Running Examples

# Basic inference examples
python examples/basic_examples.py

# Model export examples
python examples/export_examples.py

# Model optimization examples
python examples/optimization_examples.py

# Deployment examples
python examples/deployment_examples.py

# Advanced examples (video streams, API servers)
python examples/advanced_examples.py

# Custom model registration examples
python examples/custom_model_examples.py

Export to ONNX

from univit import FasterViTAdapter, ONNXExporter, Precision

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Export to ONNX with FP32 precision
ONNXExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp32.onnx",
    precision=Precision.FP32,
)

# Export with FP16 for smaller file size
ONNXExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp16.onnx",
    precision=Precision.FP16,
)
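
An exported model can be sanity-checked with onnxruntime before deployment. A minimal validation sketch (the input name is read from the session rather than assumed):

import numpy as np
import onnxruntime as ort

# Load the exported graph and run a dummy forward pass
session = ort.InferenceSession("fastervit_fp32.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)  # expected: (1, 1000) for the ImageNet head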

Export to TensorRT

from univit import FasterViTAdapter, TensorRTExporter, Precision

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Export with FP16 precision
TensorRTExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_fp16.engine",
    precision=Precision.FP16,
)

# Export with INT8 precision and calibration
calibration_images = ["calib1.jpg", "calib2.jpg", ...]  # 16+ images recommended
TensorRTExporter.export(
    model.model,
    (1, 3, 224, 224),
    "fastervit_int8.engine",
    precision=Precision.INT8,
    calibration_images=calibration_images,
)
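
A serialized engine can be reloaded with the TensorRT runtime API. A minimal loading sketch (running inference additionally requires allocating and binding CUDA buffers, e.g. via pycuda):

import tensorrt as trt

# Deserialize the engine file built above
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("fastervit_fp16.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()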

Model Optimization

from univit import FasterViTAdapter

model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Apply pruning (30% sparsity)
model.prune_model(amount=0.3, method="l1")

# Enable mixed precision (FP16)
model.enable_mixed_precision()

# Prepare for Quantization Aware Training
model.apply_qat()

# Run inference
result = model("image.jpg")
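
For reference, the "l1" pruning method named above maps naturally onto PyTorch's built-in pruning utilities. A standalone sketch of what prune_model(amount=0.3, method="l1") plausibly does internally (an illustration, not the library's actual implementation):

import torch.nn as nn
import torch.nn.utils.prune as prune

def l1_prune(model: nn.Module, amount: float = 0.3) -> None:
    """Zero out the lowest-magnitude weights in prunable layers."""
    for module in model.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the sparsity into the weights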

Deployment Target Configuration

from univit import DeploymentTarget

# Cloud deployment (A100, FP32, Batch=64)
cloud_config = {
    "precision": DeploymentTarget.CLOUD.get_recommended_precision(),  # fp32
    "batch_size": DeploymentTarget.CLOUD.get_max_batch_size(),  # 64
}

# Desktop deployment (RTX 4090, FP16, Batch=8)
desktop_config = {
    "precision": DeploymentTarget.DESKTOP.get_recommended_precision(),  # fp16
    "batch_size": DeploymentTarget.DESKTOP.get_max_batch_size(),  # 8
}

# Edge deployment (Jetson, INT8, Batch=1)
edge_config = {
    "precision": DeploymentTarget.EDGE.get_recommended_precision(),  # int8
    "batch_size": DeploymentTarget.EDGE.get_max_batch_size(),  # 1
}

Custom Model Registration

from univit import ModelConfig, register_model, FasterViTAdapter

# Create custom model configuration (e.g., fire detection)
fire_config = ModelConfig(
    name="fire_fastervit",
    model_type="fastervit",
    num_classes=2,  # Fire / Non-Fire
    input_size=(224, 224),
    default_params={
        "depths": [2, 3, 6, 5],
        "dim": 64,
        "num_heads": [2, 4, 8, 16],
    },
    metadata={
        "task": "fire_detection",
        "classes": ["fire", "normal"],
    },
)

# Register the model
register_model(fire_config)

# Use the custom model
model = FasterViTAdapter(fire_config)
model.load()

result = model("image.jpg")

API Reference

Config Classes

  • ModelConfig: Base configuration class for all models
  • DeploymentTarget: Enum for deployment target (CLOUD, DESKTOP, EDGE)
  • Precision: Enum for model precision (FP32, FP16, INT8)

Adapter Classes

  • BaseUniViTWrapper: Abstract base class defining the common interface
  • FasterViTAdapter: Adapter for FasterViT classification models
  • RadioAdapter: Unified adapter for RADIO, C-RADIO, and E-RADIO models

Exporter Classes

  • ONNXExporter: Export models to ONNX format with optimization
  • TensorRTExporter: Export models to TensorRT engine format

Utilities

  • setup_logger(): Configure structured logging
  • cleanup_memory(): Free GPU memory and Python garbage
  • download_weights(): Download model weights from URL
  • count_parameters(): Count model parameters
  • get_model_size(): Calculate model memory size
  • image_to_tensor(): Convert image to tensor
  • tensor_to_image(): Convert tensor to image
  • validate_image(): Validate image is usable
  • Timer(): Context manager for timing operations
  • DistributedManager(): Manager for distributed inference
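
A short sketch of how several of these utilities might compose around a single inference call (the exact signatures are assumptions based on the descriptions above):

from univit import FasterViTAdapter, Timer, cleanup_memory, count_parameters

model = FasterViTAdapter("faster_vit_0_224")
model.load()
print(f"Parameters: {count_parameters(model.model):,}")

with Timer():  # time one forward pass
    result = model("image.jpg")

cleanup_memory()  # free GPU memory and collect Python garbage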

Deployment

Cloud Deployment

Optimized for high-performance GPU servers with large batch sizes.

Parameter   Value
Precision   FP32
Batch Size  Up to 64
Device      A100, V100, H100
Use Case    Server inference, training pipelines

Desktop Deployment

Optimized for consumer GPUs with balanced speed and accuracy.

Parameter   Value
Precision   FP16
Batch Size  Up to 8
Device      RTX 3090, RTX 4090
Use Case    Development, prototyping, local inference

Edge Deployment

Optimized for embedded devices with strict resource constraints.

Parameter   Value
Precision   INT8
Batch Size  1
Device      Jetson, TensorRT-LLM
Use Case    Robotics, IoT devices, autonomous vehicles
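
The three targets can drive export settings directly. A sketch combining DeploymentTarget with the TensorRT exporter (it assumes get_recommended_precision() returns a value the exporter accepts):

from univit import DeploymentTarget, FasterViTAdapter, TensorRTExporter

target = DeploymentTarget.DESKTOP
model = FasterViTAdapter("faster_vit_0_224")
model.load()

# Size the engine's batch dimension and precision for the target
TensorRTExporter.export(
    model.model,
    (target.get_max_batch_size(), 3, 224, 224),
    "fastervit_desktop.engine",
    precision=target.get_recommended_precision(),
)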

Extending the Library

Creating a Custom Adapter

To add support for a new model type, create a new adapter class that inherits from BaseUniViTWrapper:

from typing import Dict, Optional, Union

import torch
import torch.nn as nn

from univit import BaseUniViTWrapper, ModelConfig

class CustomModelAdapter(BaseUniViTWrapper):
    """Adapter for custom model"""
    
    def __init__(self, config: Union[ModelConfig, str], device: str = "auto"):
        super().__init__(config, device)
    
    def _build_model(self) -> nn.Module:
        """Build the model architecture"""
        # Implement model construction
        pass
    
    def _load_weights(self, weights_path: Optional[str] = None):
        """Load pretrained weights"""
        # Implement weight loading
        pass
    
    def _preprocess(self, image, target_size: int) -> torch.Tensor:
        """Preprocess input image"""
        # Implement preprocessing
        pass
    
    def _postprocess(self, output: torch.Tensor) -> Dict:
        """Postprocess model output"""
        # Implement postprocessing
        pass

Registering Custom Models

from univit import ModelConfig, register_model

# Define configuration
custom_config = ModelConfig(
    name="custom_model",
    model_type="custom",
    version="1.0.0",
    num_classes=10,
    input_size=(224, 224),
    # Add other parameters as needed
)

# Register the model
register_model(custom_config)
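
Once registered, the configuration plugs into the adapter sketched in the previous section:

# Use the registered configuration with the custom adapter from above
model = CustomModelAdapter(custom_config)
model.load()
result = model("image.jpg")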

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

The models supported by this library have their own licenses:

  • FasterViT: Licensed under the terms in the FasterViT repository
  • RADIO: Licensed under the terms in the RADIO repository
  • C-RADIO: Licensed under the NVIDIA Open Model License

Please refer to the respective model repositories for detailed licensing information.

Acknowledgments


Built with care by the UniViT Team
