Inferox is a high-performance ML inference engine built in Rust, designed with a two-pillar architecture that separates model compilation from runtime execution. Compile your model architectures into shared libraries (`.so`/`.dylib`) and load them dynamically into the engine with complete type safety.
- 🔒 Type-Safe Dynamic Loading: Load models as trait objects - no manual FFI required
- 🚀 Multiple Backend Support: Candle backend implemented, extensible for ONNX, TensorFlow, etc.
- 🎯 Zero-Copy Inference: Efficient tensor operations without unnecessary allocations
- 🔧 Hot Reloadable: Swap model libraries without recompiling the engine
- 🦀 Pure Rust: Memory safety and RAII throughout, minimal `unsafe` confined to `libloading`
┌────────────────────────────────────────────┐
│ Model Library (libmlp_classifier.dylib)    │
│  ┌──────────────────────────────────────┐  │
│  │ #[no_mangle]                         │  │
│  │ pub fn create_model()                │  │
│  │     -> Box<dyn Model>                │  │
│  └──────────────────────────────────────┘  │
└────────────────────────────────────────────┘
                      ↓
┌────────────────────────────────────────────┐
│ Engine (loads via libloading)              │
│  ┌──────────────────────────────────────┐  │
│  │ let lib = Library::new(path)         │  │
│  │ let factory = lib.get("create_model")│  │
│  │ let model: Box<dyn Model> = factory()│  │
│  │ engine.register_boxed_model(model)   │  │
│  └──────────────────────────────────────┘  │
└────────────────────────────────────────────┘
                      ↓
┌────────────────────────────────────────────┐
│ InferoxEngine manages all models           │
│ - Type-safe trait interface                │
│ - RAII memory management                   │
│ - No unsafe in user code                   │
└────────────────────────────────────────────┘
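Every model implements the `Model` trait from `inferox-core`. The exact definition lives in that crate; the sketch below is only its approximate shape, inferred from the MLP example that follows (the associated types and method set shown here are assumptions, not the authoritative API):

```rust
// Sketch only: inferred from how the MLP example below uses the trait.
// The real definition in inferox-core may carry extra bounds and methods.
use inferox_core::{InferoxError, ModelMetadata};

pub trait Model {
    /// Backend the model runs on (e.g. CandleBackend).
    type Backend;
    /// Tensor type accepted by `forward`.
    type Input;
    /// Tensor type produced by `forward`.
    type Output;

    /// Run one forward pass.
    fn forward(&self, input: Self::Input) -> Result<Self::Output, InferoxError>;
    /// Name used when registering and looking up the model in the engine.
    fn name(&self) -> &str;
    /// Version and description surfaced by the engine.
    fn metadata(&self) -> ModelMetadata;
}
```

The MLP example implements this trait as follows: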
use inferox_core::{Model, ModelMetadata, InferoxError};
use inferox_candle::{CandleBackend, CandleTensor};
use candle_core::Module; // brings `forward` on candle_nn layers into scope
use candle_nn::{Linear, VarBuilder};

pub struct MLP {
    fc1: Linear,
    fc2: Linear,
    fc3: Linear,
    name: String,
}

impl Model for MLP {
    type Backend = CandleBackend;
    type Input = CandleTensor;
    type Output = CandleTensor;

    fn forward(&self, input: Self::Input) -> Result<Self::Output, InferoxError> {
        let x = self.fc1.forward(&input.inner())?;
        let x = x.relu()?;
        let x = self.fc2.forward(&x)?;
        let x = x.relu()?;
        let x = self.fc3.forward(&x)?;
        Ok(CandleTensor::new(x))
    }

    fn name(&self) -> &str {
        &self.name
    }

    fn metadata(&self) -> ModelMetadata {
        ModelMetadata::new("mlp", "1.0.0")
            .with_description("Multi-Layer Perceptron")
    }
}
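The factory in the next step calls `MLP::new("classifier", 10, 8, 3, builder.var_builder())`, a constructor not shown in this README. A plausible sketch built on `candle_nn::linear`, with the argument order taken from that call site (the `VarBuilder` parameter type and the candle-to-`InferoxError` conversion are assumptions):

```rust
use candle_nn::linear; // Linear, VarBuilder, and InferoxError are already imported above

impl MLP {
    /// Hypothetical constructor matching the call site used by the factories:
    /// MLP::new(name, input_dim, hidden_dim, output_dim, var_builder).
    pub fn new(
        name: &str,
        input_dim: usize,
        hidden_dim: usize,
        output_dim: usize,
        vb: VarBuilder,
    ) -> Result<Self, InferoxError> {
        Ok(Self {
            // Layer sizes follow the example output: input -> hidden -> hidden -> output.
            fc1: linear(input_dim, hidden_dim, vb.pp("fc1"))?,
            fc2: linear(hidden_dim, hidden_dim, vb.pp("fc2"))?,
            fc3: linear(hidden_dim, output_dim, vb.pp("fc3"))?,
            name: name.to_string(),
        })
    }
}
```

The layer sizes line up with the example output later in this README (e.g. 10 → 8 → 8 → 3 for the classifier).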
Create a library crate with `crate-type = ["cdylib"]` under `[lib]` in its `Cargo.toml`:
// models/classifier/src/lib.rs
use inferox_candle::{CandleBackend, CandleModelBuilder, CandleTensor};
use inferox_core::Model;
use candle_core::Device;

// `MLP` is the architecture defined in the example's lib.rs (examples/mlp/src/lib.rs).

#[no_mangle]
pub fn create_model() -> Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>> {
    let builder = CandleModelBuilder::new(Device::Cpu);
    let model = MLP::new("classifier", 10, 8, 3, builder.var_builder())
        .expect("Failed to create classifier model");
    Box::new(model)
}
Build the model:
cargo build --release -p mlp-classifier
use inferox_engine::{InferoxEngine, EngineConfig};
use inferox_candle::{CandleBackend, CandleTensor};
use inferox_core::Model;
use libloading::{Library, Symbol};

type ModelFactory = fn() -> Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>>;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = CandleBackend::cpu();
    let config = EngineConfig::default();
    let mut engine = InferoxEngine::new(backend.clone(), config);

    // Load model from shared library
    unsafe {
        let lib = Library::new("target/release/libmlp_classifier.dylib")?;
        let factory: Symbol<ModelFactory> = lib.get(b"create_model")?;
        let model = factory();
        engine.register_boxed_model(model);
        // Intentionally leak the library handle: the model's code lives inside the
        // loaded library, so it must stay mapped for as long as the model is used.
        std::mem::forget(lib);
    }

    // Run inference
    let input = backend.tensor_builder().build_from_vec(
        vec![0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        &[1, 10],
    )?;
    let output = engine.infer("classifier", input)?;
    println!("Output: {:?}", output.to_vec2::<f32>()?);

    Ok(())
}
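The `std::mem::forget(lib)` call above deliberately leaks the library handle, because dropping the `Library` would unmap the code that backs the boxed model. An alternative is to keep the handle alive alongside the model; in the sketch below, the `load_model` helper and `BoxedModel` alias are illustrative and not part of the Inferox API:

```rust
use inferox_candle::{CandleBackend, CandleTensor};
use inferox_core::Model;
use libloading::{Library, Symbol};

type BoxedModel =
    Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>>;
type ModelFactory = fn() -> BoxedModel;

/// Illustrative helper: load one model library and hand back both the model
/// and the `Library`, so the caller decides how long the code stays mapped.
/// The `Library` must outlive every call into the returned model.
fn load_model(path: &str) -> Result<(BoxedModel, Library), Box<dyn std::error::Error>> {
    unsafe {
        let lib = Library::new(path)?;
        let factory: Symbol<ModelFactory> = lib.get(b"create_model")?;
        let model = factory();
        Ok((model, lib))
    }
}
```

Storing the returned `Library` handles for as long as their models stay registered gives the same guarantee as `std::mem::forget` without leaking the handles.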
inferox/
├── crates/
│   ├── inferox-core/        # Core traits and types
│   ├── inferox-candle/      # Candle backend implementation
│   └── inferox-engine/      # Inference engine runtime
├── examples/
│   └── mlp/                 # MLP example with dynamic loading
│       ├── src/
│       │   ├── lib.rs       # MLP architecture
│       │   └── main.rs      # Engine runtime
│       └── models/
│           ├── classifier/  # Compiled to .dylib/.so
│           └── small/       # Compiled to .dylib/.so
├── Makefile                 # Development commands
└── .github/workflows/       # CI/CD pipelines
The `inferox-core` crate provides the core trait definitions for backends, tensors, and models (a short usage sketch follows the list):
- `Backend` - Hardware abstraction (CPU, CUDA, Metal, etc.)
- `Tensor` - N-dimensional array operations
- `Model` - Model trait with forward pass, metadata, and state management
- `DataType` - Numeric type system with safe conversions
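Because `Model` exposes its backend and tensor types as associated types, downstream code can stay generic over the backend. A minimal sketch using only the methods already shown in the MLP example above:

```rust
use inferox_core::{InferoxError, Model};

/// Works for any Model implementation, whatever backend or tensor type it uses.
fn run_named<M: Model>(model: &M, input: M::Input) -> Result<M::Output, InferoxError> {
    println!("running forward pass for '{}'", model.name());
    model.forward(input)
}
```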
The `inferox-candle` crate implements the Candle backend using Hugging Face's Candle (a small tensor round-trip sketch follows the list):
- `CandleBackend` - Backend for Candle tensors
- `CandleTensor` - Tensor wrapper with type-safe operations
- `CandleModelBuilder` - Model initialization with weight loading
- `CandleVarMap` - Weight management and serialization
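A minimal sketch of the round-trip between a raw `candle_core::Tensor` and `CandleTensor`, using only calls already seen in the MLP example (`CandleTensor::new`, `.inner()`); the shape, dtype, and function name here are purely illustrative:

```rust
use candle_core::{DType, Device, Tensor};
use inferox_candle::CandleTensor;

fn round_trip() -> candle_core::Result<()> {
    // Build a raw Candle tensor on the CPU device.
    let raw = Tensor::zeros((1, 10), DType::F32, &Device::Cpu)?;
    // Wrap it for use with Inferox models...
    let wrapped = CandleTensor::new(raw);
    // ...and unwrap it again when a candle_nn layer needs the underlying tensor.
    let _inner = wrapped.inner();
    Ok(())
}
```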
The `inferox-engine` crate provides the high-level inference engine with model management:
- `InferoxEngine` - Multi-model inference orchestration
- `InferenceSession` - Stateful inference with context
- `EngineConfig` - Runtime configuration (batch size, device, etc.)
- Dynamic model loading via trait objects
A complete example demonstrating the two-pillar architecture:
# Build model libraries
make models
# Run the engine with multiple models
cargo run --bin mlp --release -- \
target/release/libmlp_classifier.dylib \
target/release/libmlp_small.dylib
Output:
Inferox MLP Engine
==================
✓ Created CPU backend
Loading model from: target/release/libmlp_classifier.dylib
✓ Registered 'classifier' - Multi-Layer Perceptron (10 → 8 → 8 → 3)
Loading model from: target/release/libmlp_small.dylib
✓ Registered 'small' - Multi-Layer Perceptron (5 → 4 → 4 → 2)
2 models loaded
Available models:
- classifier v1.0.0: Multi-Layer Perceptron (10 → 8 → 8 → 3)
- small v1.0.0: Multi-Layer Perceptron (5 → 4 → 4 → 2)
Running test inference on all models:
classifier -> output shape: [1, 3]
small -> output shape: [1, 2]
✓ All models working!
See examples/mlp/README.md for detailed documentation.
- Rust 1.70+ (2021 edition)
- Cargo
# Build all crates
make build
# Build in release mode
make build-release
# Build model libraries
make models
# Build examples
make examples
# Run tests + quick lint (recommended)
make test
# Run tests only
make test-quick
# Run specific crate tests
make test-core
make test-candle
make test-engine
# Format code
make format
# Run clippy linter
make lint
# Quick pre-commit checks
make pre-commit
# Generate and open docs
make doc
# Generate docs including private items
make doc-private
The project uses GitHub Actions for continuous integration:
- Format Check: Ensures code follows `rustfmt` standards
- Clippy Lint: Catches common mistakes and anti-patterns
- Test Suite: Runs on Ubuntu and macOS with stable Rust
- Model Libraries: Verifies model binaries build correctly
- Documentation: Ensures docs build without warnings
- Examples: Validates all examples compile and run
See .github/workflows/pr-checks.yml for the complete pipeline.
- Core trait system
- Candle backend
- Inference engine
- Dynamic model loading
- MLP example
- ResNet18 example
- ONNX backend
- Batch inference optimization
- Model quantization support
- GPU acceleration (CUDA, Metal)
- Production deployment guide
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run `make pre-commit` before committing
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.