Inferox is a high-performance ML inference engine built in Rust, designed with a two-pillar architecture that separates model compilation from runtime execution. Compile your model architectures into shared libraries (`.so`/`.dylib`) and load them dynamically into the engine with complete type safety.
- 🔒 Type-Safe Dynamic Loading: Load models as trait objects - no manual FFI required
- 🚀 Multiple Backend Support: Candle backend implemented, extensible for ONNX, TensorFlow, etc.
- 🎯 Zero-Copy Inference: Efficient tensor operations without unnecessary allocations
- 🔧 Hot Reloadable: Swap model libraries without recompiling the engine
- 🦀 Pure Rust: Memory safety and RAII throughout, minimal `unsafe` confined to `libloading`
┌────────────────────────────────────────────┐
│ Model Library (libmlp_classifier.dylib)    │
│  ┌──────────────────────────────────────┐  │
│  │ #[no_mangle]                         │  │
│  │ pub fn create_model()                │  │
│  │     -> Box<dyn Model>                │  │
│  └──────────────────────────────────────┘  │
└────────────────────────────────────────────┘
                      ↓
┌────────────────────────────────────────────┐
│ Engine (loads via libloading)              │
│  ┌──────────────────────────────────────┐  │
│  │ let lib = Library::new(path)         │  │
│  │ let factory = lib.get("create_model")│  │
│  │ let model: Box<dyn Model> = factory()│  │
│  │ engine.register_boxed_model(model)   │  │
│  └──────────────────────────────────────┘  │
└────────────────────────────────────────────┘
                      ↓
┌────────────────────────────────────────────┐
│ InferoxEngine manages all models           │
│ - Type-safe trait interface                │
│ - RAII memory management                   │
│ - No unsafe in user code                   │
└────────────────────────────────────────────┘
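Every model implements the `Model` trait from `inferox-core`. The exact definition lives in that crate; the sketch below is only its approximate shape, inferred from the MLP example that follows (the associated types and method set shown here are assumptions, not the authoritative API):

```rust
// Sketch only: inferred from how the MLP example below uses the trait.
// The real definition in inferox-core may carry extra bounds and methods.
use inferox_core::{InferoxError, ModelMetadata};

pub trait Model {
    /// Backend the model runs on (e.g. CandleBackend).
    type Backend;
    /// Tensor type accepted by `forward`.
    type Input;
    /// Tensor type produced by `forward`.
    type Output;

    /// Run one forward pass.
    fn forward(&self, input: Self::Input) -> Result<Self::Output, InferoxError>;
    /// Name used when registering and looking up the model in the engine.
    fn name(&self) -> &str;
    /// Version and description surfaced by the engine.
    fn metadata(&self) -> ModelMetadata;
}
```

The MLP example implements this trait as follows: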
use inferox_core::{Model, ModelMetadata, InferoxError};
use inferox_candle::{CandleBackend, CandleTensor};
use candle_core::Module; // brings `forward` on candle_nn layers into scope
use candle_nn::{Linear, VarBuilder};

pub struct MLP {
    fc1: Linear,
    fc2: Linear,
    fc3: Linear,
    name: String,
}

impl Model for MLP {
    type Backend = CandleBackend;
    type Input = CandleTensor;
    type Output = CandleTensor;

    fn forward(&self, input: Self::Input) -> Result<Self::Output, InferoxError> {
        let x = self.fc1.forward(&input.inner())?;
        let x = x.relu()?;
        let x = self.fc2.forward(&x)?;
        let x = x.relu()?;
        let x = self.fc3.forward(&x)?;
        Ok(CandleTensor::new(x))
    }

    fn name(&self) -> &str {
        &self.name
    }

    fn metadata(&self) -> ModelMetadata {
        ModelMetadata::new("mlp", "1.0.0")
            .with_description("Multi-Layer Perceptron")
    }
}
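The factory in the next step calls `MLP::new("classifier", 10, 8, 3, builder.var_builder())`, a constructor not shown in this README. A plausible sketch built on `candle_nn::linear`, with the argument order taken from that call site (the `VarBuilder` parameter type and the candle-to-`InferoxError` conversion are assumptions):

```rust
use candle_nn::linear; // Linear, VarBuilder, and InferoxError are already imported above

impl MLP {
    /// Hypothetical constructor matching the call site used by the factories:
    /// MLP::new(name, input_dim, hidden_dim, output_dim, var_builder).
    pub fn new(
        name: &str,
        input_dim: usize,
        hidden_dim: usize,
        output_dim: usize,
        vb: VarBuilder,
    ) -> Result<Self, InferoxError> {
        Ok(Self {
            // Layer sizes follow the example output: input -> hidden -> hidden -> output.
            fc1: linear(input_dim, hidden_dim, vb.pp("fc1"))?,
            fc2: linear(hidden_dim, hidden_dim, vb.pp("fc2"))?,
            fc3: linear(hidden_dim, output_dim, vb.pp("fc3"))?,
            name: name.to_string(),
        })
    }
}
```

The layer sizes line up with the example output later in this README (e.g. 10 → 8 → 8 → 3 for the classifier).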
Create a library crate with `crate-type = ["cdylib"]` under `[lib]` in its `Cargo.toml`:
// models/classifier/src/lib.rs
use inferox_candle::{CandleBackend, CandleModelBuilder, CandleTensor};
use inferox_core::Model;
use candle_core::Device;

// `MLP` is the architecture defined in the example's lib.rs (examples/mlp/src/lib.rs).

#[no_mangle]
pub fn create_model() -> Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>> {
    let builder = CandleModelBuilder::new(Device::Cpu);
    let model = MLP::new("classifier", 10, 8, 3, builder.var_builder())
        .expect("Failed to create classifier model");
    Box::new(model)
}
Build the model:
cargo build --release -p mlp-classifier
use inferox_engine::{InferoxEngine, EngineConfig};
use inferox_candle::{CandleBackend, CandleTensor};
use inferox_core::Model;
use libloading::{Library, Symbol};

type ModelFactory = fn() -> Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>>;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = CandleBackend::cpu();
    let config = EngineConfig::default();
    let mut engine = InferoxEngine::new(backend.clone(), config);

    // Load model from shared library
    unsafe {
        let lib = Library::new("target/release/libmlp_classifier.dylib")?;
        let factory: Symbol<ModelFactory> = lib.get(b"create_model")?;
        let model = factory();
        engine.register_boxed_model(model);
        // Intentionally leak the library handle: the model's code lives inside the
        // loaded library, so it must stay mapped for as long as the model is used.
        std::mem::forget(lib);
    }

    // Run inference
    let input = backend.tensor_builder().build_from_vec(
        vec![0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        &[1, 10],
    )?;
    let output = engine.infer("classifier", input)?;
    println!("Output: {:?}", output.to_vec2::<f32>()?);

    Ok(())
}
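The `std::mem::forget(lib)` call above deliberately leaks the library handle, because dropping the `Library` would unmap the code that backs the boxed model. An alternative is to keep the handle alive alongside the model; in the sketch below, the `load_model` helper and `BoxedModel` alias are illustrative and not part of the Inferox API:

```rust
use inferox_candle::{CandleBackend, CandleTensor};
use inferox_core::Model;
use libloading::{Library, Symbol};

type BoxedModel =
    Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>>;
type ModelFactory = fn() -> BoxedModel;

/// Illustrative helper: load one model library and hand back both the model
/// and the `Library`, so the caller decides how long the code stays mapped.
/// The `Library` must outlive every call into the returned model.
fn load_model(path: &str) -> Result<(BoxedModel, Library), Box<dyn std::error::Error>> {
    unsafe {
        let lib = Library::new(path)?;
        let factory: Symbol<ModelFactory> = lib.get(b"create_model")?;
        let model = factory();
        Ok((model, lib))
    }
}
```

Storing the returned `Library` handles for as long as their models stay registered gives the same guarantee as `std::mem::forget` without leaking the handles.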
inferox/
├── crates/
│   ├── inferox-core/        # Core traits and types
│   ├── inferox-candle/      # Candle backend implementation
│   └── inferox-engine/      # Inference engine runtime
├── examples/
│   └── mlp/                 # MLP example with dynamic loading
│       ├── src/
│       │   ├── lib.rs       # MLP architecture
│       │   └── main.rs      # Engine runtime
│       └── models/
│           ├── classifier/  # Compiled to .dylib/.so
│           └── small/       # Compiled to .dylib/.so
├── Makefile                 # Development commands
└── .github/workflows/       # CI/CD pipelines
The `inferox-core` crate provides the core trait definitions for backends, tensors, and models (a short usage sketch follows the list):
- `Backend` - Hardware abstraction (CPU, CUDA, Metal, etc.)
- `Tensor` - N-dimensional array operations
- `Model` - Model trait with forward pass, metadata, and state management
- `DataType` - Numeric type system with safe conversions
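Because `Model` exposes its backend and tensor types as associated types, downstream code can stay generic over the backend. A minimal sketch using only the methods already shown in the MLP example above:

```rust
use inferox_core::{InferoxError, Model};

/// Works for any Model implementation, whatever backend or tensor type it uses.
fn run_named<M: Model>(model: &M, input: M::Input) -> Result<M::Output, InferoxError> {
    println!("running forward pass for '{}'", model.name());
    model.forward(input)
}
```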
The `inferox-candle` crate implements the Candle backend using Hugging Face's Candle (a small tensor round-trip sketch follows the list):
- `CandleBackend` - Backend for Candle tensors
- `CandleTensor` - Tensor wrapper with type-safe operations
- `CandleModelBuilder` - Model initialization with weight loading
- `CandleVarMap` - Weight management and serialization
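A minimal sketch of the round-trip between a raw `candle_core::Tensor` and `CandleTensor`, using only calls already seen in the MLP example (`CandleTensor::new`, `.inner()`); the shape, dtype, and function name here are purely illustrative:

```rust
use candle_core::{DType, Device, Tensor};
use inferox_candle::CandleTensor;

fn round_trip() -> candle_core::Result<()> {
    // Build a raw Candle tensor on the CPU device.
    let raw = Tensor::zeros((1, 10), DType::F32, &Device::Cpu)?;
    // Wrap it for use with Inferox models...
    let wrapped = CandleTensor::new(raw);
    // ...and unwrap it again when a candle_nn layer needs the underlying tensor.
    let _inner = wrapped.inner();
    Ok(())
}
```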
The `inferox-engine` crate provides the high-level inference engine with model management:
- `InferoxEngine` - Multi-model inference orchestration
- `InferenceSession` - Stateful inference with context
- `EngineConfig` - Runtime configuration (batch size, device, etc.)
- Dynamic model loading via trait objects
A complete example demonstrating the two-pillar architecture:
# Build model libraries
make models
# Run the engine with multiple models
cargo run --bin mlp --release -- \
target/release/libmlp_classifier.dylib \
target/release/libmlp_small.dylib
Output:
Inferox MLP Engine
==================
✓ Created CPU backend
Loading model from: target/release/libmlp_classifier.dylib
✓ Registered 'classifier' - Multi-Layer Perceptron (10 → 8 → 8 → 3)
Loading model from: target/release/libmlp_small.dylib
✓ Registered 'small' - Multi-Layer Perceptron (5 → 4 → 4 → 2)
2 models loaded
Available models:
- classifier v1.0.0: Multi-Layer Perceptron (10 → 8 → 8 → 3)
- small v1.0.0: Multi-Layer Perceptron (5 → 4 → 4 → 2)
Running test inference on all models:
classifier -> output shape: [1, 3]
small -> output shape: [1, 2]
✓ All models working!
See examples/mlp/README.md for detailed documentation.
- Rust 1.70+ (2021 edition)
- Cargo
# Build all crates
make build
# Build in release mode
make build-release
# Build model libraries
make models
# Build examples
make examples
# Run tests + quick lint (recommended)
make test
# Run tests only
make test-quick
# Run specific crate tests
make test-core
make test-candle
make test-engine
# Format code
make format
# Run clippy linter
make lint
# Quick pre-commit checks
make pre-commit
# Generate and open docs
make doc
# Generate docs including private items
make doc-private
The project uses GitHub Actions for continuous integration:
- Format Check: Ensures code follows `rustfmt` standards
- Clippy Lint: Catches common mistakes and anti-patterns
- Test Suite: Runs on Ubuntu and macOS with stable Rust
- Model Libraries: Verifies model binaries build correctly
- Documentation: Ensures docs build without warnings
- Examples: Validates all examples compile and run
See .github/workflows/pr-checks.yml for the complete pipeline.
- Core trait system
- Candle backend
- Inference engine
- Dynamic model loading
- MLP example
- ResNet18 example
- ONNX backend
- Batch inference optimization
- Model quantization support
- GPU acceleration (CUDA, Metal)
- Production deployment guide
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run `make pre-commit` before committing
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.