
Torch Inference - Enterprise ML Inference Server

A high-performance PyTorch inference framework written in Rust, with production-grade testing and monitoring.

🎯 Features

  • Production-Ready Testing: 147+ unit tests, integration tests, and benchmarks
  • Enterprise Resilience: Circuit breaker, bulkhead isolation, request deduplication
  • High Performance: Multi-level caching, dynamic batching, concurrent processing
  • Comprehensive Monitoring: Real-time metrics, health checks, endpoint statistics
  • Type-Safe: Full Rust type safety with zero-cost abstractions

Quick Start

Build the Server

cargo build --release

Run Tests

# Run all tests (147+ unit tests + integration tests)
cargo test

# Run with verbose output
cargo test -- --nocapture

# Run specific test suites
cargo test cache::tests      # Cache system tests (38 tests)
cargo test batch::tests      # Batch processing tests (28 tests)
cargo test monitor::tests    # Monitoring tests (28 tests)
cargo test resilience::      # Resilience pattern tests (16 tests)

# Run integration tests only
cargo test --test integration_test

# Run benchmarks
cargo bench

Run the Server

cargo run --bin torch-inference-server

📊 Test Coverage (Enterprise-Grade)

Core Infrastructure (91 tests)

  • Cache System (38 tests)

    • Basic CRUD operations
    • TTL-based expiration
    • Concurrent access (10+ threads)
    • Unicode keys support
    • Boundary conditions
    • Memory efficiency
    • Large value handling
    • Stress testing (20 threads × 100 ops)
  • Batch Processing (28 tests)

    • Dynamic batching
    • Timeout handling
    • Priority management
    • Concurrent additions
    • Large input handling
    • Stress testing (20 producers × 100 items)
  • Monitoring (28 tests)

    • Request tracking
    • Latency metrics (min/max/avg)
    • Throughput calculation
    • Health status
    • Endpoint statistics
    • Concurrent recording (10 threads × 100 ops)
    • High-frequency updates (10k ops/sec)
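
The TTL-based expiration and lazy cleanup that the cache tests exercise can be sketched with the standard library alone. This is an illustrative sketch; `TtlCache` and its method names are hypothetical, not the actual types in `cache.rs`:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache sketch (not the actual `cache.rs` implementation).
struct TtlCache<V> {
    entries: HashMap<String, (V, Instant)>,
    ttl: Duration,
}

impl<V> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    fn set(&mut self, key: &str, value: V) {
        // Store the value together with its expiration deadline.
        self.entries.insert(key.to_string(), (value, Instant::now() + self.ttl));
    }

    fn get(&mut self, key: &str) -> Option<&V> {
        // Evict lazily: an expired entry is removed on access.
        let expired = match self.entries.get(key) {
            Some((_, deadline)) => *deadline <= Instant::now(),
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(v, _)| v)
    }

    /// Bulk-remove expired entries; returns how many were dropped.
    fn cleanup(&mut self) -> usize {
        let now = Instant::now();
        let before = self.entries.len();
        self.entries.retain(|_, (_, deadline)| *deadline > now);
        before - self.entries.len()
    }
}
```

The real implementation additionally has to be safe under concurrent access (the tests above run 10-20 threads against it), which this single-threaded sketch omits.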

Resilience Patterns (16 tests)

  • Circuit Breaker (10 tests)

    • State transitions (Closed → Open → HalfOpen)
    • Failure threshold detection
    • Automatic recovery
    • Reset functionality
  • Bulkhead (6 tests)

    • Permit acquisition
    • Capacity management
    • Resource isolation
    • Concurrent operations
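
The Closed → Open → HalfOpen cycle tested above can be sketched as a small state machine. Type and method names here are hypothetical, not the actual `circuit_breaker.rs` API:

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum State { Closed, Open, HalfOpen }

/// Hypothetical circuit-breaker sketch illustrating the tested transitions.
struct CircuitBreaker {
    state: State,
    failures: u32,
    threshold: u32,
    opened_at: Option<Instant>,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { state: State::Closed, failures: 0, threshold, opened_at: None, cooldown }
    }

    /// Returns true if a call may proceed; Open moves to HalfOpen after cooldown.
    fn allow(&mut self) -> bool {
        if self.state == State::Open {
            if self.opened_at.map_or(false, |t| t.elapsed() >= self.cooldown) {
                self.state = State::HalfOpen; // probe with one request
            } else {
                return false; // still open: fail fast
            }
        }
        true
    }

    /// A success resets the failure count and closes the circuit.
    fn record_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed;
    }

    /// Enough failures (or a failed HalfOpen probe) open the circuit.
    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.threshold || self.state == State::HalfOpen {
            self.state = State::Open;
            self.opened_at = Some(Instant::now());
        }
    }
}
```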

Additional Coverage (40+ tests)

  • Error handling and propagation
  • Configuration management
  • Request deduplication
  • API endpoints
  • Core ML components
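
The request-deduplication idea can be sketched as a keyed compute-once map: identical requests share one result instead of recomputing. This is a simplified synchronous sketch with hypothetical names; the real `dedup.rs` also has to coalesce concurrent in-flight async requests:

```rust
use std::collections::HashMap;

/// Hypothetical deduplication sketch (not the actual `dedup.rs` types).
struct Deduper {
    results: HashMap<u64, String>,
}

impl Deduper {
    fn new() -> Self {
        Self { results: HashMap::new() }
    }

    /// Compute once per request key; later identical requests reuse the result.
    fn get_or_compute(&mut self, key: u64, compute: impl FnOnce() -> String) -> String {
        self.results.entry(key).or_insert_with(compute).clone()
    }
}
```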

Integration Tests (6 tests)

  • End-to-end request flow
  • Concurrent system load (100 concurrent requests)
  • Batch processing pipeline
  • Cache + Monitor integration
  • Error condition handling

🚀 Performance Benchmarks

Run benchmarks to measure performance:

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench cache_get

# Generate benchmark reports (in target/criterion)
cargo bench --bench cache_bench

Benchmark categories:

  • cache_set: Insertion performance at various scales (100, 1K, 10K)
  • cache_get: Retrieval performance with populated cache
  • cache_cleanup: Expiration cleanup performance
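
The real suite uses Criterion, but the shape of the cache_set / cache_get workloads can be sketched with the standard library alone. `bench_cache_ops` is a hypothetical helper, not the actual benchmark code in benches/cache_bench.rs:

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Stdlib micro-benchmark sketch: returns (set, get) nanoseconds per op.
fn bench_cache_ops(n: usize) -> (f64, f64) {
    let mut cache: HashMap<String, u64> = HashMap::with_capacity(n);
    let keys: Vec<String> = (0..n).map(|i| format!("key-{i}")).collect();

    // cache_set: insertion performance at scale n
    let start = Instant::now();
    for (i, key) in keys.iter().enumerate() {
        cache.insert(key.clone(), i as u64);
    }
    let set_ns_per_op = start.elapsed().as_nanos() as f64 / n as f64;

    // cache_get: retrieval performance with a populated cache
    let start = Instant::now();
    let mut hits = 0usize;
    for key in &keys {
        if cache.get(key).is_some() {
            hits += 1;
        }
    }
    let get_ns_per_op = start.elapsed().as_nanos() as f64 / n as f64;
    assert_eq!(hits, n); // every key must be found

    (set_ns_per_op, get_ns_per_op)
}
```

Criterion adds what this sketch lacks: warm-up, statistical sampling, outlier detection, and the HTML reports written to target/criterion.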

πŸ—οΈ Architecture

Test Structure

tests/
├── integration_test.rs     # Integration tests
└── ...

benches/
└── cache_bench.rs          # Performance benchmarks

src/
├── cache.rs               # 38 unit tests
├── batch.rs               # 28 unit tests
├── monitor.rs             # 28 unit tests
├── dedup.rs               # 9 unit tests
├── error.rs               # 11 unit tests
├── config.rs              # 7 unit tests
└── resilience/
    ├── circuit_breaker.rs # 10 unit tests
    └── bulkhead.rs        # 6 unit tests

Enterprise Testing Features

✅ Concurrency Testing: All components tested with 10-50 concurrent threads
✅ Stress Testing: High-load scenarios (10K+ operations)
✅ Boundary Conditions: Edge cases, zero values, max values
✅ Performance Testing: Criterion benchmarks for critical paths
✅ Integration Testing: End-to-end workflows
✅ Error Scenarios: Failure injection and recovery
✅ Memory Safety: No unsafe code, all tests thread-safe

📈 Continuous Testing

# Watch mode - run tests on file change
cargo watch -x test

# Coverage report (requires cargo-tarpaulin)
cargo tarpaulin --out Html

# Run tests in parallel
cargo test -- --test-threads=8

# Run tests sequentially (for debugging)
cargo test -- --test-threads=1

🔬 Test Quality Standards

All tests follow enterprise standards:

  1. Isolation: Each test is independent and can run in any order
  2. Determinism: Tests produce consistent results
  3. Performance: Fast execution (<30s for full suite)
  4. Readability: Clear test names and assertions
  5. Coverage: Critical paths have multiple test scenarios
  6. Documentation: Comments explain complex test logic

πŸ› οΈ Development

Features

Optional Backend Support

# Enable PyTorch backend
cargo build --features torch

# Enable ONNX backend (requires ONNX Runtime)
cargo build --features onnx

# Enable Candle backend
cargo build --features candle

# Enable all backends
cargo build --features all-backends

CUDA Support

cargo build --features cuda

Project Structure

src/
├── lib.rs              # Library exports for testing
├── main.rs             # Server entry point
├── api/                # REST API endpoints
├── auth/               # Authentication
├── batch.rs            # Batch processing
├── cache.rs            # Caching system
├── config.rs           # Configuration
├── core/               # ML inference engines
├── dedup.rs            # Request deduplication
├── error.rs            # Error handling
├── middleware/         # HTTP middleware
├── models/             # Model management
├── monitor.rs          # Monitoring & metrics
├── resilience/         # Resilience patterns
├── security/           # Security features
└── telemetry/          # Logging & tracing

Running Specific Tests

# Test caching system
cargo test cache::tests

# Test batch processing
cargo test batch::tests

# Test circuit breaker
cargo test circuit_breaker::tests

# Test monitoring
cargo test monitor::tests

# Run with verbose output
cargo test -- --nocapture --test-threads=1

Test Development

Code Style

All tests follow Rust best practices:

  • Tests are co-located with implementation using #[cfg(test)]
  • Async tests use #[tokio::test]
  • Tests are isolated and can run in parallel
  • No external dependencies for core tests

Adding New Tests

Add test modules at the bottom of implementation files:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_my_feature() {
        // Test code here
    }

    #[tokio::test]
    async fn test_async_feature() {
        // Async test code here
    }
}

Performance

The server includes several performance optimizations:

  • Request batching for improved throughput
  • Multi-level caching (in-memory + request deduplication)
  • Circuit breaker pattern for fault tolerance
  • Bulkhead pattern for resource isolation
  • Comprehensive monitoring and metrics
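
The bulkhead pattern in that list caps how many requests may hold a resource at once, so one overloaded component cannot starve the rest. A minimal permit-counting sketch (illustrative only; the actual src/resilience/bulkhead.rs may differ, e.g. by using an async semaphore):

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical bulkhead sketch: a fixed pool of permits isolating a resource.
#[derive(Clone)]
struct Bulkhead {
    available: Arc<Mutex<u32>>,
}

impl Bulkhead {
    fn new(capacity: u32) -> Self {
        Self { available: Arc::new(Mutex::new(capacity)) }
    }

    /// Try to take a permit; returns false when the bulkhead is at capacity.
    fn try_acquire(&self) -> bool {
        let mut n = self.available.lock().unwrap();
        if *n == 0 {
            false
        } else {
            *n -= 1;
            true
        }
    }

    /// Return a permit once the protected work is done.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
    }
}
```

A production version would typically hand out an RAII guard that releases the permit on drop, so a panicking task cannot leak capacity.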

License

Copyright © 2024 Genta Dev Team

Testing

Comprehensive testing has been completed for all endpoints and features.

Quick Test

./test_quick.sh

Full Test Suite

./test_final_report.sh

Test Results

See docs/TEST_RESULTS.md for detailed test results and coverage.

Latest Test Results:

  • ✅ 47/47 tests passed (100% success rate)
  • ✅ All 6 TTS engines operational
  • ✅ All 22 SOTA models available for download
  • ✅ Stress tested with 20+ concurrent requests
  • ✅ System monitoring and performance metrics verified

📚 Documentation

Complete documentation is available in the docs/ directory:

  • Getting Started
  • SOTA Models
  • Benchmarking
  • Testing & Fixes
