Ratatoskr

A unified LLM gateway abstraction layer for Rust.

Ratatoskr provides a stable, provider-agnostic interface for interacting with language models. Named after the squirrel in Norse mythology that carries messages up and down the world tree Yggdrasil, it routes your requests to the right provider while keeping your code decoupled from implementation details.

Features

Single trait interface: ModelGateway abstracts all providers
Multi-provider support: OpenRouter, Anthropic, OpenAI, Google, Ollama
Streaming & non-streaming: both chat interfaces supported
Tool calling: full function/tool support with JSON schema parameters
Extended thinking: reasoning config for models that support it
Text generation: simple prompt-in, text-out interface
Embeddings: local via fastembed-rs or remote via HuggingFace API
NLI: natural language inference for semantic analysis
Stance detection: classify text as favor/against/neutral toward a target
Token counting: HuggingFace tokenizers with model-appropriate defaults
Fallback chains: automatic local→remote fallback when resources are constrained
Service mode: ratd daemon + rat CLI over gRPC, so several processes can share one gateway
Type-safe: strong Rust types for messages, options, and responses

Quick Start

use ratatoskr::{Ratatoskr, ChatOptions, Message, ModelGateway};

#[tokio::main]
async fn main() -> ratatoskr::Result<()> {
    // Build a gateway with your providers
    let gateway = Ratatoskr::builder()
        .openrouter(std::env::var("OPENROUTER_API_KEY")?)
        .build()?;

    // Create a conversation
    let messages = vec![
        Message::system("You are a helpful assistant."),
        Message::user("What is the capital of France?"),
    ];

    // Configure the request
    let options = ChatOptions::default()
        .model("anthropic/claude-sonnet-4")
        .temperature(0.7)
        .max_tokens(1000);

    // Get a response
    let response = gateway.chat(&messages, None, &options).await?;
    println!("{}", response.content);

    Ok(())
}

Installation

Add to your Cargo.toml:

[dependencies]
ratatoskr = "0.1"

Feature Flags

Enable only the providers you need:

[dependencies]
# Keep exactly one `ratatoskr` entry; the commented lines are alternatives.
ratatoskr = { version = "0.1", default-features = false, features = ["openrouter", "anthropic"] }

# For HuggingFace API (embeddings, NLI, classification)
# ratatoskr = { version = "0.1", features = ["huggingface"] }

# For local inference (no API keys needed)
# ratatoskr = { version = "0.1", features = ["local-inference"] }

# GPU support for ONNX
# ratatoskr = { version = "0.1", features = ["local-inference", "cuda"] }

Available features:

Chat providers (enabled by default): openai, anthropic, openrouter, ollama, google
API inference: huggingface
Local inference: local-inference, cuda
Service mode: server, client

Provider Configuration

let gateway = Ratatoskr::builder()
    .openrouter("sk-or-...")           // OpenRouter API key
    .anthropic("sk-ant-...")           // Direct Anthropic API
    .openai("sk-...")                  // Direct OpenAI API
    .google("...")                     // Google/Gemini API
    .ollama("http://localhost:11434")  // Local Ollama instance
    .huggingface("hf_...")             // HuggingFace Inference API
    .timeout(120)                      // Request timeout in seconds
    .build()?;

At least one provider must be configured.

Local Inference Configuration

use ratatoskr::{Device, LocalEmbeddingModel, LocalNliModel};

let gateway = Ratatoskr::builder()
    .openrouter("sk-or-...")                           // Still need chat provider
    .local_embeddings(LocalEmbeddingModel::AllMiniLmL6V2)
    .local_nli(LocalNliModel::NliDebertaV3Small)
    .device(Device::Cpu)                               // or Device::Cuda { device_id: 0 }
    .cache_dir("/custom/model/cache")                  // Optional
    .ram_budget(1024 * 1024 * 1024)                    // Optional: 1GB limit for local models
    .build()?;

When a RAM budget is set, local providers fall back to API providers automatically once the budget would be exceeded.
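As a rough illustration of the budget check, the sketch below tracks reserved bytes and refuses a load that would exceed the limit, which is the point where a caller would hand the request to a remote provider. `RamBudget` and its methods are hypothetical names for this example, not part of the ratatoskr API, and the real accounting may differ.

```rust
/// Hypothetical budget tracker (illustration only, not a ratatoskr type).
#[derive(Debug)]
pub struct RamBudget {
    limit_bytes: u64,
    used_bytes: u64,
}

impl RamBudget {
    pub fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, used_bytes: 0 }
    }

    /// Reserve `bytes` if they fit; otherwise reserve nothing and return false.
    pub fn try_reserve(&mut self, bytes: u64) -> bool {
        match self.used_bytes.checked_add(bytes) {
            Some(total) if total <= self.limit_bytes => {
                self.used_bytes = total;
                true
            }
            _ => false,
        }
    }

    /// Give memory back when a model is unloaded.
    pub fn release(&mut self, bytes: u64) {
        self.used_bytes = self.used_bytes.saturating_sub(bytes);
    }

    /// Bytes still available under the limit.
    pub fn remaining(&self) -> u64 {
        self.limit_bytes - self.used_bytes
    }
}
```

With a 1 GiB limit (`RamBudget::new(1024 * 1024 * 1024)`), a first 600 MiB reservation succeeds and a second one is refused; in this sketch the refusal is the signal to fall back.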

Model Routing

Models are automatically routed based on their name:

| Pattern | Provider |
| --- | --- |
| `anthropic/claude-*` | OpenRouter |
| `openai/gpt-*` | OpenRouter |
| `claude-*` | Direct Anthropic |
| `gpt-*`, `o1-*`, `o3-*` | Direct OpenAI |
| `gemini-*` | Google |
| `model:tag` | Ollama (local) |
| Other | OpenRouter (default) |
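The routing rules can be read as a chain of prefix checks. The sketch below is one possible reading of the table, not the crate's actual router: the `Provider` enum and `route` function are hypothetical, and the order of the checks (vendor-prefixed names first, `model:tag` last) is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    OpenRouter,
    Anthropic,
    OpenAi,
    Google,
    Ollama,
}

/// Pick a provider from the model name, following the patterns in the table.
/// The precedence between overlapping patterns is assumed, not documented.
pub fn route(model: &str) -> Provider {
    if model.contains('/') {
        // Vendor-prefixed names such as `anthropic/claude-*` go to OpenRouter.
        Provider::OpenRouter
    } else if model.starts_with("claude-") {
        Provider::Anthropic
    } else if ["gpt-", "o1-", "o3-"].iter().any(|p| model.starts_with(*p)) {
        Provider::OpenAi
    } else if model.starts_with("gemini-") {
        Provider::Google
    } else if model.contains(':') {
        // `model:tag` is the Ollama naming scheme.
        Provider::Ollama
    } else {
        // Anything unrecognised falls through to the default.
        Provider::OpenRouter
    }
}
```

Under these assumptions `route("claude-sonnet-4")` picks the direct Anthropic provider, while `route("anthropic/claude-sonnet-4")` goes through OpenRouter.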

Streaming

use futures_util::StreamExt;
use ratatoskr::ChatEvent;

let mut stream = gateway.chat_stream(&messages, None, &options).await?;

while let Some(event) = stream.next().await {
    match event? {
        ChatEvent::Content(text) => print!("{}", text),
        ChatEvent::Reasoning(thought) => eprintln!("[thinking] {}", thought),
        ChatEvent::ToolCallStart { name, .. } => println!("Calling tool: {}", name),
        ChatEvent::Usage(usage) => println!("Tokens: {}", usage.total_tokens),
        ChatEvent::Done => break,
        _ => {}
    }
}
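If you need the complete text once the stream ends, the events can be folded into a buffer. The sketch below uses a stand-in `Event` enum with only three of the variants shown above and assumes their payloads are `String`; `Collected` and `collect_events` are hypothetical helpers, and a plain iterator stands in for the async stream.

```rust
/// Stand-in for the streaming event type (payload types are assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Content(String),
    Reasoning(String),
    Done,
}

/// Text gathered from a finished stream.
#[derive(Debug, Default, PartialEq)]
pub struct Collected {
    pub content: String,
    pub reasoning: String,
}

/// Append content and reasoning chunks in arrival order, stopping at `Done`.
pub fn collect_events<I: IntoIterator<Item = Event>>(events: I) -> Collected {
    let mut out = Collected::default();
    for event in events {
        match event {
            Event::Content(text) => out.content.push_str(&text),
            Event::Reasoning(text) => out.reasoning.push_str(&text),
            Event::Done => break,
        }
    }
    out
}
```

The same loop body works inside the `while let` above; only the source of events changes.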

Tool Calling

use ratatoskr::{ToolDefinition, ToolChoice};
use serde_json::json;

// Define a tool
let weather_tool = ToolDefinition::new(
    "get_weather",
    "Get the current weather for a location",
    json!({
        "type": "object",
        "properties": {
            "location": { "type": "string", "description": "City name" }
        },
        "required": ["location"]
    }),
);

let options = ChatOptions::default()
    .model("anthropic/claude-sonnet-4")
    .tool_choice(ToolChoice::Auto);

let response = gateway.chat(&messages, Some(&[weather_tool]), &options).await?;

// Handle tool calls
for tool_call in &response.tool_calls {
    println!("Tool: {} with args: {}", tool_call.name, tool_call.arguments);

    // Parse arguments into your own type; WeatherArgs is a struct you
    // define yourself, it is not provided by ratatoskr
    let args: WeatherArgs = tool_call.parse_arguments()?;
}
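Once the model returns tool calls, something has to map each tool name to code. One simple way to do that is a name-to-handler table, sketched here with only the standard library. `Call`, `ToolRegistry`, and the handler signature are hypothetical, and the arguments are assumed to arrive as a raw string.

```rust
use std::collections::HashMap;

/// Stand-in for a tool call: a tool name plus its raw argument string.
pub struct Call {
    pub name: String,
    pub arguments: String,
}

type Handler = Box<dyn Fn(&str) -> Result<String, String>>;

/// Maps tool names to handlers that receive the raw argument string.
#[derive(Default)]
pub struct ToolRegistry {
    handlers: HashMap<String, Handler>,
}

impl ToolRegistry {
    /// Register (or replace) the handler for `name`.
    pub fn register<F>(&mut self, name: &str, handler: F)
    where
        F: Fn(&str) -> Result<String, String> + 'static,
    {
        self.handlers.insert(name.to_string(), Box::new(handler));
    }

    /// Run the handler for a call, or report an unknown tool.
    pub fn dispatch(&self, call: &Call) -> Result<String, String> {
        match self.handlers.get(&call.name) {
            Some(handler) => handler(&call.arguments),
            None => Err(format!("unknown tool: {}", call.name)),
        }
    }
}
```

Returning an error for unknown names, rather than panicking, lets the caller report the problem back to the model as a tool result.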

Extended Thinking

For models that support reasoning (Claude with extended thinking, o1, etc.):

use ratatoskr::{ReasoningConfig, ReasoningEffort};

let options = ChatOptions::default()
    .model("anthropic/claude-sonnet-4")
    .reasoning(ReasoningConfig {
        effort: Some(ReasoningEffort::High),
        max_tokens: Some(10000),
        exclude_from_output: Some(false),  // Include thinking in response
    });

let response = gateway.chat(&messages, None, &options).await?;
if let Some(reasoning) = &response.reasoning {
    println!("Thinking: {}", reasoning);
}

Error Handling

use ratatoskr::RatatoskrError;

match gateway.chat(&messages, None, &options).await {
    Ok(response) => println!("{}", response.content),
    Err(RatatoskrError::RateLimited { retry_after }) => {
        println!("Rate limited, retry after {:?}", retry_after);
    }
    Err(RatatoskrError::AuthenticationFailed) => {
        eprintln!("Invalid API key");
    }
    Err(RatatoskrError::ModelNotFound(model)) => {
        eprintln!("Unknown model: {}", model);
    }
    Err(e) => eprintln!("Error: {}", e),
}
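A common follow-up is deciding which errors are worth retrying. The sketch below mirrors the three variants matched above in a stand-in `GatewayError` enum; the `Option<Duration>` payload of `retry_after`, the `retry_delay` helper, and the backoff schedule are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Stand-in mirroring the variants matched above (payload types assumed).
#[derive(Debug)]
pub enum GatewayError {
    RateLimited { retry_after: Option<Duration> },
    AuthenticationFailed,
    ModelNotFound(String),
}

/// How long to wait before retry number `attempt` (0-based), or `None`
/// when the error should not be retried or the attempts are used up.
pub fn retry_delay(err: &GatewayError, attempt: u32, max_attempts: u32) -> Option<Duration> {
    if attempt >= max_attempts {
        return None;
    }
    match err {
        // Prefer the server's hint; otherwise back off 1s, 2s, 4s, ... capped at 64s.
        GatewayError::RateLimited { retry_after } => {
            let fallback = Duration::from_secs(1u64 << attempt.min(6));
            Some((*retry_after).unwrap_or(fallback))
        }
        // A bad key or an unknown model will not fix itself.
        GatewayError::AuthenticationFailed | GatewayError::ModelNotFound(_) => None,
    }
}
```

A caller would sleep for the returned duration and call `chat` again, stopping as soon as `retry_delay` returns `None`.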

Architecture

Your Application
       │
       ▼
ModelGateway trait      ← stable public API
       │
       ├─► EmbeddedGateway  ← in-process, delegates to ProviderRegistry
       │
       └─► ServiceClient    ← connects to ratd over gRPC
              │
              ▼
           ratd (daemon)
              │
              ▼
           EmbeddedGateway  ← delegates to ProviderRegistry
       │
       ▼
ProviderRegistry    ← fallback chains per capability
       │
       ├─► Embedding Providers
       │     ├── LocalEmbeddingProvider (fastembed) [priority 0]
       │     └── HuggingFaceClient [priority 1, fallback]
       │
       ├─► NLI Providers
       │     ├── LocalNliProvider (ONNX) [priority 0]
       │     └── HuggingFaceClient [priority 1, fallback]
       │
       ├─► Stance Providers
       │     └── ZeroShotStanceProvider (wraps ClassifyProvider)
       │
       └─► Chat/Generate Providers
             ├── OpenRouter
             ├── Anthropic
             ├── OpenAI
             ├── Google
             └── Ollama

The ModelGateway trait is the stability boundary. Your code depends only on this trait, insulating you from provider changes.

Fallback behaviour: When a local provider returns ModelNotAvailable (wrong model or RAM budget exceeded), the registry automatically tries the next provider in the chain.
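The fallback rule can be sketched as a loop over providers in priority order. Everything below is a simplified stand-in, not the real ProviderRegistry: the `EmbedProvider` trait, `Chain`, the toy `Fixed` provider, and the choice to return other errors immediately are assumptions.

```rust
/// Stand-in error type: only `ModelNotAvailable` triggers a fallback.
#[derive(Debug, PartialEq)]
pub enum ProviderError {
    ModelNotAvailable,
    Other(String),
}

/// Stand-in for an embedding provider.
pub trait EmbedProvider {
    fn name(&self) -> &str;
    fn embed(&self, text: &str, model: &str) -> Result<Vec<f32>, ProviderError>;
}

/// Providers in priority order (index 0 is tried first).
pub struct Chain {
    providers: Vec<Box<dyn EmbedProvider>>,
}

impl Chain {
    pub fn new(providers: Vec<Box<dyn EmbedProvider>>) -> Self {
        Self { providers }
    }

    /// Try each provider in turn. Move on only when a provider reports that
    /// it cannot serve the model; any other error is returned as-is.
    pub fn embed(&self, text: &str, model: &str) -> Result<(String, Vec<f32>), ProviderError> {
        for provider in &self.providers {
            match provider.embed(text, model) {
                Ok(vector) => return Ok((provider.name().to_string(), vector)),
                Err(ProviderError::ModelNotAvailable) => continue,
                Err(other) => return Err(other),
            }
        }
        Err(ProviderError::ModelNotAvailable)
    }
}

/// Toy provider that serves exactly one model name.
pub struct Fixed {
    pub name: &'static str,
    pub model: &'static str,
}

impl EmbedProvider for Fixed {
    fn name(&self) -> &str {
        self.name
    }

    fn embed(&self, _text: &str, model: &str) -> Result<Vec<f32>, ProviderError> {
        if model == self.model {
            Ok(vec![0.0; 3])
        } else {
            Err(ProviderError::ModelNotAvailable)
        }
    }
}
```

With a "local" provider at index 0 and a "remote" one at index 1, a request for a model that only the remote provider knows is served by the remote one, and a model nobody knows surfaces `ModelNotAvailable` to the caller.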

Service Mode

Service mode lets multiple processes share a single gateway instance over gRPC. The daemon (ratd) wraps an EmbeddedGateway behind gRPC handlers; clients connect via ServiceClient, which implements ModelGateway, so calling code stays the same whether the gateway is embedded or remote.

use ratatoskr::{ServiceClient, ModelGateway, ChatOptions, Message};

let client = ServiceClient::connect("http://127.0.0.1:9741").await?;
let response = client
    .chat(&[Message::user("hello!")], None,
          &ChatOptions::default().model("anthropic/claude-sonnet-4"))
    .await?;

rat health              # check connectivity
rat models              # list available models
rat chat "hello!" -m anthropic/claude-sonnet-4

See docs/service-mode.md for configuration, secrets, systemd setup, and full CLI reference.

Text Generation

For simple prompt-to-text generation (without the chat/message structure):

use ratatoskr::GenerateOptions;

let response = gateway.generate(
    "Once upon a time",
    &GenerateOptions::new("llama3.2:1b")
        .max_tokens(100)
        .temperature(0.7),
).await?;

println!("{}", response.text);

Streaming generation:

use ratatoskr::GenerateEvent;
use futures_util::StreamExt;

let mut stream = gateway.generate_stream(
    "Once upon a time",
    &GenerateOptions::new("anthropic/claude-3-haiku").max_tokens(100),
).await?;

while let Some(event) = stream.next().await {
    match event? {
        GenerateEvent::Text(text) => print!("{}", text),
        GenerateEvent::Done => break,
    }
}

Local Inference

With the local-inference feature, run embeddings and NLI locally without API keys:

use ratatoskr::providers::{FastEmbedProvider, LocalEmbeddingModel};

// Local embeddings
let mut provider = FastEmbedProvider::new(LocalEmbeddingModel::AllMiniLmL6V2)?;
let embedding = provider.embed("Hello, world!")?;
println!("Dimensions: {}", embedding.dimensions);  // 384

// Batch embeddings
let embeddings = provider.embed_batch(&["First", "Second", "Third"])?;
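Embeddings are usually compared with cosine similarity. The helper below is a generic sketch over plain `f32` slices; how you get the raw vector out of the embedding value returned above is not shown here, and `cosine_similarity` is a hypothetical helper, not a ratatoskr function.

```rust
/// Cosine similarity of two equal-length vectors. Returns `None` if the
/// lengths differ or either vector has zero magnitude.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Values near 1 mean the two texts point in the same direction in embedding space; values near 0 mean they are unrelated.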

use ratatoskr::providers::{OnnxNliProvider, LocalNliModel};
use ratatoskr::Device;

// Local NLI
let mut provider = OnnxNliProvider::new(LocalNliModel::NliDebertaV3Small, Device::Cpu)?;
let result = provider.infer_nli("A cat is sleeping", "An animal is resting")?;
println!("{:?}: {:.2}", result.label, result.entailment);  // Entailment: 0.95
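Scores like the entailment value above typically come from a softmax over the model's three output logits. The sketch below shows that step in isolation. `softmax3`, `top_label`, and the stand-in `NliLabel` enum are hypothetical, and the index order (entailment, neutral, contradiction) is an assumption, since each checkpoint defines its own label order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NliLabel {
    Entailment,
    Neutral,
    Contradiction,
}

/// Numerically stable softmax over three logits.
pub fn softmax3(logits: [f32; 3]) -> [f32; 3] {
    // Subtract the maximum first so `exp` cannot overflow.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps = logits.map(|l| (l - max).exp());
    let sum: f32 = exps.iter().sum();
    exps.map(|e| e / sum)
}

/// Pick the most probable label (assumed order: entailment, neutral,
/// contradiction). Ties go to the earlier index.
pub fn top_label(probs: [f32; 3]) -> NliLabel {
    let labels = [NliLabel::Entailment, NliLabel::Neutral, NliLabel::Contradiction];
    let mut best = 0;
    for i in 1..3 {
        if probs[i] > probs[best] {
            best = i;
        }
    }
    labels[best]
}
```

The probabilities sum to 1, so an entailment score of 0.95 leaves 0.05 to be split between neutral and contradiction.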

Stance Detection

Classify text as expressing favor, against, or neutral toward a target topic:

use ratatoskr::ModelGateway;

let stance = gateway.classify_stance(
    "I strongly support renewable energy initiatives.",
    "renewable energy",
    "facebook/bart-large-mnli",
).await?;

println!("{:?}: favor={:.2}, against={:.2}",
    stance.label, stance.favor, stance.against);
// Favor: favor=0.85, against=0.05
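Zero-shot stance detection can be thought of as scoring three candidate labels and taking the largest. The sketch below shows that last step only; the label wording in `candidate_labels`, the normalisation in `to_stance`, and the stand-in types are assumptions, not the behaviour of ZeroShotStanceProvider.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StanceLabel {
    Favor,
    Against,
    Neutral,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StanceScores {
    pub favor: f32,
    pub against: f32,
    pub neutral: f32,
}

/// Candidate labels a zero-shot classifier could be asked to score.
/// The wording is hypothetical.
pub fn candidate_labels(target: &str) -> [String; 3] {
    [
        format!("in favor of {}", target),
        format!("against {}", target),
        format!("neutral toward {}", target),
    ]
}

/// Normalise raw scores (favor, against, neutral) to sum to 1 and pick the
/// largest. Returns `None` for negative scores or a non-positive total.
pub fn to_stance(raw: [f32; 3]) -> Option<(StanceLabel, StanceScores)> {
    let total: f32 = raw.iter().sum();
    if !(total > 0.0) || raw.iter().any(|s| *s < 0.0) {
        return None;
    }
    let scores = StanceScores {
        favor: raw[0] / total,
        against: raw[1] / total,
        neutral: raw[2] / total,
    };
    let label = if scores.favor >= scores.against && scores.favor >= scores.neutral {
        StanceLabel::Favor
    } else if scores.against >= scores.neutral {
        StanceLabel::Against
    } else {
        StanceLabel::Neutral
    };
    Some((label, scores))
}
```

For the example sentence above, raw scores such as `[0.85, 0.05, 0.10]` would yield `StanceLabel::Favor`.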

Token Counting

use ratatoskr::tokenizer::TokenizerRegistry;

let registry = TokenizerRegistry::new();
let count = registry.count_tokens("Hello, world!", "claude-sonnet-4")?;
println!("Tokens: {}", count);  // ~3

// Detailed tokenization with offsets
let tokens = gateway.tokenize("Hello, world!", "claude-sonnet-4")?;
for token in tokens {
    println!("{}: bytes {}..{}", token.text, token.start, token.end);
}

Supported local embedding models (LocalEmbeddingModel): AllMiniLmL6V2, AllMiniLmL12V2, BgeSmallEn, BgeBaseEn

Supported local NLI models (LocalNliModel): NliDebertaV3Base, NliDebertaV3Small, or custom ONNX models

Roadmap

Phase 1: Chat completions via OpenRouter and direct providers ✓
Phase 2: HuggingFace provider (embeddings, NLI, classification) ✓
Phase 3-4: Local inference (embeddings, NLI, tokenizers, generate) ✓
Provider Trait Refactor: Fallback chains, RAM budget, stance detection ✓
Phase 5: Service mode (gRPC daemon + CLI client) ✓
Phase 6: Caching, metrics, decorator patterns

Development

just pre-push    # Format, lint, and test (required before pushing)
just lint        # cargo fmt + clippy
just test        # Run all tests

License

ISC
