API Reference

Shimmy exposes three interfaces for local LLM inference: an HTTP REST API, a WebSocket API, and a command-line interface.

HTTP REST API

Generate Text

Endpoint: POST /api/generate

Request Body:

{
  "model": "string",           // Model name (required)
  "prompt": "string",          // Input prompt (required)
  "max_tokens": 100,          // Maximum tokens to generate (optional, default: 100)
  "temperature": 0.7,         // Sampling temperature (optional, default: 0.7)
  "stream": false             // Enable streaming response (optional, default: false)
}

Non-Streaming Response:

{
  "choices": [
    {
      "text": "Generated text response",
      "index": 0,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}
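
As a rough illustration of the request and response shapes above, the following Python sketch builds the request body with the documented defaults and pulls the text out of a non-streaming response. The base URL (localhost on port 11435, the port shown elsewhere in this document) and the helper names `build_generate_payload`, `extract_text`, and `generate` are assumptions made for this example, not part of Shimmy itself.

```python
import json
import urllib.request

# Assumption: the HTTP API is served on the same host and port as the
# examples elsewhere in this document.
BASE_URL = "http://localhost:11435"


def build_generate_payload(model, prompt, max_tokens=100, temperature=0.7, stream=False):
    """Build the JSON body for POST /api/generate using the documented defaults."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }


def extract_text(response):
    """Return the generated text of the first choice in a non-streaming response."""
    return response["choices"][0]["text"]


def generate(model, prompt, **options):
    """Send a non-streaming request and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_text(json.load(reply))
```

With a server running, `generate("default", "Hello", max_tokens=50)` would return the generated string. Only `generate` touches the network; the two helpers can be used and tested on their own.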

Streaming Response: Server-Sent Events, one data chunk per generated piece of text, ending with a [DONE] marker:

data: {"choices":[{"text":"Hello","index":0}]}

data: {"choices":[{"text":" world","index":0}]}

data: [DONE]
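
A client can reassemble the streamed text by reading the event lines one at a time. The sketch below assumes only the chunk format shown above; `iter_stream_text` is a hypothetical helper name, and the lines could come from any HTTP client that exposes the response body line by line.

```python
import json


def iter_stream_text(lines):
    """Yield the text of each chunk from an iterable of SSE lines.

    Stops at the `data: [DONE]` marker. Blank lines and lines that do not
    start with `data:` are skipped.
    """
    for raw in lines:
        # Accept both bytes (as read from a socket) and str.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        yield chunk["choices"][0]["text"]


sample = [
    'data: {"choices":[{"text":"Hello","index":0}]}',
    "",
    'data: {"choices":[{"text":" world","index":0}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello world"
```

Because the function is a generator, a caller can print each piece as it arrives instead of waiting for the full response.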

List Models

Endpoint: GET /api/models

Response:

{
  "models": [
    {
      "id": "default",
      "name": "Default Model",
      "description": "Base GGUF model"
    }
  ]
}
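
One possible use of this response is to check that a model exists before sending a generate request. The sketch below assumes that the `id` field is the value to pass as `model` in requests, which the examples in this document suggest but do not state outright; `find_model` is a hypothetical helper.

```python
def find_model(response, model_id):
    """Return the model entry whose id matches, or None if it is not listed."""
    for model in response["models"]:
        if model["id"] == model_id:
            return model
    return None


# The example response from above, as a Python dict.
listing = {
    "models": [
        {"id": "default", "name": "Default Model", "description": "Base GGUF model"}
    ]
}
print([m["id"] for m in listing["models"]])  # prints ['default']
```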

Health Check

Endpoint: GET /api/health

Response:

{
  "status": "healthy",
  "models_loaded": 1,
  "memory_usage": "2.1GB"
}
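
The health endpoint lends itself to a startup check, for example in a script that launches the server and waits before sending requests. The following is a minimal sketch: `wait_until_healthy` and its `fetch_health` callback are hypothetical names, and the callback is expected to return the parsed JSON body of GET /api/health.

```python
import time


def wait_until_healthy(fetch_health, attempts=5, delay=1.0):
    """Poll until the health response reports status "healthy".

    fetch_health is any callable returning the parsed health JSON. Connection
    failures (OSError and its subclasses, which include urllib's URLError)
    count as "not ready yet". Returns True once healthy, False otherwise.
    """
    for attempt in range(attempts):
        try:
            health = fetch_health()
        except OSError:
            health = None
        if health is not None and health.get("status") == "healthy":
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the fetch step in as a callable keeps the polling logic independent of the HTTP client and makes it easy to exercise without a running server.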

WebSocket API

Endpoint: ws://localhost:11435/ws/generate

Connect and Send

{
  "model": "default",
  "prompt": "Hello world",
  "max_tokens": 50,
  "temperature": 0.7
}

Receive Tokens

{"token": "Hello"}
{"token": " world"}
{"done": true}
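
The message handling on the client side can be sketched independently of any particular WebSocket library. The helpers below, `build_ws_request` and `collect_tokens`, are hypothetical names; they assume each message is a JSON text frame in one of the two shapes shown above, and they leave the connection itself to whichever WebSocket client is in use.

```python
import json


def build_ws_request(prompt, model="default", max_tokens=50, temperature=0.7):
    """Serialize the request to send once the connection is open."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })


def collect_tokens(messages):
    """Join token messages into one string, stopping at {"done": true}."""
    parts = []
    for message in messages:
        event = json.loads(message)
        if event.get("done"):
            break
        if "token" in event:
            parts.append(event["token"])
    return "".join(parts)


received = ['{"token": "Hello"}', '{"token": " world"}', '{"done": true}']
print(collect_tokens(received))  # prints "Hello world"
```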

CLI Interface

Commands

# Start server
shimmy serve --bind 127.0.0.1:11435 --port 11435

# Generate text
shimmy generate --prompt "Hello" --max-tokens 50 --temperature 0.7

# List available models
shimmy list

# Probe model loading
shimmy probe [model-name]

# Show diagnostics
shimmy diag

Global Options

--verbose, -v: Enable verbose logging
--help, -h: Show help information
--version, -V: Show version information

Error Responses

All endpoints report errors in the same JSON format:

{
  "error": {
    "code": "model_not_found",
    "message": "The specified model was not found",
    "details": "Model 'invalid-model' is not available"
  }
}

Common error codes:

model_not_found: Requested model is not available
invalid_request: Request format is invalid
generation_failed: Text generation failed
server_error: Internal server error
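
Since every endpoint shares this error shape, a client can funnel all responses through one check. The sketch below is one way to do that; the `ShimmyAPIError` class, the `raise_for_error` helper, and the `"unknown"` fallback code are assumptions introduced for this example.

```python
class ShimmyAPIError(Exception):
    """Raised when a response body carries the documented "error" object."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details


def raise_for_error(body):
    """Return body unchanged, or raise ShimmyAPIError if it contains an error."""
    error = body.get("error") if isinstance(body, dict) else None
    if error is None:
        return body
    raise ShimmyAPIError(
        error.get("code", "unknown"),  # fallback code is an assumption
        error.get("message", ""),
        error.get("details"),
    )
```

A caller can then branch on `exc.code`, for example treating `model_not_found` as a configuration problem and `generation_failed` as something to report to the user.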

Rate Limiting

Shimmy currently implements no rate limiting. For production use, consider placing it behind a reverse proxy that enforces rate limits.
