MATA-SERVER is the production inference runtime server for the MATA ecosystem. It wraps MATA's unified model adapter behind a REST + WebSocket API, providing on-demand model loading, memory-aware eviction, and real-time streaming inference — all without coupling your application to a specific ML runtime.
- Unified inference API — single endpoint for detection, segmentation, classification, pose estimation, OCR, depth, and VLM tasks
- REST + WebSocket — `POST /v1/infer` for single-shot requests; `WS /v1/stream/{session_id}` for real-time frame streaming
- On-demand model loading — models are loaded on first request and evicted under memory pressure (LRU policy)
- Memory-aware eviction — configurable VRAM and RAM utilization ceilings with keep-alive protection for active models
- Multi-source model pulls — fetch models from HuggingFace Hub, arbitrary URLs, or local directories via `POST /v1/models/pull`
- `mata.v1` response schema — consistent, versioned JSON output across all task types
- GPU + CPU support — CUDA GPU inference via NVIDIA runtime; automatic CPU fallback
- API key authentication — bearer token auth with configurable key list (or disabled for development)
- OpenAPI docs — auto-generated Swagger UI at `/docs` and ReDoc at `/redoc`
- Docker-ready — multi-stage CUDA image; GPU passthrough with NVIDIA Container Toolkit
Requirements: Python 3.10+, pip
```bash
# 1. Clone the repository
git clone https://github.com/datamata-io/mata-server.git
cd mata-server

# 2. Install (CPU-only; add [onnx] or [torch] for GPU backends)
pip install -e .

# 3. Configure
cp .env.example .env
# Edit .env — set MATA_SERVER_API_KEYS, or set MATA_SERVER_AUTH_MODE=none for local dev

# 4. Start the server
mataserver serve
```

The server starts on http://0.0.0.0:8110. Visit http://localhost:8110/docs for the interactive API explorer.
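Once the server is up, you can also verify it programmatically. A minimal stdlib-only sketch (assumes the default port 8110; `server_healthy` is an illustrative helper, not part of any shipped client):

```python
import json
import urllib.request

def server_healthy(base: str = "http://localhost:8110") -> bool:
    # /v1/health requires no API key, so a plain GET is enough.
    try:
        with urllib.request.urlopen(f"{base}/v1/health", timeout=5) as resp:
            return json.load(resp).get("status") == "ok"
    except OSError:
        # Connection refused, DNS failure, or timeout -> treat as unhealthy.
        return False

if __name__ == "__main__":
    print("healthy" if server_healthy() else "unreachable")
```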
```bash
# Pull the pre-built image
docker pull ghcr.io/datamata-io/mataserver:latest

# Run (CPU-only)
docker run -p 8110:8110 \
  -e MATA_SERVER_AUTH_MODE=none \
  -v mataserver-data:/var/lib/mataserver \
  ghcr.io/datamata-io/mataserver:latest

# Run with GPU (requires NVIDIA Container Toolkit)
docker run --gpus all -p 8110:8110 \
  -e MATA_SERVER_AUTH_MODE=none \
  -v mataserver-data:/var/lib/mataserver \
  ghcr.io/datamata-io/mataserver:latest
```

Or with Docker Compose:

```bash
cp .env.example .env
# Edit .env with your settings
docker compose up -d
```

Verify the server is running:

```bash
curl http://localhost:8110/v1/health
# {"status":"ok","version":"0.1.0","gpu_available":false}
```

The `mataserver` console script provides commands for server management and model operations.
| Command | Description |
|---|---|
| `mataserver serve` | Start the inference server |
| `mataserver pull <m> --task T` | Download/install and register a model (HuggingFace or pip backend) |
| `mataserver list` | List all registered models (alias: `ls`) |
| `mataserver show <m>` | Show detailed info for a model |
| `mataserver rm <m>` | Remove a model from the registry |
| `mataserver load <m>` | Preload a model into memory (alias: `warmup`) |
| `mataserver stop <m>` | Unload a model from memory |
| `mataserver version` | Print version (also: `mataserver -v`) |
For full usage details, argument references, and examples, see docs/api.md.
All settings use the `MATA_SERVER_` environment-variable prefix and can also be set in a `.env` file or in a YAML config specified by `MATA_SERVER_CONFIG_FILE`.
| Variable | Default | Description |
|---|---|---|
| `MATA_SERVER_HOST` | `0.0.0.0` | Bind address |
| `MATA_SERVER_PORT` | `8110` | Bind port |
| `MATA_SERVER_LOG_LEVEL` | `info` | Logging level (`debug`, `info`, `warning`, `error`) |
| `MATA_SERVER_AUTH_MODE` | `api_key` | Auth mode: `api_key` (enforce bearer tokens) or `none` (open, dev only) |
| `MATA_SERVER_API_KEYS` | (empty) | Comma-separated list of valid API keys (required when `auth_mode=api_key`) |
| `MATA_SERVER_KEEP_ALIVE` | `600` | Seconds a loaded model stays in memory after its last request |
| `MATA_SERVER_MAX_VRAM_UTIL` | `0.85` | Fraction of GPU VRAM that triggers model eviction (0.0–1.0) |
| `MATA_SERVER_MAX_RAM_UTIL` | `0.80` | Fraction of system RAM that triggers model eviction (0.0–1.0) |
| `MATA_SERVER_EVICTION_POLICY` | `lru` | Model eviction policy — currently only `lru` (least-recently-used) |
| `MATA_SERVER_DATA_DIR` | `/var/lib/mataserver` | Root directory for models, cache, and blob storage |
| `MATA_SERVER_CONFIG_FILE` | (unset) | Optional path to a YAML config file (env vars always take priority) |
Priority (highest → lowest): environment variables → YAML config file → .env file → built-in defaults.
See .env.example for a fully annotated template with production-recommended values.
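For the YAML route, a hypothetical config sketch is shown below. The key names here are an assumption (environment variables with the `MATA_SERVER_` prefix stripped and lowercased) — check `.env.example` or `docs/api.md` for the exact spelling before relying on them:

```yaml
# config.yaml — pointed to by MATA_SERVER_CONFIG_FILE=/etc/mataserver/config.yaml
host: 0.0.0.0
port: 8110
log_level: info
auth_mode: api_key
api_keys: "key-one,key-two"
keep_alive: 600
max_vram_util: 0.85
max_ram_util: 0.80
eviction_policy: lru
data_dir: /var/lib/mataserver
```

Remember that environment variables override anything set in this file.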
Interactive docs are served at http://localhost:8110/docs (Swagger UI) and http://localhost:8110/redoc.
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/v1/health` | No | Server health check — always returns 200 OK |
| `GET` | `/v1/models` | Yes | List all registered models |
| `GET` | `/v1/models/{model_id}` | Yes | Get details and load state for a single model |
| `POST` | `/v1/models/pull` | Yes | Pull a model from HuggingFace, URL, or local path |
| `POST` | `/v1/models/warmup` | Yes | Pre-load a model into memory |
| `POST` | `/v1/infer` | Yes | Single-shot inference (JSON body, base64 image) |
| `POST` | `/v1/infer/upload` | Yes | Single-shot inference (multipart file upload) |
| `POST` | `/v1/sessions` | Yes | Create a WebSocket streaming session |
| `DELETE` | `/v1/sessions/{session_id}` | Yes | Close and clean up a streaming session |
| WebSocket | `/v1/stream/{session_id}` | `?token` | Real-time binary frame → JSON result streaming |
REST endpoints authenticate via `Authorization: Bearer <key>`. The WebSocket endpoint uses `?token=<key>` as a query parameter.
For full request/response schemas, per-endpoint error codes, and additional curl examples, see docs/api.md.
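The bearer-auth scheme above translates directly into client code. A minimal stdlib sketch — `auth_headers` and `list_models` are invented helper names, and the key/URL are placeholders:

```python
import json
import urllib.request

API_KEY = "your-api-key"  # placeholder — use a key from MATA_SERVER_API_KEYS

def auth_headers(api_key: str) -> dict[str, str]:
    # REST endpoints expect: Authorization: Bearer <key>
    return {"Authorization": f"Bearer {api_key}"}

def list_models(base: str = "http://localhost:8110") -> dict:
    # Example call: GET /v1/models with bearer auth.
    req = urllib.request.Request(f"{base}/v1/models", headers=auth_headers(API_KEY))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For the WebSocket endpoint, the same key goes in the URL instead: `ws://host:8110/v1/stream/{session_id}?token=<key>`.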
```bash
curl http://localhost:8110/v1/health
```

```json
{ "status": "ok", "version": "0.1.0", "gpu_available": false }
```

```bash
# HuggingFace model
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l", "task": "detect"}'
```

```json
{ "status": "pulled", "model": "datamata/rtdetr-l" }
```

```bash
# Pip-based OCR backend
curl -X POST http://localhost:8110/v1/models/pull \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "easyocr", "task": "ocr"}'
```

Or via the CLI:
```bash
# HuggingFace detection model (example: RT-DETR, ResNet-18 backbone)
mataserver pull PekingU/rtdetr_r18vd --task detect

# HuggingFace classification model (example: ResNet-50)
mataserver pull microsoft/resnet-50 --task classify

# HuggingFace segmentation model (example: Mask2Former Swin-Tiny trained on COCO)
mataserver pull facebook/mask2former-swin-tiny-coco-instance --task segment

# HuggingFace depth model (example: Depth Anything V2 Small)
mataserver pull depth-anything/Depth-Anything-V2-Small-hf --task depth

# HuggingFace visual language model (VLM)
mataserver pull Qwen/Qwen3-VL-2B-Instruct --task vlm

# HuggingFace OCR model
mataserver pull stepfun-ai/GOT-OCR-2.0-hf --task ocr

# Pip-installed OCR backends
mataserver pull easyocr --task ocr
mataserver pull paddleocr --task ocr
mataserver pull tesseract --task ocr   # requires the tesseract system binary
```

Single-shot inference with a base64-encoded image:

```bash
IMAGE_B64=$(base64 -w0 /path/to/image.jpg)

curl -X POST http://localhost:8110/v1/infer \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"datamata/rtdetr-l\", \"image\": \"${IMAGE_B64}\", \"confidence\": 0.5}"
```

```json
{
  "schema_version": "mata.v1",
  "task": "detect",
  "model": "datamata/rtdetr-l",
  "timestamp": "2026-03-04T10:00:00Z",
  "detections": [
    { "label": "person", "confidence": 0.92, "bbox": [120, 45, 300, 480] }
  ]
}
```

Or as a multipart file upload:

```bash
curl -X POST http://localhost:8110/v1/infer/upload \
  -H "Authorization: Bearer your-api-key" \
  -F "model=datamata/rtdetr-l" \
  -F "confidence=0.5" \
  -F "file=@/path/to/image.jpg"
```

For real-time streaming:

```bash
# 1. Create the session
SESSION=$(curl -s -X POST http://localhost:8110/v1/sessions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "datamata/rtdetr-l", "task": "detect"}' \
  | python3 -c "import sys, json; print(json.load(sys.stdin)['session_id'])")
echo "Session ID: $SESSION"
```

```python
# 2. Connect via WebSocket and stream frames
import asyncio, struct, time, websockets

SESSION_ID = "sess_xxxxxxxxxxxx"  # replace with session_id from above
API_KEY = "your-api-key"

async def stream():
    uri = f"ws://localhost:8110/v1/stream/{SESSION_ID}?token={API_KEY}"
    async with websockets.connect(uri) as ws:
        with open("/path/to/image.jpg", "rb") as f:
            image = f.read()
        # 13-byte header: frame_id (uint32 BE) + timestamp (float64 BE) + encoding (uint8, 0=JPEG)
        header = struct.pack(">IdB", 1, time.time(), 0)
        await ws.send(header + image)
        result = await ws.recv()
        print(result)

asyncio.run(stream())
```

See docs/streaming.md for the full binary frame protocol specification and a complete async client example.
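The 13-byte frame header used by the streaming client above can be packed and unpacked symmetrically with `struct`. A small sketch — the helper names are illustrative, not part of the server API:

```python
import struct

# Frame header, as described above: frame_id (uint32, big-endian) +
# timestamp (float64, big-endian) + encoding (uint8, 0 = JPEG).
HEADER_FORMAT = ">IdB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 8 + 1 = 13 bytes

def pack_header(frame_id: int, timestamp: float, encoding: int = 0) -> bytes:
    return struct.pack(HEADER_FORMAT, frame_id, timestamp, encoding)

def unpack_header(frame: bytes) -> tuple[int, float, int]:
    # Works on a whole frame: the header is the first 13 bytes, payload follows.
    return struct.unpack(HEADER_FORMAT, frame[:HEADER_SIZE])
```

Using big-endian (`>`) on both sides keeps the wire format machine-independent.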
The examples/ directory contains ready-to-run Python clients:
| Script | Description |
|---|---|
| `examples/rest_infer.py` | REST inference — detect, classify, segment |
| `examples/rest_vlm.py` | REST inference — visual language model (VLM) |
| `examples/ws_video_infer.py` | WebSocket video streaming — frame-by-frame results |
See examples/README.md for full usage, argument reference, and sample output for each script.
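The `mata.v1` detect response shown earlier can be consumed in a few lines. A sketch — `boxes_above` is an invented helper, and the field names follow the example response in this README:

```python
def boxes_above(response: dict, threshold: float = 0.5) -> list[tuple[str, list[int]]]:
    # Keep (label, bbox) pairs for detections at or above the confidence threshold.
    return [
        (d["label"], d["bbox"])
        for d in response.get("detections", [])
        if d["confidence"] >= threshold
    ]

# Example mata.v1 detect payload (abridged from the response above).
example = {
    "schema_version": "mata.v1",
    "task": "detect",
    "detections": [
        {"label": "person", "confidence": 0.92, "bbox": [120, 45, 300, 480]},
        {"label": "dog", "confidence": 0.31, "bbox": [10, 10, 50, 50]},
    ],
}
```

Because the schema is versioned, a client can check `response["schema_version"] == "mata.v1"` before parsing.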
```bash
# 1. Clone
git clone https://github.com/datamata-io/mata-server.git
cd mata-server

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate   # Linux / macOS
venv\Scripts\activate      # Windows

# 3. Install with all extras and dev tools
pip install -e ".[all,dev]"

# 4. Run the full test suite
pytest

# 5. Run with coverage report
pytest --cov=mataserver --cov-report=term-missing

# 6. Lint and format
ruff check mataserver/ tests/
ruff format mataserver/ tests/

# 7. Start the server in development mode (auth disabled, auto-reload)
MATA_SERVER_AUTH_MODE=none uvicorn mataserver.main:create_app --factory --reload
```

```
mataserver/
├── api/        # FastAPI routers and middleware
│   └── v1/     # Versioned endpoints: health, models, infer, sessions, stream
├── core/       # Lifecycle state machine, model cache, memory manager, runtime manager
├── engines/    # Engine base class (MATA handles runtime internals)
├── models/     # Pack manifest parsing, registry, pull system
├── schemas/    # Pydantic request / response models
├── streaming/  # WebSocket binary protocol, session dataclass, session manager
└── utils/      # GPU utilities, structured logging helpers
```
Contributions are welcome! Please follow these steps:

- Fork the repository and create a feature branch:

  ```bash
  git checkout -b feature/my-feature
  ```

- Implement your change, following the coding standards:
  - Python 3.10+ syntax; type annotations on all public functions
  - Docstrings on all public classes and functions
  - Line length ≤ 100 characters (`ruff`-enforced)

- Test your change — all tests must pass and coverage must stay above 85%:

  ```bash
  pytest --cov=mataserver
  ruff check mataserver/ tests/
  ruff format --check mataserver/ tests/
  ```

- Commit using conventional commit messages:

  ```
  feat(api): add streaming frame drop policy
  fix(memory): prevent eviction of active models
  docs(readme): update configuration table
  ```

- Open a Pull Request against `main`. Describe what the PR changes and reference any related issue.
CI runs lint and tests on Python 3.10, 3.11, and 3.12. PRs that fail CI will not be merged.
Bug reports and feature requests are welcome via GitHub Issues.
MIT — see LICENSE
