
Kubernetes-native deployment path #11

@korbonits

Description

Problem

Sheaf currently assumes Ray Serve as the execution substrate. Teams running on vanilla Kubernetes, SageMaker, Cloud Run, or Lambda have no first-class path.

Deployment tiers

Tier 1 — Plain ASGI (no Ray)
_build_asgi_app in modal_server.py builds a plain FastAPI app with no Ray dependency. Full v0.5 feature set: typed contracts, caching, streaming SSE, Feast, logging, Prometheus metrics, OTel traces.

Best for: serverless targets (Lambda, Cloud Run, Fly.io) and CPU-only workloads where Ray's process overhead isn't justified.

# server.py
from sheaf.modal_server import _build_asgi_app
from sheaf.spec import ModelSpec
from sheaf.api.base import ModelType

app = _build_asgi_app([
    ModelSpec(name="chronos", model_type=ModelType.TIME_SERIES, backend="chronos2",
              backend_kwargs={"model_id": "amazon/chronos-bolt-small"}),
])
# uvicorn server:app --host 0.0.0.0 --port 8000

Tier 2 — Single-node Ray Serve in a k8s pod
Run `ray start --head` plus `serve run` inside the container. Full Sheaf feature set, including `@serve.batch`, the batching policy, and hot-swap. The Kubernetes HPA handles pod scaling; no KubeRay needed.

Best for: most teams already on k8s. Right default for production Kubernetes deployments.
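A Tier 2 container entrypoint could look roughly like the sketch below. This is a config fragment, not Sheaf's actual entrypoint: the `server:app` import path and the flag choices are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: single-node Ray Serve inside one pod (Tier 2).
set -euo pipefail

# Start a single-node Ray head in this container; no KubeRay, no cluster.
ray start --head --disable-usage-stats

# Hand off to Ray Serve. "server:app" is a placeholder import path.
exec serve run server:app
```

The pod is the unit of scaling here, so the HPA manifest (CPU- or latency-based) does the work that a Ray autoscaler would do in Tier 3.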

Tier 3 — KubeRay
Ray manages the cluster itself across pods. Only necessary for multi-node distributed Ray execution. Non-trivial platform ask — most teams don't need it.

SageMaker Inference (packaging only, not a feature)

The SageMaker container contract is:

  • POST /invocations, which maps to Sheaf's /predict
  • GET /ping, which maps to Sheaf's /health
  • the container listens on port 8080

That's a deploy/sagemaker/Dockerfile + a serve entrypoint script. No Sheaf code changes. Pure packaging — the ASGI app already does everything SageMaker needs.

SageMaker Batch Transform (v0.6, real integration)

Reads from S3, writes results to S3. Maps directly onto the planned v0.6 BatchRunner. This is where the actual integration work lives — not in the inference container.

Proposed additions

  • Dockerfile — multi-stage, installs sheaf-serve[...] extras, runs uvicorn on 8000
  • docker-compose.yml — local dev / smoke test
  • deploy/kubernetes/Deployment, Service, HPA manifests
  • deploy/sagemaker/Dockerfile (port 8080, /invocations + /ping aliases), serve entrypoint script
  • examples/quickstart_kubernetes.py — tier 2 path explicitly documented
  • README clarifying all three tiers + SageMaker
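For the Tier 2 manifests, the Deployment could start from a sketch like this. Image name, replica count, and the /health probe path are placeholders, not project decisions:

```yaml
# deploy/kubernetes/deployment.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sheaf-serve
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sheaf-serve
  template:
    metadata:
      labels:
        app: sheaf-serve
    spec:
      containers:
        - name: sheaf
          image: ghcr.io/example/sheaf-serve:latest  # placeholder
          ports:
            - containerPort: 8000
          readinessProbe:
            httpGet:
              path: /health   # assumed native health route
              port: 8000
```

A Service selecting `app: sheaf-serve` plus an HPA targeting this Deployment completes the tier 2 path.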

Longer term (v0.6+)

  • SageMaker Batch Transform integration aligned with BatchRunner
  • KubernetesServer class generating manifests from ModelSpec lists
  • KEDA autoscaling on queue depth for SheafWorker
