Problem
Sheaf currently assumes Ray Serve as the execution substrate. Teams running on vanilla Kubernetes, SageMaker, Cloud Run, or Lambda have no first-class path.
Deployment tiers
Tier 1 — Plain ASGI (no Ray)
_build_asgi_app in modal_server.py builds a plain FastAPI app with no Ray dependency. Full v0.5 feature set: typed contracts, caching, SSE streaming, Feast feature retrieval, logging, Prometheus metrics, OTel traces.
Best for: serverless targets (Lambda, Cloud Run, Fly.io) and CPU-only workloads where Ray's process overhead isn't justified.
# server.py
from sheaf.modal_server import _build_asgi_app
from sheaf.spec import ModelSpec
from sheaf.api.base import ModelType

app = _build_asgi_app([
    ModelSpec(
        name="chronos",
        model_type=ModelType.TIME_SERIES,
        backend="chronos2",
        backend_kwargs={"model_id": "amazon/chronos-bolt-small"},
    ),
])

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
Tier 2 — Single-node Ray Serve in a k8s pod
ray start --head + serve run in the container. Full Sheaf feature set including @serve.batch, batching policy, and hot-swap. k8s HPA handles pod scaling. No KubeRay needed.
Best for: most teams already on k8s. Right default for production Kubernetes deployments.
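A minimal sketch of the Tier 2 entrypoint, in Python rather than the two CLI commands, is below. It assumes the Tier 1 server.py module from above; the plain serve.ingress wrapper only demonstrates the single-node Ray Serve mechanics, and Sheaf's real Serve wiring (the piece that actually provides @serve.batch and hot-swap) would take its place.

# serve_entrypoint.py: hedged sketch of the Tier 2 container entrypoint.
# The generic ingress wrapper is a stand-in for Sheaf's own Serve wiring.
import ray
from ray import serve

from server import app as fastapi_app  # the Tier 1 ASGI app from above

@serve.deployment
@serve.ingress(fastapi_app)
class SheafIngress:
    pass

ray.init()  # single-node: this pod is the entire Ray "cluster"
serve.start(http_options={"host": "0.0.0.0", "port": 8000})  # bind beyond localhost
serve.run(SheafIngress.bind(), blocking=True)  # serve HTTP until the pod stops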
Tier 3 — KubeRay
Ray manages its own cluster across pods. Only necessary for multi-node distributed Ray execution; it's a non-trivial platform ask, and most teams don't need it.
SageMaker Inference (packaging only, not a feature)
The SageMaker container contract is:
- POST /invocations → /predict
- GET /ping → /health
- Port 8080
That's a deploy/sagemaker/Dockerfile + a serve entrypoint script. No Sheaf code changes. Pure packaging — the ASGI app already does everything SageMaker needs.
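For concreteness, a minimal sketch of that shim, assuming the Tier 1 server.py module above (the file name sagemaker_app.py is illustrative): a pure ASGI wrapper that rewrites the two SageMaker paths onto the existing routes.

# sagemaker_app.py: minimal sketch of the packaging shim. Pure ASGI path
# rewrite, no Sheaf changes; imports the Tier 1 app from server.py above.
from server import app as sheaf_app

class SageMakerAliases:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            if scope.get("path") == "/invocations":  # SageMaker inference calls
                scope = {**scope, "path": "/predict"}
            elif scope.get("path") == "/ping":       # SageMaker health checks
                scope = {**scope, "path": "/health"}
        await self.app(scope, receive, send)

app = SageMakerAliases(sheaf_app)
# The entrypoint script runs: uvicorn sagemaker_app:app --host 0.0.0.0 --port 8080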
SageMaker Batch Transform (v0.6, real integration)
Reads from S3, writes results to S3. Maps directly onto the planned v0.6 BatchRunner. This is where the actual integration work lives — not in the inference container.
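To make the intended shape concrete, here is a purely speculative sketch: BatchRunner does not exist yet, and the import path, constructor, and run() signature below are invented for illustration only.

# batch_entrypoint.py: speculative sketch only. BatchRunner is planned for
# v0.6; every name here (sheaf.batch, BatchRunner, run, input_uri,
# output_uri) is invented to illustrate the S3-in/S3-out mapping.
from sheaf.batch import BatchRunner  # hypothetical v0.6 module
from sheaf.spec import ModelSpec
from sheaf.api.base import ModelType

runner = BatchRunner([
    ModelSpec(
        name="chronos",
        model_type=ModelType.TIME_SERIES,
        backend="chronos2",
        backend_kwargs={"model_id": "amazon/chronos-bolt-small"},
    ),
])
runner.run(
    input_uri="s3://my-bucket/batch-inputs/",    # Batch Transform input channel
    output_uri="s3://my-bucket/batch-outputs/",  # predictions written back to S3
)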
Proposed additions
- Dockerfile — multi-stage, installs sheaf-serve[...] extras, runs uvicorn on 8000
- docker-compose.yml — local dev / smoke test
- deploy/kubernetes/ — Deployment, Service, and HPA manifests
- deploy/sagemaker/ — Dockerfile (port 8080, /invocations + /ping aliases) plus a serve entrypoint script
- examples/quickstart_kubernetes.py — the tier 2 path, explicitly documented
Longer term (v0.6+)
- SageMaker Batch Transform integration aligned with BatchRunner
- KubernetesServer class generating manifests from ModelSpec lists
- KEDA autoscaling on queue depth for SheafWorker