Problem
Sheaf currently assumes Ray Serve as the execution substrate. Teams running on vanilla Kubernetes, SageMaker, Cloud Run, or Lambda have no first-class path.
Deployment tiers
Tier 1 — Plain ASGI (no Ray)
_build_asgi_app in modal_server.py builds a plain FastAPI app with no Ray dependency. Full v0.5 feature set: typed contracts, caching, SSE streaming, Feast feature retrieval, logging, Prometheus metrics, OTel traces.
Best for: serverless targets (Lambda, Cloud Run, Fly.io) and CPU-only workloads where Ray's process overhead isn't justified.
# server.py
from sheaf.modal_server import _build_asgi_app
from sheaf.spec import ModelSpec
from sheaf.api.base import ModelType

app = _build_asgi_app([
    ModelSpec(
        name="chronos",
        model_type=ModelType.TIME_SERIES,
        backend="chronos2",
        backend_kwargs={"model_id": "amazon/chronos-bolt-small"},
    ),
])

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
Tier 2 — Single-node Ray Serve in a k8s pod
ray start --head + serve run in the container. Full Sheaf feature set including @serve.batch, batching policy, and hot-swap. k8s HPA handles pod scaling. No KubeRay needed.
Best for: most teams already on k8s. Right default for production Kubernetes deployments.
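A minimal sketch of the Tier 2 entrypoint, in Python rather than the two CLI commands, is below. It assumes the Tier 1 server.py module from above; the plain serve.ingress wrapper only demonstrates the single-node Ray Serve mechanics, and Sheaf's real Serve wiring (the piece that actually provides @serve.batch and hot-swap) would take its place.

# serve_entrypoint.py: hedged sketch of the Tier 2 container entrypoint.
# The generic ingress wrapper is a stand-in for Sheaf's own Serve wiring.
import ray
from ray import serve

from server import app as fastapi_app  # the Tier 1 ASGI app from above

@serve.deployment
@serve.ingress(fastapi_app)
class SheafIngress:
    pass

ray.init()  # single-node: this pod is the entire Ray "cluster"
serve.start(http_options={"host": "0.0.0.0", "port": 8000})  # bind beyond localhost
serve.run(SheafIngress.bind(), blocking=True)  # serve HTTP until the pod stops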
Tier 3 — KubeRay
Ray manages its own cluster across pods. Only necessary for multi-node distributed Ray execution; it's a non-trivial platform ask, and most teams don't need it.
SageMaker Inference (packaging only, not a feature)
The SageMaker container contract is:
- POST /invocations → /predict
- GET /ping → /health
- Port 8080
That's a deploy/sagemaker/Dockerfile + a serve entrypoint script. No Sheaf code changes. Pure packaging — the ASGI app already does everything SageMaker needs.
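For concreteness, a minimal sketch of that shim, assuming the Tier 1 server.py module above (the file name sagemaker_app.py is illustrative): a pure ASGI wrapper that rewrites the two SageMaker paths onto the existing routes.

# sagemaker_app.py: minimal sketch of the packaging shim. Pure ASGI path
# rewrite, no Sheaf changes; imports the Tier 1 app from server.py above.
from server import app as sheaf_app

class SageMakerAliases:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            if scope.get("path") == "/invocations":  # SageMaker inference calls
                scope = {**scope, "path": "/predict"}
            elif scope.get("path") == "/ping":       # SageMaker health checks
                scope = {**scope, "path": "/health"}
        await self.app(scope, receive, send)

app = SageMakerAliases(sheaf_app)
# The entrypoint script runs: uvicorn sagemaker_app:app --host 0.0.0.0 --port 8080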
SageMaker Batch Transform (v0.6, real integration)
Reads from S3, writes results to S3. Maps directly onto the planned v0.6 BatchRunner. This is where the actual integration work lives — not in the inference container.
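To make the intended shape concrete, here is a purely speculative sketch: BatchRunner does not exist yet, and the import path, constructor, and run() signature below are invented for illustration only.

# batch_entrypoint.py: speculative sketch only. BatchRunner is planned for
# v0.6; every name here (sheaf.batch, BatchRunner, run, input_uri,
# output_uri) is invented to illustrate the S3-in/S3-out mapping.
from sheaf.batch import BatchRunner  # hypothetical v0.6 module
from sheaf.spec import ModelSpec
from sheaf.api.base import ModelType

runner = BatchRunner([
    ModelSpec(
        name="chronos",
        model_type=ModelType.TIME_SERIES,
        backend="chronos2",
        backend_kwargs={"model_id": "amazon/chronos-bolt-small"},
    ),
])
runner.run(
    input_uri="s3://my-bucket/batch-inputs/",    # Batch Transform input channel
    output_uri="s3://my-bucket/batch-outputs/",  # predictions written back to S3
)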
Proposed additions
- Dockerfile — multi-stage, installs sheaf-serve[...] extras, runs uvicorn on 8000
- docker-compose.yml — local dev / smoke test
- deploy/kubernetes/ — Deployment, Service, and HPA manifests
- deploy/sagemaker/ — Dockerfile (port 8080, /invocations + /ping aliases) plus a serve entrypoint script
- examples/quickstart_kubernetes.py — the tier 2 path, explicitly documented
Longer term (v0.6+)
- SageMaker Batch Transform integration aligned with BatchRunner
- KubernetesServer class generating manifests from ModelSpec lists
- KEDA autoscaling on queue depth for SheafWorker