Tracking the v0.8 Client SDK track called out in the README roadmap and the public blog post. Separate from #14 (Adoption Track — Docker/Helm/docs) and from LoRA adapter multiplexing; these three are independent v0.8 paths.
Why
Today, calling Sheaf from an application means hand-rolling a typed HTTP client per model type. The request/response Pydantic classes ship with sheaf-serve, but pulling the entire serving package into a client just to reuse those types is heavy and wrong.
The blog post promises:
pip install sheaf-client — typed Python client generated from request/response schemas
Async client (httpx-backed); retry + timeout; streams SSE natively
Language-agnostic: publish OpenAPI spec so teams can generate clients in any language
Scope (v1)
1. Package structure
New PyPI package sheaf-client. Lives in this repo at packages/sheaf-client/ (monorepo via uv workspaces) so types stay in sync with sheaf-serve releases — same git tag drives both publishes.
Depends on pydantic>=2.0 and httpx>=0.27. Does not depend on sheaf-serve, ray, torch, or any backend package — that's the whole point.
Pydantic request/response classes are duplicated/re-exported from a shared sheaf-api-types sub-package (also in the workspace), so both sheaf-serve and sheaf-client import from the same schema source rather than the client copy-pasting types.
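A minimal sketch of how the workspace wiring could look. The package names (sheaf-client, sheaf-api-types, sheaf-serve) and dependency pins come from this issue; every other field is illustrative, not the actual manifests:

```toml
# Root pyproject.toml (illustrative) — uv workspace tying the packages together
[tool.uv.workspace]
members = ["packages/sheaf-client", "packages/sheaf-api-types", "packages/sheaf-serve"]

# packages/sheaf-client/pyproject.toml (illustrative)
[project]
name = "sheaf-client"
requires-python = ">=3.11"
dependencies = [
    "pydantic>=2.0",
    "httpx>=0.27",
    "sheaf-api-types",  # shared schema source; no ray/torch/sheaf-serve
]

[tool.uv.sources]
sheaf-api-types = { workspace = true }  # resolved from the workspace in dev, from PyPI when published
```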
2. Sync + async client
from sheaf_client import Client, AsyncClient
Client(base_url, timeout=30, retries=2) with .predict(deployment, req) -> Response, .health(deployment), .ready(deployment), .metrics(deployment) -> str
AsyncClient with the same surface, httpx.AsyncClient-backed
Request/response validation via the same TypeAdapter(AnyRequest) pattern as JobQueueClient
3. Streaming (SSE)
client.stream(deployment, req) -> Iterator[dict] (sync) and async_client.astream(deployment, req) -> AsyncIterator[dict] (async)
Parses data: {json}\n\n lines, yields each event dict, terminates on done: true
Cleanly closes the connection on early break
4. Async-job worker path
JobsClient (or extend Client) for the async worker pattern: enqueue(deployment, req, webhook_url=None) -> job_id, wait_for_result(job_id, timeout) -> JobResult
Note: this requires worker-side HTTP submit/result endpoints that don't exist yet — a SheafWorker only consumes from Redis directly today. Either extend SheafWorker to expose a small ASGI submit/result API, or document the SDK's job path as Redis-only for v1 (in which case the SDK depends on redis optionally). Decide before implementing — both are defensible.
5. OpenAPI spec
FastAPI already generates /openapi.json per deployment via @serve.ingress. Publish a stable, pinned snapshot at docs/openapi.json and ensure it includes every AnyRequest variant.
Add an openapi CLI command on sheaf-serve: sheaf openapi --models chronos2,whisper > openapi.json so users can regenerate it for their deployment.
Reference the spec from the docs site (mkdocs, per v0.8 — Adoption Track #14) so non-Python clients can use it for code generation.
6. Tests
Unit: mock the httpx transport, assert the wire payload + retry logic
Integration: spin up a real Ray Serve ModelServer against a smoke backend, assert the client round-trips
SSE: integration test against /{name}/stream with the FLUX progress event format (smoke equivalent)
OpenAPI: snapshot test that the generated spec matches a checked-in tests/fixtures/openapi.json, so contract drift is caught on PR
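The snapshot check itself can be a few lines. This is a sketch of the shape of that test, assuming the generated spec is fetched as a dict from the running app's /openapi.json; the helper name and error message are illustrative:

```python
import json
from pathlib import Path


def assert_openapi_unchanged(generated: dict, fixture_path: Path) -> None:
    """Fail when the generated OpenAPI spec drifts from the checked-in
    fixture, so contract changes surface on the PR that caused them."""
    expected = json.loads(fixture_path.read_text())
    if generated != expected:
        raise AssertionError(
            "OpenAPI spec drifted from fixture; regenerate "
            "tests/fixtures/openapi.json if the change is intentional"
        )
```

Comparing parsed dicts (rather than raw file bytes) keeps the test insensitive to key order and whitespace, so only real contract drift fails CI.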
7. Stretch
TypeScript client generated from the OpenAPI spec via openapi-typescript-codegen, published to npm as @korbonits/sheaf-client
Examples: examples/quickstart_client.py showing predict + stream + enqueue against a running ModelServer
Acceptance criteria
pip install sheaf-client works on a clean Python 3.11+ env, pulls in only pydantic + httpx
Quickstart in examples/quickstart_client.py: spin up ModelServer with Chronos2, then in another shell run python examples/quickstart_client.py and get a forecast back
OpenAPI spec snapshot test in CI
Both sheaf-serve and sheaf-client ship under the same git tag in the same release
Out of scope
Authentication / API keys (separate issue when first user demands it)
Multi-tenant routing (no use case yet)
gRPC client (HTTP/SSE is enough for the v1 audience)
Cross-references
v0.8 — Adoption Track #14 — v0.8 Adoption Track (docs site references this SDK; benchmarks compare it to BentoML's client; Helm chart values include client connection examples)
LoRA adapter multiplexing (separate v0.8 track) — once shipped, the SDK gains an adapter_id= parameter on predict(); design the client to accept arbitrary kwargs that pass through to the request body so this evolves cleanly
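That pass-through design is small enough to pin down now. A sketch, with build_request_body as a hypothetical internal helper: unknown kwargs are merged into the JSON body untouched, so a future adapter_id= (or any later field) needs no client release:

```python
from typing import Any


def build_request_body(req: dict[str, Any], **extra: Any) -> dict[str, Any]:
    """Merge arbitrary kwargs into the request payload unchanged.

    Hypothetical helper illustrating the forward-compatible client design:
    predict(deployment, req, adapter_id="...") would route adapter_id
    through here and straight into the wire payload.
    """
    body = dict(req)  # copy so the caller's dict is never mutated
    body.update(extra)
    return body
```

The server-side Pydantic models then decide which extra fields are meaningful; the client never needs to know.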