Tracking the v0.8 Client SDK track called out in the README roadmap and the public blog post. Separate from #14 (Adoption Track — Docker/Helm/docs) and from LoRA adapter multiplexing; these three are independent v0.8 paths.
Why
Today, calling Sheaf from an application means hand-rolling a typed HTTP client per model type. The request/response Pydantic classes ship with sheaf-serve, but pulling the entire serving package into a client just to reuse those types is heavy and wrong.
The blog post promises:
pip install sheaf-client — typed Python client generated from request/response schemas
Async client (httpx-backed); retry + timeout; streams SSE natively
Language-agnostic: publish OpenAPI spec so teams can generate clients in any language
Scope (v1)
1. Package structure
New PyPI package sheaf-client. Lives in this repo at packages/sheaf-client/ (monorepo via uv workspaces) so types stay in sync with sheaf-serve releases — same git tag drives both publishes.
Depends on pydantic>=2.0 and httpx>=0.27. Does not depend on sheaf-serve, ray, torch, or any backend package — that's the whole point.
Pydantic request/response classes are duplicated/re-exported from a shared sheaf-api-types sub-package (also in the workspace), so both sheaf-serve and sheaf-client import from the same schema source rather than the client copy-pasting types.
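A minimal sketch of how the workspace wiring could look. The package names (sheaf-client, sheaf-api-types, sheaf-serve) and dependency pins come from this issue; every other field is illustrative, not the actual manifests:

```toml
# Root pyproject.toml (illustrative) — uv workspace tying the packages together
[tool.uv.workspace]
members = ["packages/sheaf-client", "packages/sheaf-api-types", "packages/sheaf-serve"]

# packages/sheaf-client/pyproject.toml (illustrative)
[project]
name = "sheaf-client"
requires-python = ">=3.11"
dependencies = [
    "pydantic>=2.0",
    "httpx>=0.27",
    "sheaf-api-types",  # shared schema source; no ray/torch/sheaf-serve
]

[tool.uv.sources]
sheaf-api-types = { workspace = true }  # resolved from the workspace in dev, from PyPI when published
```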
2. Sync + async client
from sheaf_client import Client, AsyncClient
Client(base_url, timeout=30, retries=2) with .predict(deployment, req) -> Response, .health(deployment), .ready(deployment), .metrics(deployment) -> str
AsyncClient with the same surface, httpx.AsyncClient-backed
Request/response validation via the same TypeAdapter(AnyRequest) pattern as JobQueueClient
3. Streaming (SSE)
client.stream(deployment, req) -> Iterator[dict] (sync) and async_client.astream(deployment, req) -> AsyncIterator[dict] (async)
Parses data: {json}\n\n lines, yields each event dict, terminates on done: true
Cleanly closes the connection on early break
4. Async-job worker path
JobsClient (or extend Client) for the async worker pattern: enqueue(deployment, req, webhook_url=None) -> job_id, wait_for_result(job_id, timeout) -> JobResult
Note: this requires worker-side HTTP submit/result endpoints that don't exist yet — a SheafWorker only consumes from Redis directly today. Either extend SheafWorker to expose a small ASGI submit/result API, or document the SDK's job path as Redis-only for v1 (in which case the SDK depends on redis optionally). Decide before implementing — both are defensible.
5. OpenAPI spec
FastAPI already generates /openapi.json per deployment via @serve.ingress. Publish a stable, pinned snapshot at docs/openapi.json and ensure it includes every AnyRequest variant.
Add an openapi CLI command on sheaf-serve: sheaf openapi --models chronos2,whisper > openapi.json so users can regenerate it for their deployment.
Reference the spec from the docs site (mkdocs, per v0.8 — Adoption Track #14) so non-Python clients can use it for code generation.
6. Tests
Unit: mock the httpx transport, assert the wire payload + retry logic
Integration: spin up a real Ray Serve ModelServer against a smoke backend, assert the client round-trips
SSE: integration test against /{name}/stream with the FLUX progress event format (smoke equivalent)
OpenAPI: snapshot test that the generated spec matches a checked-in tests/fixtures/openapi.json, so contract drift is caught on PR
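The snapshot check itself can be a few lines. This is a sketch of the shape of that test, assuming the generated spec is fetched as a dict from the running app's /openapi.json; the helper name and error message are illustrative:

```python
import json
from pathlib import Path


def assert_openapi_unchanged(generated: dict, fixture_path: Path) -> None:
    """Fail when the generated OpenAPI spec drifts from the checked-in
    fixture, so contract changes surface on the PR that caused them."""
    expected = json.loads(fixture_path.read_text())
    if generated != expected:
        raise AssertionError(
            "OpenAPI spec drifted from fixture; regenerate "
            "tests/fixtures/openapi.json if the change is intentional"
        )
```

Comparing parsed dicts (rather than raw file bytes) keeps the test insensitive to key order and whitespace, so only real contract drift fails CI.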
7. Stretch
TypeScript client generated from the OpenAPI spec via openapi-typescript-codegen, published to npm as @korbonits/sheaf-client
Examples: examples/quickstart_client.py showing predict + stream + enqueue against a running ModelServer
Acceptance criteria
pip install sheaf-client works on a clean Python 3.11+ env, pulls in only pydantic + httpx
Quickstart in examples/quickstart_client.py: spin up ModelServer with Chronos2, then in another shell run python examples/quickstart_client.py and get a forecast back
OpenAPI spec snapshot test in CI
Both sheaf-serve and sheaf-client ship under the same git tag in the same release
Out of scope
Authentication / API keys (separate issue when first user demands it)
Multi-tenant routing (no use case yet)
gRPC client (HTTP/SSE is enough for the v1 audience)
Cross-references
v0.8 — Adoption Track #14 — v0.8 Adoption Track (docs site references this SDK; benchmarks compare it to BentoML's client; Helm chart values include client connection examples)
LoRA adapter multiplexing (separate v0.8 track) — once shipped, the SDK gains an adapter_id= parameter on predict(); design the client to accept arbitrary kwargs that pass through to the request body so this evolves cleanly
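That pass-through design is small enough to pin down now. A sketch, with build_request_body as a hypothetical internal helper: unknown kwargs are merged into the JSON body untouched, so a future adapter_id= (or any later field) needs no client release:

```python
from typing import Any


def build_request_body(req: dict[str, Any], **extra: Any) -> dict[str, Any]:
    """Merge arbitrary kwargs into the request payload unchanged.

    Hypothetical helper illustrating the forward-compatible client design:
    predict(deployment, req, adapter_id="...") would route adapter_id
    through here and straight into the wire payload.
    """
    body = dict(req)  # copy so the caller's dict is never mutated
    body.update(extra)
    return body
```

The server-side Pydantic models then decide which extra fields are meaningful; the client never needs to know.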