Add wire-level smoke test for Chronos2Backend.load() against real chronos #16

## Why

Issue #15 was a one-line wire-format bug: `Chronos2Backend.load()` passed `dtype=` to `BaseChronosPipeline.from_pretrained`, whose signature accepts `torch_dtype=`. The existing `Chronos2Backend` unit tests mock the pipeline and only assert on **mock call args**, so they kept passing while the real integration was broken.
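A minimal illustration of why the mocked tests stayed green. `strict_from_pretrained` below is a hypothetical stand-in with a strict keyword-only signature, not chronos's real `from_pretrained`, and `buggy_load` is a hypothetical reduction of the pre-#15 call shape. The sketch assumes the real function rejects the unknown keyword; the issue doesn't say exactly how the real call failed.

```python
from unittest import mock


def strict_from_pretrained(model_id, *, device_map=None, torch_dtype=None):
    """Hypothetical stand-in with a strict signature (NOT chronos's real one)."""
    return {"model_id": model_id, "device_map": device_map, "torch_dtype": torch_dtype}


def buggy_load(pipeline_factory):
    # Hypothetical reduction of the pre-#15 call: `dtype=` instead of `torch_dtype=`.
    return pipeline_factory("amazon/chronos-bolt-tiny", device_map="cpu", dtype="float32")


# Unit-test style: a MagicMock accepts any kwargs, so the call "succeeds"
# and asserting on the recorded call args passes.
fake = mock.MagicMock()
buggy_load(fake)
fake.assert_called_once_with("amazon/chronos-bolt-tiny", device_map="cpu", dtype="float32")

# Against a function with a real signature, the same call raises TypeError.
try:
    buggy_load(strict_from_pretrained)
    real_call_failed = False
except TypeError:
    real_call_failed = True
```

The mock only proves that `load()` said what the test expected it to say, not that the callee understands it.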

The bug only surfaced when running `examples/quickstart_batch.py` end-to-end against `amazon/chronos-bolt-tiny`, i.e. when a human happened to try the quickstart. This is a class of bug we should catch in CI, not during a demo.

## What

Add a smoke test gated on `SHEAF_SMOKE_TEST=1` (consistent with `tests/test_smoke_ray.py` and `tests/test_smoke_whisper.py`) that:

- Constructs `Chronos2Backend(model_id="amazon/chronos-bolt-tiny", device_map="cpu", torch_dtype="float32")`
- Calls `.load()` against the **real** chronos library (no mocks)
- Calls `.predict(TimeSeriesRequest(history=[1.0, 2.0, 3.0, 4.0], horizon=3, frequency="1h"))` and asserts the response is a valid `TimeSeriesResponse` with `len(mean) == 3`

Skip if `chronos` isn't importable, so the test is a no-op when the `[time-series]` extra isn't installed.
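A minimal sketch of what `tests/test_smoke_chronos.py` could look like. Assumptions: the import paths `sheaf.backends.chronos` and `sheaf.api.time_series` are hypothetical placeholders for wherever these classes actually live, and `TimeSeriesResponse` is assumed to expose the forecast as a `mean` attribute. It uses stdlib `unittest` skip decorators so the sketch has no extra dependencies; pytest collects `TestCase` classes as well, though matching the existing smoke tests may mean switching to `pytest.mark.skipif`.

```python
import importlib.util
import os
import unittest

# Gate 1: opt-in via env var, following the SHEAF_SMOKE_TEST=1 convention.
SMOKE_ENABLED = os.environ.get("SHEAF_SMOKE_TEST") == "1"
# Gate 2: the real chronos package must be importable ([time-series] extra).
CHRONOS_AVAILABLE = importlib.util.find_spec("chronos") is not None


@unittest.skipUnless(SMOKE_ENABLED, "set SHEAF_SMOKE_TEST=1 to run smoke tests")
@unittest.skipUnless(CHRONOS_AVAILABLE, "chronos not installed ([time-series] extra)")
class TestChronosSmoke(unittest.TestCase):
    def test_load_and_predict_against_real_chronos(self) -> None:
        # Hypothetical import paths: adjust to where sheaf defines these.
        from sheaf.backends.chronos import Chronos2Backend
        from sheaf.api.time_series import TimeSeriesRequest, TimeSeriesResponse

        backend = Chronos2Backend(
            model_id="amazon/chronos-bolt-tiny",
            device_map="cpu",
            torch_dtype="float32",
        )
        # Real from_pretrained call, no mocks: a wrong kwarg surfaces here.
        backend.load()

        response = backend.predict(
            TimeSeriesRequest(history=[1.0, 2.0, 3.0, 4.0], horizon=3, frequency="1h")
        )
        self.assertIsInstance(response, TimeSeriesResponse)
        self.assertEqual(len(response.mean), 3)
```

The sheaf imports sit inside the test body so that collecting the module never fails when the extra is missing; the class-level skip fires first.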

`amazon/chronos-bolt-tiny` is ~80MB: small enough for a CI smoke run if we ever wire one up. For now it only runs locally when a contributor sets the env var.

## Stretch

Same shape for the other backends whose `load()` makes a real `from_pretrained` call: ESM-3, Nucleotide Transformer, MolFormer, MACE, Prithvi, GraphCast, FLUX, SDXL, VideoMAE, ViTPose, RAFT, DINOv2, OpenCLIP, SAM2, Depth Anything, DETR, MusicGen, Bark, Kokoro, Whisper (already covered by `test_smoke_whisper.py`). All can follow the same pattern, gated on `SHEAF_SMOKE_TEST=1` plus the relevant import-based skip.
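Since every backend needs the same two gates, one option is a shared decorator. A sketch, assuming a hypothetical helper named `requires_smoke` (not an existing sheaf function) that takes the top-level module name each backend imports:

```python
import importlib.util
import os
import unittest


def requires_smoke(module_name: str):
    """Hypothetical helper: skip unless SHEAF_SMOKE_TEST=1 and module_name imports.

    module_name must be a top-level package name (e.g. "chronos"); the checks
    run when the decorator is applied, i.e. at test-module import time.
    """
    if os.environ.get("SHEAF_SMOKE_TEST") != "1":
        return unittest.skip("set SHEAF_SMOKE_TEST=1 to run smoke tests")
    if importlib.util.find_spec(module_name) is None:
        return unittest.skip(f"{module_name} not installed")
    # Both gates pass: leave the test class or function untouched.
    return lambda test_item: test_item
```

Each smoke file would then decorate its `TestCase` with e.g. `@requires_smoke("chronos")` or `@requires_smoke("open_clip")` (assuming the OpenCLIP backend imports `open_clip`), keeping the per-backend file down to the `load()` + `predict()` body.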

A follow-up question worth answering once these exist: should CI run the cheap ones (chronos-bolt-tiny, dinov2-small, whisper-tiny.en, MACE-MP-0-small) in a separate `smoke` job that's allowed to take 5-10 min? That would catch wire-level regressions in mainline-relevant backends without blocking the fast unit suite.

## Acceptance criteria

- [ ] `tests/test_smoke_chronos.py` exists, gated on `SHEAF_SMOKE_TEST=1` + chronos import availability
- [ ] Running `SHEAF_SMOKE_TEST=1 uv run --extra time-series pytest tests/test_smoke_chronos.py` exercises the real `load()` + `predict()`
- [ ] The test would have failed on the pre-#15 `dtype=` form (verifiable by reverting that one line)

## Related

- #15 (the bug this aims to prevent recurring)
