Add wire-level smoke test for Chronos2Backend.load() against real chronos #16

@korbonits


Why

Issue #15 was a one-line wire-format bug: `Chronos2Backend.load()` passed `dtype=` to `BaseChronosPipeline.from_pretrained` where the signature accepts `torch_dtype=`. The existing `Chronos2Backend` unit tests mock the pipeline and only assert on mock call args, so they kept passing while the real integration was broken.

The bug only surfaced when running `examples/quickstart_batch.py` end-to-end against `amazon/chronos-bolt-tiny`, i.e. when a human happened to try the quickstart. This is a class of bug we should catch in CI, not at the demo.
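To illustrate why the mocked tests stayed green, here is a minimal sketch. `from_pretrained` below is a hypothetical stand-in, not the real chronos function; it only assumes, as described above, that the loader accepts `torch_dtype=` and not `dtype=`. `buggy_load` is likewise a made-up helper that mirrors the bad call.

```python
from unittest import mock


def from_pretrained(model_id, *, device_map="cpu", torch_dtype="float32"):
    """Hypothetical stand-in for the real loader; only the keyword name matters."""
    return (model_id, device_map, torch_dtype)


def buggy_load(loader):
    """Mirror the bug from #15: pass dtype= where torch_dtype= is expected."""
    return loader("amazon/chronos-bolt-tiny", device_map="cpu", dtype="float32")


# A plain MagicMock accepts any keyword, so asserting on the recorded call
# args only confirms that the test and the code agree with each other.
loose = mock.MagicMock()
buggy_load(loose)
loose.assert_called_once_with(
    "amazon/chronos-bolt-tiny", device_map="cpu", dtype="float32"
)

# create_autospec copies the stand-in's signature, so the unknown keyword
# is rejected at call time with a TypeError.
strict = mock.create_autospec(from_pretrained)
try:
    buggy_load(strict)
    autospec_caught_bug = False
except TypeError:
    autospec_caught_bug = True
```

Autospeccing only helps when the real signature is explicit. If the real `from_pretrained` forwards `**kwargs`, a spec'd mock would accept `dtype=` too, which is why a test against the real library is still worth having.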

What

Add a smoke test gated on `SHEAF_SMOKE_TEST=1` (consistent with `tests/test_smoke_ray.py`, `tests/test_smoke_whisper.py`) that:

  • Constructs `Chronos2Backend(model_id="amazon/chronos-bolt-tiny", device_map="cpu", torch_dtype="float32")`
  • Calls `.load()` against the real chronos library — no mocks
  • Calls `.predict(TimeSeriesRequest(history=[1.0, 2.0, 3.0, 4.0], horizon=3, frequency="1h"))` and asserts the response is a valid `TimeSeriesResponse` with `len(mean) == 3`

Skip the test if `chronos` isn't importable, so it is a no-op when the `[time-series]` extra isn't installed.
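A rough sketch of what the test could look like, using only the standard library's `unittest` so it carries no extra dependencies (pytest collects `unittest.TestCase` classes as well). The `sheaf.*` import paths and the `response.mean` attribute are assumptions for illustration and should be adjusted to the real module layout and to whatever the existing smoke tests do. `smoke_enabled` and `chronos_available` are hypothetical helper names.

```python
import importlib.util
import os
import unittest


def smoke_enabled(env=None):
    """Return True only when SHEAF_SMOKE_TEST is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get("SHEAF_SMOKE_TEST") == "1"


def chronos_available():
    """Return True if the chronos package can be found on the import path."""
    return importlib.util.find_spec("chronos") is not None


@unittest.skipUnless(smoke_enabled(), "set SHEAF_SMOKE_TEST=1 to run smoke tests")
@unittest.skipUnless(chronos_available(), "chronos is not installed")
class Chronos2SmokeTest(unittest.TestCase):
    def test_load_and_predict(self):
        # ASSUMPTION: these import paths are hypothetical placeholders.
        # Imported lazily so the module still loads when the test is skipped.
        from sheaf.backends.chronos2 import Chronos2Backend
        from sheaf.types import TimeSeriesRequest, TimeSeriesResponse

        backend = Chronos2Backend(
            model_id="amazon/chronos-bolt-tiny",
            device_map="cpu",
            torch_dtype="float32",
        )
        backend.load()  # real chronos library, no mocks

        response = backend.predict(
            TimeSeriesRequest(
                history=[1.0, 2.0, 3.0, 4.0], horizon=3, frequency="1h"
            )
        )

        self.assertIsInstance(response, TimeSeriesResponse)
        # ASSUMPTION: the response exposes the forecast as a `mean` sequence.
        self.assertEqual(len(response.mean), 3)
```

Both skip conditions are evaluated at import time, so without the env var or without `chronos` installed the test is reported as skipped and never touches the network.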

`amazon/chronos-bolt-tiny` is ~80MB — small enough for a CI smoke run if we ever wire one up; for now it just runs locally when a contributor sets the env var.

Stretch

Same shape for the other backends whose `load()` makes a real `from_pretrained` call: ESM-3, Nucleotide Transformer, MolFormer, MACE, Prithvi, GraphCast, FLUX, SDXL, VideoMAE, ViTPose, RAFT, DINOv2, OpenCLIP, SAM2, Depth Anything, DETR, MusicGen, Bark, Kokoro. Whisper is already covered by `test_smoke_whisper.py`. All can follow the same pattern: gated on `SHEAF_SMOKE_TEST=1` plus the relevant import-skipif.

A follow-up question worth answering once these exist: should CI run the cheap ones (chronos-bolt-tiny, dinov2-small, whisper-tiny.en, MACE-MP-0-small) on a separate `smoke` job that's allowed to take 5-10 min? That would catch wire-level regressions in mainline-relevant backends without blocking the fast unit suite.

Acceptance criteria

Related

Labels: enhancement (New feature or request)