ci(vhs): shard tape suite per-feature (#187) #188
Merged
Splits the monolithic 20m `Run tapes` job into 6 parallel shards (cli, tui-launch, tui-papers, tui-reader, theme, connectivity), each with a 6m budget. A flake on one shard no longer wipes the others (`fail-fast: false`); total wall-clock drops from ~20m to <8m even with one shard exhausting its retry budget.

New `Tape shard coverage` job asserts every `.tape` on disk is claimed by exactly one shard, so a freshly added tape can't be silently left out of every run.

New `Tapes` rollup job aggregates the matrix result so branch protection only needs to require a single check name instead of all 6 shard jobs.

Refs #187.

[tape-exempt: CI-only workflow change; no TUI/CLI source touched]
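The "claimed by exactly one shard" rule can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual step: the function name, the shard map, and the tape file names below are hypothetical, and a real check would list the `.tape` files from disk instead of taking them as an argument.

```python
from collections import Counter


def check_shard_coverage(tapes_on_disk, shards):
    """Return (unsharded, stale, duplicated); three empty lists mean full coverage.

    tapes_on_disk: iterable of tape file names found on disk.
    shards: dict mapping a shard name to the list of tape names it claims.
    """
    claims = Counter(tape for tapes in shards.values() for tape in tapes)
    on_disk = set(tapes_on_disk)
    unsharded = sorted(on_disk - set(claims))    # on disk, claimed by no shard
    stale = sorted(set(claims) - on_disk)        # claimed, but no longer on disk
    duplicated = sorted(t for t, n in claims.items() if n > 1)  # claimed twice or more
    return unsharded, stale, duplicated


# Hypothetical shard map and tape names, for illustration only.
shards = {
    "cli": ["cli-help.tape", "cli-version.tape"],
    "theme": ["theme-dark.tape"],
}
on_disk = ["cli-help.tape", "cli-version.tape", "theme-dark.tape", "tui-new.tape"]

unsharded, stale, duplicated = check_shard_coverage(on_disk, shards)
print(unsharded)   # ['tui-new.tape']
```

A gate built this way would fail the job whenever any of the three lists is non-empty, which is what keeps a newly added tape from going unnoticed.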
Review fixes on the prior commit:

- The `Tapes` rollup now also depends on `shard-coverage`. Without this, a freshly added tape that's not in any shard would pass the rollup (the existing 6 shards stay green) and only fail `Tape shard coverage`, so a single-check branch protection rule would miss it.
- `tui-reader` shard timeout bumped from 6m to 10m. Its 4 tapes (incl. tui-export-keybind, the historically flakiest) can each trigger up to 3 retries in the distinctness gate; 6m was tight in the worst case. Other 5 shards keep their 6m budget.
- Renamed `missing` / `extra` → `unsharded` / `stale` in `shard-coverage` so the variable names match the direction of the user-facing error message.

Refs #187.

[tape-exempt: CI-only workflow change; no TUI/CLI source touched]
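The rollup dependency fix can be illustrated with a small Python model. It assumes the rollup sees one result string per job it depends on (GitHub Actions reports `success`, `failure`, `cancelled`, or `skipped`); the function name and the `run-tapes` job id are hypothetical, while `shard-coverage` is the job named above.

```python
def rollup_passes(needs):
    """Return True only if every job the rollup depends on succeeded.

    needs: dict mapping a job id to its result string. Anything other than
    "success" (including "skipped" and "cancelled") is treated as a failure,
    and an empty dependency set never passes.
    """
    return bool(needs) and all(result == "success" for result in needs.values())


# Before the review fix: the rollup only depends on the shard matrix,
# so an unsharded tape goes unnoticed.
print(rollup_passes({"run-tapes": "success"}))
# prints True

# After the fix: the coverage gate is a dependency too, so its failure
# turns the single required check red.
print(rollup_passes({"run-tapes": "success", "shard-coverage": "failure"}))
# prints False
```

Treating `skipped` as a failure is a design choice in this sketch; it guards against a required check going green because an upstream job never ran.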
Empirical observation from the first sharded run on PR #188:

- Run tapes (cli): 5m39s, pass (4 tapes)
- Run tapes (connectivity): 4m15s, pass (2 tapes)
- Run tapes (theme): 5m21s, pass (3 tapes)
- Run tapes (tui-launch): 4m49s, pass (2 tapes)
- Run tapes (tui-papers): 7m01s, FAIL (4 tapes; 6m budget exhausted)

The cargo build + ttyd/ffmpeg install consumes ~3m of fixed overhead before a single tape renders, leaving 3m for 4 TUI tapes that take 60-90s each. 6m was too tight as a default.

Bumping uniformly to 10m matches what tui-reader already had and gives every shard headroom without losing the wall-clock parallelism win (shards still run concurrently; total wall-clock stays <10m).

Refs #187.

[tape-exempt: CI-only workflow change; no TUI/CLI source touched]
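The budget arithmetic above can be checked with a short Python estimate. It uses only the figures quoted in this message (~3m fixed overhead, 60-90s per TUI tape) and ignores retries; the function names are hypothetical.

```python
def shard_runtime_range(n_tapes, overhead_s=180, per_tape_s=(60, 90)):
    """Estimate (min, max) shard runtime in seconds, without any retries."""
    fastest, slowest = per_tape_s
    return overhead_s + n_tapes * fastest, overhead_s + n_tapes * slowest


def fits_budget(n_tapes, budget_s):
    """True if even the slow end of the estimate stays within the budget."""
    return shard_runtime_range(n_tapes)[1] <= budget_s


low, high = shard_runtime_range(4)
print(low // 60, high // 60)     # 7 9  (minutes)
print(fits_budget(4, 6 * 60))    # False
print(fits_budget(4, 10 * 60))   # True
```

Under these assumptions a 4-tape TUI shard needs roughly 7-9m before any retry is counted, which is over the 6m budget and inside the 10m one.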
Summary
Closes #187. The 19-tape monolithic `Run tapes` job (chronically timing out at 20m on dev + PRs) is split into 6 parallel shards, each with its own budget.

Two new gates round it out:

- `Tape shard coverage` asserts every `.tape` on disk is claimed by exactly one shard, so a newly added tape can't be silently left out of every run.
- `Tapes` rollup aggregates shard + coverage results into a single check name. Branch protection only needs to require this one.

`fail-fast: false` is set so a flake in one shard no longer wipes the others. Total wall-clock drops from ~20m to <8m even with one shard exhausting its retry budget.

Test plan

- python3 `yaml.safe_load` parses cleanly
- `actionlint` clean (only pre-existing SC2001 stylistic warnings on the untouched `coverage` step)

[tape-exempt: CI-only workflow change; no TUI/CLI source touched]