feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (SIMD lands in 6b/6c)#29
feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (SIMD lands in 6b/6c)#29
Conversation
There was a problem hiding this comment.
Pull request overview
Adds the scalar-reference layer and public row-level API surface for high-bit-depth 4:4:4 → RGBA conversions (u8 and native-depth u16), mirroring the earlier high-bit 4:2:0 staged rollout so SIMD wiring can land later without signature churn.
Changes:
- Add scalar shared-template kernels for high-bit 4:4:4 RGBA across planar (
Yuv444p{9,10,12,14,16}) and semi-planar (P410/P412/P416) families, with constant-opaque alpha. - Add 16 new public row dispatchers in
row::for u8 RGBA and u16 RGBA outputs (currently scalar-routed;use_simdintentionally ignored for now). - Add scalar reference tests that pin the alpha contract for representative kernel families.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/row/scalar.rs | Introduces shared *_to_rgb_or_rgba_* scalar templates and RGBA wrappers for high-bit 4:4:4 planar + semi-planar kernels. |
| src/row/mod.rs | Adds 16 new public high-bit 4:4:4 RGBA dispatchers (u8 + u16) that validate slice lengths and route to scalar. |
| src/row/scalar/tests.rs | Adds 6 scalar tests validating gray mapping sanity and the required opaque-alpha values per family. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Compile-time guard — fails monomorphization for any BITS outside | ||
| // {10, 12, 14}. The 16-bit path lives in `yuv_444p16_to_rgb_row` | ||
| // {9, 10, 12, 14}. The 16-bit path lives in `yuv_444p16_to_rgb_row` | ||
| // (i32 u8-output kernel family). Without this guard a caller | ||
| // invoking ::<16> would reach the NEON clamp where | ||
| // invoking ::<16, _> would reach the NEON clamp where | ||
| // `(1 << BITS) - 1 as i16` silently wraps to -1. |
There was a problem hiding this comment.
The compile-time guard comment here references a “NEON clamp”, but this is the scalar kernel implementation. Consider rewording this comment to describe the generic 16-bit footgun (e.g., i16/i32 clamping/overflow behavior) without mentioning a specific SIMD backend, or move the backend-specific rationale to the SIMD dispatcher/docs.
Summary
First of three sub-PRs that add high-bit-depth 4:4:4 RGBA support — exactly mirroring the 3-PR structure that landed high-bit 4:2:0 RGBA in PRs #24 → #25 → #26.
This PR (analogous to #24): scalar reference RGBA kernels + 16 public dispatchers. The dispatchers are SIMD-shaped (accept
use_simd: bool) but always route to scalar — theuse_simdflag is held in the signature so the follow-up SIMD PRs (6b: u8 SIMD, 6c: u16 SIMD + sinker) wire per-arch routes without breaking call-site signatures.Why ship the scalar layer first
Three reasons match the 4:2:0 precedent:
The dispatchers' `use_simd` parameter is deliberately ignored in this PR — see the "Out of scope" + "Codex review verdict" sections below.
Changes
Scalar (
src/row/scalar.rs)8 new const-ALPHA template wrappers + 8 thin RGB shims over those templates. Mirrors the established 4:2:0 pattern (e.g.
yuv_420p_n_to_rgb_or_rgba_row<BITS, ALPHA>):yuv_444p_n_to_rgb_or_rgba_row<BITS, ALPHA>yuv_444p_n_to_rgb_row<BITS>yuv_444p_n_to_rgba_row<BITS>yuv_444p_n_to_rgb_or_rgba_u16_row<BITS, ALPHA>yuv_444p_n_to_rgb_u16_row<BITS>yuv_444p_n_to_rgba_u16_row<BITS>yuv_444p16_to_rgb_or_rgba_row<ALPHA>yuv_444p16_to_rgb_rowyuv_444p16_to_rgba_rowyuv_444p16_to_rgb_or_rgba_u16_row<ALPHA>yuv_444p16_to_rgb_u16_rowyuv_444p16_to_rgba_u16_rowp_n_444_to_rgb_or_rgba_row<BITS, ALPHA>p_n_444_to_rgb_row<BITS>p_n_444_to_rgba_row<BITS>p_n_444_to_rgb_or_rgba_u16_row<BITS, ALPHA>p_n_444_to_rgb_u16_row<BITS>p_n_444_to_rgba_u16_row<BITS>p_n_444_16_to_rgb_or_rgba_row<ALPHA>p_n_444_16_to_rgb_rowp_n_444_16_to_rgba_rowp_n_444_16_to_rgb_or_rgba_u16_row<ALPHA>p_n_444_16_to_rgb_u16_rowp_n_444_16_to_rgba_u16_rowAlpha contracts:
0xFF(always opaque, no alpha plane in source)(1 << BITS) - 10xFFFFEach shared template preserves the original RGB function's
const { assert!(BITS == ...) }BITS guard.Dispatchers (
src/row/mod.rs)16 new public dispatchers under a new
// ---- High-bit 4:4:4 RGBA dispatchers (Ship 8 Tranche 6 prep) ----section header:yuv444p9_to_rgba_row,yuv444p10_to_rgba_row,yuv444p12_to_rgba_row,yuv444p14_to_rgba_row,yuv444p16_to_rgba_row,p410_to_rgba_row,p412_to_rgba_row,p416_to_rgba_row._u16_rowsuffix.Each dispatcher validates slice lengths (mirroring the existing RGB dispatchers' bounds checks), then drops
use_simdand calls the scalar reference:This is identical to how PR #24 staged the 4:2:0 dispatchers before #25/#26 wired SIMD.
Tests (
src/row/scalar/tests.rs)+6 scalar reference tests, one per kernel family:
yuv_444p_n_to_rgba_row::<10>gray-to-gray (validates u8 alpha =0xFF)yuv_444p_n_to_rgba_u16_row::<10>gray-to-gray (validates alpha =1023)yuv_444p16_to_rgba_rowgray-to-gray (validates alpha =0xFF)yuv_444p16_to_rgba_u16_rowgray-to-gray (validates alpha =0xFFFF)p_n_444_to_rgba_row::<10>gray-to-grayp_n_444_16_to_rgba_u16_rowgray-to-gray (validates alpha =0xFFFF)These exercise the scalar code path and pin the alpha contract per family. SIMD equivalence tests (per backend × per kernel × per BITS) land alongside the SIMD wiring in 6b/6c.
Out of scope (deferred to follow-up sub-PRs)
use_simd: boolparameter is held in every dispatcher's signature now so 6b/6c can wirecfg_select!blocks without changing the public API.MixedSinker<Yuv444p9..16>,<P410/P412/P416>,<Yuv440p10/12>) — Tranche 6c, alongside u16 SIMD.Codex adversarial review verdict
Verdict: needs-attention, with one finding flagging that
use_simdis silently ignored in the new dispatchers. This is intentional and matches the established precedent set by PR #24 (titled "SIMD lands in 5a/5b"); see PR #24 description for the same rationale. The flag will become meaningful when 6b (u8 SIMD) and 6c (u16 SIMD) wire per-arch routes — as it did for the 4:2:0 family in PRs #25 and #26.Codex did not flag any correctness or design issues with the scalar refactor itself.
Test plan
cargo test --lib: 513 pass (was 507 before the new scalar tests; +6 in this PR)cargo check --tests --libclean across host (aarch64-darwin), x86_64-unknown-freebsd, wasm32-unknown-unknownRUSTFLAGS=\"-Dwarnings\" cargo clippy --lib --testsclean (zero dead-code warnings — every new template is consumed by its RGB+RGBA wrapper pair, and every new dispatcher consumes its scalar)cargo doc --lib --no-depsclean (56 doc warnings, all pre-existing baseline)Follow-ups
🤖 Generated with Claude Code