Ship 8 Tranche 7c: high-bit 4:4:4 RGBA u16 SIMD + sinker integration #31
Open
Pull request overview
Completes Ship 8 Tranche 7c by wiring high-bit 4:4:4 native-depth u16 RGBA SIMD across all supported backends, exposing the u16 RGBA row dispatchers, and integrating RGBA/RGBA-u16 output buffers into the relevant MixedSinker high-bit formats (including the previously-deferred Yuv440p10/12 reuse path).
Changes:
- Added u16 RGBA SIMD wrappers/templates across NEON, SSE4.1, AVX2, AVX-512, and wasm simd128 backends (plus per-backend scalar equivalence tests).
- Wired the 8 high-bit 4:4:4 u16 RGBA public row dispatchers in `src/row/mod.rs` to the per-arch SIMD backends (with scalar fallback).
- Added sinker-level RGBA/RGBA-u16 integration tests and updated the compile-fail doctest negative example to use `raw::Bayer`.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/row/mod.rs | Wires high-bit 4:4:4 u16 RGBA dispatchers to SIMD backends with scalar fallback. |
| src/row/arch/neon.rs | Adds const-ALPHA u16 RGBA wrappers and shared impls for NEON 4:4:4 kernels. |
| src/row/arch/neon/tests.rs | Adds NEON u16 RGBA equivalence tests vs scalar reference. |
| src/row/arch/x86_sse41.rs | Refactors u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers. |
| src/row/arch/x86_sse41/tests.rs | Adds SSE4.1 u16 RGBA equivalence tests with runtime feature detection. |
| src/row/arch/x86_avx2.rs | Refactors AVX2 u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers. |
| src/row/arch/x86_avx2/tests.rs | Adds AVX2 u16 RGBA equivalence tests with runtime feature detection. |
| src/row/arch/x86_avx512.rs | Refactors AVX-512 u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers. |
| src/row/arch/x86_avx512/tests.rs | Adds AVX-512 u16 RGBA equivalence tests with runtime feature detection. |
| src/row/arch/wasm_simd128.rs | Refactors wasm simd128 u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers. |
| src/row/arch/wasm_simd128/tests.rs | Adds wasm simd128 u16 RGBA equivalence tests (cfg-gated on target_feature="simd128"). |
| src/sinker/mixed/subsampled_4_2_2_high_bit.rs | Adds with_rgba/with_rgba_u16 (and setters) for Yuv440p10/12 and integrates Strategy A fan-out where applicable. |
| src/sinker/mixed/planar_8bit.rs | Updates compile-fail doctest negative example to use MixedSinker<Bayer> since YUV formats now support RGBA. |
| src/sinker/mixed/tests.rs | Adds representative sinker integration tests for high-bit 4:4:4 RGBA/u16 RGBA and the Yuv440p10 reuse path. |
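
The "shared RGB/RGBA template" pattern named in the table can be illustrated with a minimal scalar sketch. All function names below are hypothetical and the per-pixel math is a placeholder (the inputs are treated as already-converted R, G, B planes); the point is only that a `const ALPHA: bool` parameter selects the store layout while the rest of the loop is shared, and that the wrappers stay thin.

```rust
/// Hypothetical scalar stand-in for a shared RGB/RGBA u16 template.
/// The inputs are already-converted R, G, B planes so the sketch can focus
/// on the store; a real kernel would perform the YUV-to-RGB math here.
/// Assumes BITS <= 16.
fn planes_to_rgb_or_rgba_u16_row<const BITS: u32, const ALPHA: bool>(
    r: &[u16],
    g: &[u16],
    b: &[u16],
    out: &mut [u16],
) {
    // Only the output stride and the alpha store depend on ALPHA.
    let channels = if ALPHA { 4 } else { 3 };
    // Opaque alpha at native depth: (1 << BITS) - 1, i.e. 0xFFFF for 16-bit.
    let max = ((1u32 << BITS) - 1) as u16;
    assert!(r.len() == g.len() && g.len() == b.len());
    assert!(out.len() >= r.len() * channels);

    for (i, px) in out.chunks_exact_mut(channels).take(r.len()).enumerate() {
        px[0] = r[i].min(max);
        px[1] = g[i].min(max);
        px[2] = b[i].min(max);
        if ALPHA {
            px[3] = max;
        }
    }
}

/// Thin RGB wrapper over the shared template (hypothetical name).
fn planes_to_rgb_u16_row<const BITS: u32>(r: &[u16], g: &[u16], b: &[u16], out: &mut [u16]) {
    planes_to_rgb_or_rgba_u16_row::<BITS, false>(r, g, b, out);
}

/// Thin RGBA wrapper over the shared template (hypothetical name).
fn planes_to_rgba_u16_row<const BITS: u32>(r: &[u16], g: &[u16], b: &[u16], out: &mut [u16]) {
    planes_to_rgb_or_rgba_u16_row::<BITS, true>(r, g, b, out);
}
```

Because `ALPHA` is a const generic, the `if ALPHA` branches are resolved per monomorphized instance, so the RGB and RGBA wrappers each compile to their own specialized loop.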
94002a7 to 0acb53c
Summary
Closes Ship 8 high-bit 4:4:4 RGBA. Wires u16 RGBA SIMD across all 5 backends (NEON, SSE4.1, AVX2, AVX-512, wasm simd128), wires the 8 u16 RGBA dispatchers in `src/row/mod.rs`, and lands sinker-level integration: `with_rgba` (u8) + `with_rgba_u16` (u16) builders for 10 sinker formats with Strategy A combine paths. Mirrors PR #26 (Tranche 5b) exactly, which did the same for 4:2:0.

After this lands, every YUV format in the inventory has packed RGBA output via `MixedSinker<F>::with_rgba` / `with_rgba_u16`, closing the sink-side RGBA gap that motivated Ship 8.

Changes
SIMD u16 RGBA (5 backends × 4 kernel families = 20 kernel refactors)
Each backend's existing u16 RGB kernel becomes a thin wrapper over a const-ALPHA template, alongside a new RGBA u16 wrapper:
| Shared template | RGB wrapper | RGBA wrapper |
|---|---|---|
| `yuv_444p_n_to_rgb_or_rgba_u16_row<BITS, ALPHA>` | `yuv_444p_n_to_rgb_u16_row<BITS>` | `yuv_444p_n_to_rgba_u16_row<BITS>` |
| `yuv_444p16_to_rgb_or_rgba_u16_row<ALPHA>` | `yuv_444p16_to_rgb_u16_row` | `yuv_444p16_to_rgba_u16_row` |
| `p_n_444_to_rgb_or_rgba_u16_row<BITS, ALPHA>` | `p_n_444_to_rgb_u16_row<BITS>` | `p_n_444_to_rgba_u16_row<BITS>` |
| `p_n_444_16_to_rgb_or_rgba_u16_row<ALPHA>` | `p_n_444_16_to_rgb_u16_row` | `p_n_444_16_to_rgba_u16_row` |

Only the per-iteration store and scalar tail dispatch branch on `ALPHA`; per-pixel math is unchanged. Alpha contracts:
- BITS-generic kernels: `(1 << BITS) - 1` (low-bit-packed at native depth)
- 16-bit kernels: `0xFFFF`

Per-arch alpha splat: `vdupq_n_u16(out_max as u16)` (NEON) / `_mm_set1_epi16(out_max)` (x86, with `-1i16` for 16-bit) / `u16x8_splat(out_max as u16)` (wasm). RGBA u16 store helpers (`vst4q_u16`, `write_rgba_u16_8`, `write_rgba_u16_32`, `write_quarter_rgba`) are reused verbatim from PR #26's 4:2:0 work; no new helpers are needed.

Dispatcher wiring (8 u16 RGBA dispatchers in `src/row/mod.rs`)
Replace the 8 `let _ = use_simd; // SIMD per-arch routes land in Ship 8 Tranche 7c.` stubs (landed in PR #29) with the standard `cfg_select!` per-arch route block:
- `yuv444p9_to_rgba_u16_row`, `yuv444p10_to_rgba_u16_row`, `yuv444p12_to_rgba_u16_row`, `yuv444p14_to_rgba_u16_row` (BITS-generic planar)
- `yuv444p16_to_rgba_u16_row` (16-bit dedicated planar)
- `p410_to_rgba_u16_row`, `p412_to_rgba_u16_row` (BITS-generic Pn)
- `p416_to_rgba_u16_row` (16-bit dedicated Pn)

`use_simd = false` still forces scalar. Section header doc updated to reflect that u16 RGBA is now SIMD-wired.

Sinker integration (10 formats × 4 builders + Strategy A combine)
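
As orientation for the Strategy A combine paths in this section, here is a minimal scalar sketch of the idea: when both an RGB and an RGBA u16 buffer are attached, the RGB kernel runs once and the RGBA row is derived from it by appending opaque alpha; when only RGBA is attached, the direct RGBA kernel is used. The function names, the closure-based kernel parameters, and the `Option` buffers are all assumptions made for illustration and are not the crate's actual signatures.

```rust
/// Hypothetical stand-in for an RGB-to-RGBA u16 expand helper: copies each
/// RGB triple and appends opaque alpha at native depth. Assumes BITS <= 16.
fn expand_rgb_u16_to_rgba_u16<const BITS: u32>(rgb: &[u16], rgba: &mut [u16]) {
    let alpha = ((1u32 << BITS) - 1) as u16;
    assert_eq!(rgb.len() % 3, 0);
    assert!(rgba.len() >= rgb.len() / 3 * 4);
    for (src, dst) in rgb.chunks_exact(3).zip(rgba.chunks_exact_mut(4)) {
        dst[..3].copy_from_slice(src);
        dst[3] = alpha;
    }
}

/// Sketch of the per-row combine decision. `rgb_kernel` and `rgba_kernel`
/// stand in for the row dispatchers; each fills the buffer it is given.
fn process_row<const BITS: u32>(
    rgb_kernel: impl FnOnce(&mut [u16]),
    rgba_kernel: impl FnOnce(&mut [u16]),
    rgb_out: Option<&mut [u16]>,
    rgba_out: Option<&mut [u16]>,
) {
    match (rgb_out, rgba_out) {
        // Both requested: convert once to RGB, then fan out to RGBA.
        (Some(rgb), Some(rgba)) => {
            rgb_kernel(&mut *rgb);
            expand_rgb_u16_to_rgba_u16::<BITS>(rgb, rgba);
        }
        // RGB only: plain RGB kernel.
        (Some(rgb), None) => rgb_kernel(rgb),
        // RGBA only: route straight through the RGBA kernel.
        (None, Some(rgba)) => rgba_kernel(rgba),
        // Nothing attached: no work for this row.
        (None, None) => {}
    }
}
```

The fan-out keeps the color conversion cost at one kernel call per row regardless of how many packed outputs are attached; the expand step is a plain copy plus a constant store.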
- `src/sinker/mixed/subsampled_4_4_4_high_bit.rs` (8 formats): `Yuv444p9/10/12/14/16`, `P410/P412/P416` each gain `with_rgba` / `set_rgba` / `with_rgba_u16` / `set_rgba_u16`. Each format's `process()` is restructured to consume the new buffers via Strategy A:
  - `rgba_u16`-only routes through `*_to_rgba_u16_row`; `rgb_u16 + rgba_u16` runs the RGB kernel once and fans out via `expand_rgb_u16_to_rgba_u16_row::<BITS>`.
  - `rgba`-only goes direct; `rgb + rgba` (or `hsv + rgba`) uses scratch + `expand_rgb_to_rgba_row` fan-out.
- `src/sinker/mixed/subsampled_4_2_2_high_bit.rs` (2 formats): `Yuv440p10` and `Yuv440p12` were the explicit deferral from PR #28's "out of scope" note. They reuse the 4:4:4 dispatchers (`yuv444p10/12_to_rgba(_u16)_row`), which only became available with this PR. Now wired.

40 new builder methods total (4 × 10 formats); 10 `process()` restructures. All Strategy A helpers (`expand_rgb_to_rgba_row`, `expand_rgb_u16_to_rgba_u16_row::<BITS>`, `rgba_plane_row_slice`, `rgba_u16_plane_row_slice`) are reused verbatim from PRs #20/#26.

Tests
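
The x86 equivalence tests in this section pair a runtime feature check with a comparison against the scalar reference. A minimal sketch of that shape follows, assuming hypothetical helper names; the "SIMD" function here has a scalar body and only carries the `#[target_feature]` attribute, so it shows the guard pattern rather than a real kernel.

```rust
/// Hypothetical scalar reference: writes opaque alpha into every RGBA pixel.
fn fill_alpha_scalar(out: &mut [u16], alpha: u16) {
    for px in out.chunks_exact_mut(4) {
        px[3] = alpha;
    }
}

/// Stand-in for an SSE4.1 kernel. The body is scalar; the attribute is what
/// makes calling it without a feature check unsound on CPUs lacking SSE4.1.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn fill_alpha_sse41(out: &mut [u16], alpha: u16) {
    for px in out.chunks_exact_mut(4) {
        px[3] = alpha;
    }
}

/// Returns `None` when the check is skipped (feature missing at runtime),
/// otherwise whether the two paths produced identical rows.
#[cfg(target_arch = "x86_64")]
fn sse41_matches_scalar(width: usize) -> Option<bool> {
    // Early-return guard: never enter the target_feature function unless
    // the running CPU actually reports the feature.
    if !is_x86_feature_detected!("sse4.1") {
        return None;
    }
    let mut expected = vec![0u16; width * 4];
    let mut actual = expected.clone();
    fill_alpha_scalar(&mut expected, 0x03FF);
    // SAFETY: SSE4.1 availability was verified by the guard above.
    unsafe { fill_alpha_sse41(&mut actual, 0x03FF) };
    Some(expected == actual)
}

/// On other architectures the check is always skipped.
#[cfg(not(target_arch = "x86_64"))]
fn sse41_matches_scalar(_width: usize) -> Option<bool> {
    None
}
```

Inside a `#[test]` function the `None` case corresponds to a plain `return`, so the test passes trivially on runners that lack the feature instead of executing unsupported instructions.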
Per-backend u16 RGBA equivalence (~30 tests): 6 per backend × 5 backends, mirroring PR #26's structure. Each backend covers all 4 kernel families across narrow + tail + 1920 widths, with the full `ColorMatrix` × range cross-product. All 18 new x86 `#[test]` functions include `is_x86_feature_detected!` early-return guards (per the PR #25 CI fallout: without them, the ASAN job hits `SIGILL` and Miri reports UB on runners lacking the feature). NEON tests use `#[cfg_attr(miri, ignore = "...")]`. Wasm is module-level cfg-gated.

Sinker tests (9): Representative coverage of `Yuv444p10` (BITS-generic planar: u8, u16, Strategy A combine, and the buffer-too-short error), `P410` (BITS-generic Pn semi-planar), `Yuv444p16` (16-bit dedicated kernel), and `Yuv440p10` (proves the 4:4:0 → 4:4:4 kernel reuse path works end-to-end). Matches PR #26's coverage scope.

Doc-fail example update
The `compile_fail` doctest in `src/sinker/mixed/planar_8bit.rs` previously demonstrated the type-system rejection by attempting `with_rgba` on `Yuv444p10`. After this PR, every YUV format in the inventory writes RGBA, so the example now points at `Bayer` (RAW source, no inherent alpha plane; genuinely lacks `with_rgba`).

Test plan
- `cargo test --lib`: 534 pass on aarch64-darwin (host); was 519 → +6 NEON-side u16 RGBA + 9 sinker tests
- `cargo check --tests --lib` clean across host, x86_64-unknown-freebsd, wasm32-unknown-unknown
- `RUSTFLAGS="-Dwarnings" cargo clippy --lib --tests` clean on host
- `cargo test --doc` passes (the new `Bayer` `compile_fail` example correctly fails to compile)
- `dead_code` warnings: every new `*_to_rgba_u16_row` wrapper is consumed by its dispatcher; every dispatcher is consumed by a sinker or remains available for direct row callers

Codex adversarial review
Verdict: not run. Codex hit its OpenAI usage rate limit (`9:08 PM` retry window). The structural pattern is identical to PR #26 (4:2:0), which Codex approved. Re-run available on request once the rate limit clears.

Closes Ship 8 high-bit 4:4:4 (Tranche 7)
After this PR, Ship 8's Tranche 7 row in `CHANGELOG.md` flips to ✅ shipped.

The remaining Ship 8 work item is Ship 8b (source-side YUVA; a separate follow-up, out of scope for Ship 8).
🤖 Generated with Claude Code