Ship 8 Tranche 7c: high-bit 4:4:4 RGBA u16 SIMD + sinker integration by uqio · Pull Request #31 · Findit-AI/colconv

uqio · 2026-04-27T05:33:44Z

Summary

Closes Ship 8 high-bit 4:4:4 RGBA. Wires u16 RGBA SIMD across all 5 backends (NEON, SSE4.1, AVX2, AVX-512, wasm simd128), wires the 8 u16 RGBA dispatchers in src/row/mod.rs, and lands sinker-level integration: with_rgba (u8) + with_rgba_u16 (u16) builders for 10 sinker formats with Strategy A combine paths. Mirrors PR #26 (Tranche 5b) exactly, which did the same for 4:2:0.

After this lands, every YUV format in the inventory has packed RGBA output via MixedSinker<F>::with_rgba / with_rgba_u16 — closing the sink-side RGBA gap that motivated Ship 8.

Changes

SIMD u16 RGBA (5 backends × 4 kernel families = 20 kernel refactors)

Each backend's existing u16 RGB kernel becomes a thin wrapper over a const-ALPHA template, alongside a new RGBA u16 wrapper:

Family	Const-ALPHA template	RGB wrapper	RGBA wrapper
Yuv444p_n u16 (BITS-generic)	`yuv_444p_n_to_rgb_or_rgba_u16_row<BITS, ALPHA>`	`yuv_444p_n_to_rgb_u16_row<BITS>`	`yuv_444p_n_to_rgba_u16_row<BITS>`
Yuv444p16 u16 (16-bit dedicated)	`yuv_444p16_to_rgb_or_rgba_u16_row<ALPHA>`	`yuv_444p16_to_rgb_u16_row`	`yuv_444p16_to_rgba_u16_row`
P_n_444 u16 (BITS-generic)	`p_n_444_to_rgb_or_rgba_u16_row<BITS, ALPHA>`	`p_n_444_to_rgb_u16_row<BITS>`	`p_n_444_to_rgba_u16_row<BITS>`
P_n_444_16 u16 (P416)	`p_n_444_16_to_rgb_or_rgba_u16_row<ALPHA>`	`p_n_444_16_to_rgb_u16_row`	`p_n_444_16_to_rgba_u16_row`

Only the per-iteration store and scalar tail dispatch branch on ALPHA; per-pixel math is unchanged. Alpha contracts:

BITS-generic kernels: alpha = (1 << BITS) - 1 (low-bit-packed at native depth)
16-bit dedicated kernels: alpha = 0xFFFF
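
The template/wrapper split can be sketched in scalar form. This is a minimal illustration under stated assumptions, not the PR's kernel: every name ending in `_sketch` is hypothetical, and the "conversion" just interleaves three already-converted planes so that the ALPHA branch in the store is the only thing on display.

```rust
/// Hypothetical scalar stand-in for a const-ALPHA template.
/// `BITS` is the native sample depth; `ALPHA` picks 3- or 4-channel output.
fn planes_to_rgb_or_rgba_u16_row_sketch<const BITS: u32, const ALPHA: bool>(
    r: &[u16],
    g: &[u16],
    b: &[u16],
    out: &mut [u16],
) {
    let channels = if ALPHA { 4 } else { 3 };
    // Opaque alpha at native depth: (1 << BITS) - 1, i.e. 0xFFFF for BITS = 16.
    let alpha_max = ((1u32 << BITS) - 1) as u16;
    assert_eq!(r.len(), g.len());
    assert_eq!(r.len(), b.len());
    assert!(out.len() >= r.len() * channels);

    for (i, px) in out.chunks_exact_mut(channels).take(r.len()).enumerate() {
        // Per-pixel values are identical for both outputs.
        px[0] = r[i];
        px[1] = g[i];
        px[2] = b[i];
        // Only the store branches on ALPHA; the branch is resolved at compile time.
        if ALPHA {
            px[3] = alpha_max;
        }
    }
}

/// Thin RGB wrapper over the template.
fn planes_to_rgb_u16_row_sketch<const BITS: u32>(r: &[u16], g: &[u16], b: &[u16], out: &mut [u16]) {
    planes_to_rgb_or_rgba_u16_row_sketch::<BITS, false>(r, g, b, out)
}

/// Thin RGBA wrapper over the same template.
fn planes_to_rgba_u16_row_sketch<const BITS: u32>(r: &[u16], g: &[u16], b: &[u16], out: &mut [u16]) {
    planes_to_rgb_or_rgba_u16_row_sketch::<BITS, true>(r, g, b, out)
}
```

Because `ALPHA` is a const generic, each wrapper monomorphizes to code with no runtime branch on the output layout.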

Per-arch alpha splat: vdupq_n_u16(out_max as u16) (NEON) / _mm_set1_epi16(out_max) (x86, with -1i16 for 16-bit) / u16x8_splat(out_max as u16) (wasm). RGBA u16 store helpers (vst4q_u16, write_rgba_u16_8, write_rgba_u16_32, write_quarter_rgba) reused verbatim from PR #26's 4:2:0 work — no new helpers needed.
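
The `-1i16` note for x86 follows from `_mm_set1_epi16` taking a signed `i16` argument: 0xFFFF does not fit as a positive `i16`, so the all-ones lane is written as -1. A small portable check of that bit pattern, with `alpha_lane_i16` as a hypothetical helper name that does not appear in the PR:

```rust
/// Hypothetical helper: the opaque-alpha value for a given depth, expressed as
/// the signed 16-bit lane value that an x86 `set1_epi16`-style splat takes.
const fn alpha_lane_i16(bits: u32) -> i16 {
    let max = ((1u32 << bits) - 1) as u16;
    // `as i16` reinterprets the bit pattern; it is not a numeric conversion.
    max as i16
}
```

For 9- to 14-bit depths the value is a small positive number and the cast changes nothing; only the 16-bit case wraps to -1.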

Dispatcher wiring (8 u16 RGBA dispatchers in `src/row/mod.rs`)

Replace the 8 `let _ = use_simd; // SIMD per-arch routes land in Ship 8 Tranche 7c.` stubs (landed in PR #29) with the standard `cfg_select!` per-arch route block:

yuv444p9_to_rgba_u16_row, yuv444p10_to_rgba_u16_row, yuv444p12_to_rgba_u16_row, yuv444p14_to_rgba_u16_row (BITS-generic planar)
yuv444p16_to_rgba_u16_row (16-bit dedicated planar)
p410_to_rgba_u16_row, p412_to_rgba_u16_row (BITS-generic Pn)
p416_to_rgba_u16_row (16-bit dedicated Pn)

use_simd = false still forces scalar. Section header doc updated to reflect u16 RGBA is now SIMD-wired.
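
The shape of a dispatcher with a `use_simd` override can be sketched as follows. This is an assumption-laden stand-in: it uses plain `#[cfg]` items plus runtime feature detection rather than the crate's `cfg_select!` block, it only reports which route would be taken instead of calling a kernel, and `Backend`, `best_backend`, and `select_backend` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Scalar,
    Neon,
    Sse41,
    Avx2,
}

// One `best_backend` per target; exactly one of these is compiled in.
#[cfg(target_arch = "x86_64")]
fn best_backend() -> Backend {
    if std::arch::is_x86_feature_detected!("avx2") {
        Backend::Avx2
    } else if std::arch::is_x86_feature_detected!("sse4.1") {
        Backend::Sse41
    } else {
        Backend::Scalar
    }
}

#[cfg(target_arch = "aarch64")]
fn best_backend() -> Backend {
    if std::arch::is_aarch64_feature_detected!("neon") {
        Backend::Neon
    } else {
        Backend::Scalar
    }
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn best_backend() -> Backend {
    Backend::Scalar
}

/// `use_simd = false` always forces the scalar route.
fn select_backend(use_simd: bool) -> Backend {
    if use_simd {
        best_backend()
    } else {
        Backend::Scalar
    }
}
```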

Sinker integration (10 formats × 4 builders + Strategy A combine)

src/sinker/mixed/subsampled_4_4_4_high_bit.rs (8 formats): Yuv444p9/10/12/14/16, P410/P412/P416 each gain with_rgba / set_rgba / with_rgba_u16 / set_rgba_u16. Each format's process() is restructured to consume the new buffers via Strategy A:

u16 path: rgba_u16-only routes through *_to_rgba_u16_row; rgb_u16 + rgba_u16 runs the RGB kernel once and fans out via expand_rgb_u16_to_rgba_u16_row::<BITS>.
u8 path: same shape — rgba-only goes direct; rgb + rgba (or hsv + rgba) uses scratch + expand_rgb_to_rgba_row fan-out.
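
The Strategy A choice between a direct RGBA kernel and an RGB-then-expand fan-out can be sketched like this. Both functions are hypothetical re-implementations: the real helper signatures are not shown in this PR text, and the kernels are passed in as closures purely for illustration.

```rust
/// Hypothetical re-implementation of the fan-out helper: copy each RGB triple
/// and append opaque alpha at native depth.
fn expand_rgb_u16_to_rgba_u16_row_sketch<const BITS: u32>(rgb: &[u16], rgba: &mut [u16]) {
    let alpha = ((1u32 << BITS) - 1) as u16;
    assert_eq!(rgb.len() % 3, 0);
    assert_eq!(rgba.len(), rgb.len() / 3 * 4);
    for (src, dst) in rgb.chunks_exact(3).zip(rgba.chunks_exact_mut(4)) {
        dst[..3].copy_from_slice(src);
        dst[3] = alpha;
    }
}

/// Strategy A shape: rgba-only goes direct; rgb + rgba runs the RGB kernel
/// once and derives RGBA from its output.
fn process_row_sketch<const BITS: u32>(
    rgb_kernel: impl Fn(&mut [u16]),
    rgba_kernel: impl Fn(&mut [u16]),
    rgb_out: Option<&mut [u16]>,
    rgba_out: Option<&mut [u16]>,
) {
    match (rgb_out, rgba_out) {
        (Some(rgb), Some(rgba)) => {
            rgb_kernel(&mut *rgb);
            expand_rgb_u16_to_rgba_u16_row_sketch::<BITS>(&*rgb, rgba);
        }
        (Some(rgb), None) => rgb_kernel(rgb),
        (None, Some(rgba)) => rgba_kernel(rgba),
        (None, None) => {}
    }
}
```

The combined path does the color conversion once per row; the expand step is a plain copy plus a constant store.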

src/sinker/mixed/subsampled_4_2_2_high_bit.rs (2 formats): Yuv440p10 and Yuv440p12 were the explicit deferral from PR #28's "out of scope" note — they reuse the 4:4:4 dispatchers (yuv444p10/12_to_rgba(_u16)_row), which only became available with this PR. Now wired.
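
The reuse works because 4:4:0 subsamples chroma vertically only, so any single output row sees full-width chroma, the same as a 4:4:4 row. A sketch of the row walk, assuming luma row `y` pairs with chroma row `y / 2` and tightly packed planes; `walk_440_with_444_row_kernel` and the closure signature are illustrative, not the crate's API:

```rust
/// Hypothetical frame walker: feeds a 4:4:4-style row kernel from 4:4:0 planes.
fn walk_440_with_444_row_kernel(
    width: usize,
    height: usize,
    y_plane: &[u16],
    u_plane: &[u16],
    v_plane: &[u16],
    mut row_kernel_444: impl FnMut(&[u16], &[u16], &[u16]),
) {
    // Chroma planes are full width but only half as tall (rounded up).
    let chroma_rows = (height + 1) / 2;
    assert!(y_plane.len() >= width * height);
    assert!(u_plane.len() >= width * chroma_rows);
    assert!(v_plane.len() >= width * chroma_rows);

    for y in 0..height {
        let c = y / 2; // two luma rows share one chroma row
        row_kernel_444(
            &y_plane[y * width..(y + 1) * width],
            &u_plane[c * width..(c + 1) * width],
            &v_plane[c * width..(c + 1) * width],
        );
    }
}
```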

40 new builder methods total (4 × 10 formats); 10 process() restructures. All Strategy A helpers (expand_rgb_to_rgba_row, expand_rgb_u16_to_rgba_u16_row::<BITS>, rgba_plane_row_slice, rgba_u16_plane_row_slice) reused verbatim from PRs #20/#26.

Tests

Per-backend u16 RGBA equivalence (~30 tests): 6 per backend × 5 backends, mirroring PR #26's structure. Each backend covers all 4 kernel families across narrow + tail + 1920 widths, full ColorMatrix × range cross-product. All 18 new x86 #[test] functions include is_x86_feature_detected! early-return guards (per the PR #25 CI fallout: without them, ASAN hits SIGILL and Miri reports UB on runners lacking the feature). NEON tests use #[cfg_attr(miri, ignore = "...")]. Wasm is module-level cfg-gated.
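
The early-return guard pattern looks roughly like the following. It is a sketch only: `avx2_available` and `guarded_equivalence_check` are hypothetical names, and the "SIMD" result is a scalar stand-in, since the point is the guard rather than the kernel.

```rust
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_available() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_available() -> bool {
    false
}

/// Returns `None` when the check was skipped, `Some(matches)` when it ran.
fn guarded_equivalence_check() -> Option<bool> {
    if !avx2_available() {
        // Early return: never reach AVX2 code on a CPU that lacks it.
        return None;
    }
    let scalar: Vec<u16> = (0u16..8).map(|x| x * 2).collect();
    // Stand-in for the output of a `#[target_feature(enable = "avx2")]` kernel.
    let simd_stand_in: Vec<u16> = (0u16..8).map(|x| x + x).collect();
    Some(scalar == simd_stand_in)
}
```

Inside an actual `#[test]` function the guard is a plain `return;`, so the test reports as passed on runners without the feature.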

Sinker tests (9): Representative coverage of Yuv444p10 (BITS-generic planar — both u8 + u16 + Strategy A combine + buffer-too-short err), P410 (BITS-generic Pn semi-planar), Yuv444p16 (16-bit dedicated kernel), and Yuv440p10 (proves the 4:4:0 → 4:4:4 kernel reuse path works end-to-end). Matches PR #26's coverage scope.

Doc-fail example update

The compile_fail doctest in src/sinker/mixed/planar_8bit.rs previously demonstrated the type-system rejection by attempting with_rgba on Yuv444p10. After this PR, every YUV format in the inventory writes RGBA, so the example now points at Bayer (RAW source, no inherent alpha plane — genuinely lacks with_rgba).
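
One way a builder can be absent for a format at compile time is to define it only in `impl` blocks for the formats that support it. The sketch below uses hypothetical, heavily simplified types; the crate's real `MixedSinker`, its marker types, and its gating mechanism (inherent impls versus trait bounds) are not shown in this PR text.

```rust
use std::marker::PhantomData;

// Hypothetical marker types standing in for source formats.
struct Yuv444p10;
struct Bayer;

struct MixedSinker<F> {
    rgba: Option<Vec<u8>>,
    _format: PhantomData<F>,
}

impl<F> MixedSinker<F> {
    fn new() -> Self {
        Self { rgba: None, _format: PhantomData }
    }
}

// The builder exists only for formats that can emit RGBA.
impl MixedSinker<Yuv444p10> {
    fn with_rgba(mut self, buf: Vec<u8>) -> Self {
        self.rgba = Some(buf);
        self
    }
}

// There is no `with_rgba` for `MixedSinker<Bayer>`, so this line would be
// rejected by the compiler, which is what a `compile_fail` doctest asserts:
//
//     let _ = MixedSinker::<Bayer>::new().with_rgba(vec![0u8; 4]);
```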

Test plan

cargo test --lib: 534 pass on aarch64-darwin (host); was 519 → +6 NEON-side u16 RGBA + 9 sinker tests
cargo check --tests --lib clean across host, x86_64-unknown-freebsd, wasm32-unknown-unknown
RUSTFLAGS="-Dwarnings" cargo clippy --lib --tests clean on host
cargo test --doc passes (the new Bayer compile_fail example correctly fails to compile)
Zero dead_code warnings — every new *_to_rgba_u16_row wrapper is consumed by its dispatcher; every dispatcher is consumed by a sinker or remains available for direct row callers

Codex adversarial review

Verdict: not run — Codex hit its OpenAI usage rate limit (9:08 PM retry window). The structural pattern is identical to PR #26 (4:2:0) which Codex approved. Re-run available on request once the rate limit clears.

Closes Ship 8 high-bit 4:4:4 (Tranche 7)

After this PR, Ship 8's Tranche 7 row in CHANGELOG.md flips to ✅ shipped:

7 (PR #29, "feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (SIMD lands in 6b/6c)"): scalar prep + 16 dispatchers
7b (PR #30, "Ship 8 Tranche 7b: high-bit 4:4:4 RGBA u8 SIMD"): u8 RGBA SIMD across 5 backends
7c (this PR): u16 RGBA SIMD + sinker integration (incl. Yuv440p10/12)

The remaining Ship 8 work item is Ship 8b (source-side YUVA — separate follow-up; out of scope for Ship 8).

🤖 Generated with Claude Code

Copilot

Pull request overview

Completes Ship 8 Tranche 7c by wiring high-bit 4:4:4 native-depth u16 RGBA SIMD across all supported backends, exposing the u16 RGBA row dispatchers, and integrating RGBA/RGBA-u16 output buffers into the relevant MixedSinker high-bit formats (including the previously-deferred Yuv440p10/12 reuse path).

Changes:

Added u16 RGBA SIMD wrappers/templates across NEON, SSE4.1, AVX2, AVX-512, and wasm simd128 backends (plus per-backend scalar equivalence tests).
Wired the 8 high-bit 4:4:4 u16 RGBA public row dispatchers in src/row/mod.rs to the per-arch SIMD backends (with scalar fallback).
Added sinker-level RGBA/RGBA-u16 integration tests and updated the compile-fail doctest negative example to use raw::Bayer.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
src/row/mod.rs	Wires high-bit 4:4:4 u16 RGBA dispatchers to SIMD backends with scalar fallback.
src/row/arch/neon.rs	Adds const-ALPHA u16 RGBA wrappers and shared impls for NEON 4:4:4 kernels.
src/row/arch/neon/tests.rs	Adds NEON u16 RGBA equivalence tests vs scalar reference.
src/row/arch/x86_sse41.rs	Refactors u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers.
src/row/arch/x86_sse41/tests.rs	Adds SSE4.1 u16 RGBA equivalence tests with runtime feature detection.
src/row/arch/x86_avx2.rs	Refactors AVX2 u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers.
src/row/arch/x86_avx2/tests.rs	Adds AVX2 u16 RGBA equivalence tests with runtime feature detection.
src/row/arch/x86_avx512.rs	Refactors AVX-512 u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers.
src/row/arch/x86_avx512/tests.rs	Adds AVX-512 u16 RGBA equivalence tests with runtime feature detection.
src/row/arch/wasm_simd128.rs	Refactors wasm simd128 u16 RGB kernels into shared RGB/RGBA templates and adds u16 RGBA wrappers.
src/row/arch/wasm_simd128/tests.rs	Adds wasm simd128 u16 RGBA equivalence tests (cfg-gated on `target_feature="simd128"`).
src/sinker/mixed/subsampled_4_2_2_high_bit.rs	Adds `with_rgba`/`with_rgba_u16` (and setters) for `Yuv440p10/12` and integrates Strategy A fan-out where applicable.
src/sinker/mixed/planar_8bit.rs	Updates compile-fail doctest negative example to use `MixedSinker<Bayer>` since YUV formats now support RGBA.
src/sinker/mixed/tests.rs	Adds representative sinker integration tests for high-bit 4:4:4 RGBA/u16 RGBA and the Yuv440p10 reuse path.


al8n changed the title ~~update~~ Ship 8 Tranche 7c: high-bit 4:4:4 RGBA u16 SIMD + sinker integration Apr 27, 2026

al8n requested a review from Copilot April 27, 2026 05:37

Copilot started reviewing on behalf of al8n April 27, 2026 05:38

Copilot AI reviewed Apr 27, 2026


update

0acb53c

al8n force-pushed the feat/ship8-rgba-high-bit-444-u16-simd branch from 94002a7 to 0acb53c April 27, 2026 06:29

uqio wants to merge 1 commit into main from feat/ship8-rgba-high-bit-444-u16-simd

uqio commented Apr 27, 2026 • edited by al8n


2 participants
