Ship 8 Tranche 5b: high-bit 4:2:0 RGBA u16 SIMD + sinker integration by uqio · Pull Request #26 · Findit-AI/colconv

uqio · 2026-04-26T13:21:46Z

Summary

Adds u16 RGBA SIMD across all 5 backends for high-bit 4:2:0 YUV (yuv420p9/10/12/14/16, p010/p012/p016), wires them into the 8 high-bit u16 RGBA dispatchers in src/row/mod.rs, and lands sinker-level integration: with_rgba (u8) + with_rgba_u16 (u16) builders on all 8 high-bit 4:2:0 MixedSinker impls. Closes the Ship 8 high-bit 4:2:0 RGBA work begun in PR #24 (scalar prep) and PR #25 (5a — u8 RGBA SIMD).

Changes

SIMD u16 RGBA (5 backends × 4 kernel families = 20 kernel refactors)

Each backend's *_to_rgb_u16_row<BITS> becomes a thin wrapper over *_to_rgb_or_rgba_u16_row<BITS, ALPHA>, alongside a new *_to_rgba_u16_row<BITS> wrapper. Kernel families:
- planar BITS-generic: yuv_420p_n_to_rgb_or_rgba_u16_row<BITS={9,10,12,14}, ALPHA>
- semi-planar BITS-generic: p_n_to_rgb_or_rgba_u16_row<BITS={10,12}, ALPHA> (P016 has its own family)
- 16-bit planar: yuv_420p16_to_rgb_or_rgba_u16_row<ALPHA>
- 16-bit semi-planar: p16_to_rgb_or_rgba_u16_row<ALPHA>
Only the store (vst3q_u16 vs vst4q_u16, write_rgb_u16_8 vs new write_rgba_u16_8, etc.) and scalar tail dispatch branch on ALPHA; per-pixel math is unchanged.
Alpha contract: (1 << BITS) - 1 for BITS-generic kernels, 0xFFFF for 16-bit kernels — matches the scalar references.
Compile-time const { assert!(BITS == ...) } retained on every shared template; added to p_n_to_rgb_or_rgba_u16_row (the prior p_n_to_rgb_u16_row was missing the guard).
New per-backend RGBA u16 store helpers: write_rgba_u16_8 (NEON via vst4q_u16, x86 SSE2-superset via two-stage unpack, wasm via i16x8_shuffle), write_rgba_u16_32 + write_quarter_rgba (AVX-512).
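
The wrapper-over-shared-template shape above can be sketched in scalar form. This is a minimal illustration under assumptions, not the crate's code: the function names are hypothetical, the input is a flat slice of already-converted RGB triples, and a plain scalar store stands in for the per-backend SIMD stores.

```rust
/// Hypothetical shared template: copies RGB triples into `out` and, when
/// `ALPHA` is true, appends an opaque alpha sample after each triple.
fn rgb_or_rgba_u16_row<const BITS: u32, const ALPHA: bool>(rgb: &[u16], out: &mut [u16]) {
    // Compile-time guard in the style of `const { assert!(BITS == ...) }`.
    const { assert!(BITS == 9 || BITS == 10 || BITS == 12 || BITS == 14) };
    // Alpha contract for BITS-generic kernels: (1 << BITS) - 1.
    let alpha = ((1u32 << BITS) - 1) as u16;
    let channels = if ALPHA { 4 } else { 3 };
    for (px, dst) in rgb.chunks_exact(3).zip(out.chunks_exact_mut(channels)) {
        // The per-pixel values are identical for both outputs.
        dst[..3].copy_from_slice(px);
        // Only the store branches on ALPHA.
        if ALPHA {
            dst[3] = alpha;
        }
    }
}

/// Thin RGB wrapper (hypothetical name).
fn to_rgb_u16_row<const BITS: u32>(rgb: &[u16], out: &mut [u16]) {
    rgb_or_rgba_u16_row::<BITS, false>(rgb, out);
}

/// Thin RGBA wrapper (hypothetical name).
fn to_rgba_u16_row<const BITS: u32>(rgb: &[u16], out: &mut [u16]) {
    rgb_or_rgba_u16_row::<BITS, true>(rgb, out);
}
```

Because `ALPHA` is a const generic, each wrapper monomorphizes to a kernel with the branch resolved at compile time, which is presumably why the refactor keeps the per-pixel math shared.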

Dispatcher wiring (8 u16 RGBA dispatchers in `src/row/mod.rs`)

Replace the prior `let _ = use_simd; // ... Tranche 5b` stubs in yuv420p9/10/12/14/16_to_rgba_u16_row and p010/p012/p016_to_rgba_u16_row with the standard cfg_select! per-arch route block, mirroring the 5a u8 RGBA dispatchers. `use_simd = false` still forces scalar.
Section header + per-dispatcher doc comments updated to remove `Tranche 5b` placeholder language.
`expand_rgb_u16_to_rgba_u16_row` re-exported from `src/row/scalar.rs` for sinker-side Strategy A consumers.
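
A rough sketch of the SIMD-first, scalar-fallback routing described above. It uses plain `#[cfg]` items plus runtime feature detection rather than the `cfg_select!` block the PR mentions, and the `Backend` enum and function names are hypothetical, introduced only for illustration.

```rust
/// Hypothetical label for the route a dispatcher would take.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Scalar,
    Avx2,
    Neon,
}

/// Per-arch detection; only one of these three definitions is compiled.
#[cfg(target_arch = "x86_64")]
fn detect_simd() -> Option<Backend> {
    if std::arch::is_x86_feature_detected!("avx2") {
        Some(Backend::Avx2)
    } else {
        None
    }
}

#[cfg(target_arch = "aarch64")]
fn detect_simd() -> Option<Backend> {
    if std::arch::is_aarch64_feature_detected!("neon") {
        Some(Backend::Neon)
    } else {
        None
    }
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn detect_simd() -> Option<Backend> {
    None
}

/// `use_simd = false` always forces the scalar reference path;
/// otherwise the best detected backend wins, with scalar as fallback.
fn select_backend(use_simd: bool) -> Backend {
    if use_simd {
        detect_simd().unwrap_or(Backend::Scalar)
    } else {
        Backend::Scalar
    }
}
```

A real dispatcher would call the chosen kernel instead of returning a label; the label just makes the routing decision easy to inspect.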

Sinker integration (`src/sinker/mixed/subsampled_4_2_0_high_bit.rs`)

All 8 high-bit 4:2:0 `MixedSinker` impls gain 4 new builder methods each (32 new methods total): `with_rgba` / `set_rgba` (u8) and `with_rgba_u16` / `set_rgba_u16` (u16).
Each format's `PixelSink::process` restructured to consume the new buffers via Strategy A combine:
- u16 path: an rgba_u16-only sink routes directly through `*_to_rgba_u16_row`; rgb_u16+rgba_u16 runs the RGB kernel once and fans out via `expand_rgb_u16_to_rgba_u16_row` (cheap per-pixel pad with depth-aware alpha).
- u8 path: same shape. An rgba-only sink goes direct; rgb+rgba (or hsv+rgba) uses the existing scratch + `expand_rgb_to_rgba_row` fan-out from PR #20 (Nv24/Nv42 RGBA + Strategy A RGB→RGBA fan-out).
New helper `rgba_u16_plane_row_slice` in `src/sinker/mixed/mod.rs` mirrors the existing `rgba_plane_row_slice` (u8) — used in 16 call sites across the 8 formats.
The `compile_fail` doctest in `planar_8bit.rs` that demonstrates "attaching RGBA to a sink that doesn't write it is rejected" was using `Yuv420p10` as its negative example; now updated to `Yuv422p10` (4:2:2 high-bit, still genuinely lacks `with_rgba`).
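
The Strategy A routing for the u16 path might look roughly like the sketch below. Everything here is an assumption for illustration: `expand_rgb_to_rgba_u16` is a stand-in for the crate's expand helper, `process_row` is a hypothetical name, and the two kernels are passed in as closures so the example stays self-contained.

```rust
/// Stand-in for the expand helper: pads each RGB triple with an opaque,
/// depth-aware alpha of (1 << BITS) - 1 (0xFFFF when BITS == 16).
fn expand_rgb_to_rgba_u16<const BITS: u32>(rgb: &[u16], rgba: &mut [u16]) {
    let alpha = ((1u32 << BITS) - 1) as u16;
    for (src, dst) in rgb.chunks_exact(3).zip(rgba.chunks_exact_mut(4)) {
        dst[..3].copy_from_slice(src);
        dst[3] = alpha;
    }
}

/// Hypothetical per-row routing: run at most one colour kernel per row.
fn process_row<const BITS: u32>(
    rgb_kernel: impl Fn(&mut [u16]),
    rgba_kernel: impl Fn(&mut [u16]),
    rgb_out: Option<&mut [u16]>,
    rgba_out: Option<&mut [u16]>,
) {
    match (rgb_out, rgba_out) {
        // Both buffers attached: RGB kernel once, then cheap fan-out.
        (Some(rgb), Some(rgba)) => {
            rgb_kernel(&mut *rgb);
            expand_rgb_to_rgba_u16::<BITS>(rgb, rgba);
        }
        // RGBA only: go straight through the RGBA kernel.
        (None, Some(rgba)) => rgba_kernel(rgba),
        // RGB only: unchanged behaviour.
        (Some(rgb), None) => rgb_kernel(rgb),
        (None, None) => {}
    }
}
```

The point of the combine is that the expensive YUV→RGB math runs once per row even when both outputs are requested.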

Tests (~36 new)

28 row-level RGBA u16 equivalence tests across the 5 backends (about 6 per backend, NEON included): each covers all 4 kernel families across narrow + tail + 1920 widths and the full matrix × range cross-product.
8 sinker integration tests: `Yuv420p10` covers the BITS-generic planar path (rgba u8/u16 gray-to-gray, Strategy A combine for both depths, buffer-too-short err for both); `P010` covers the BITS-generic Pn path; `Yuv420p16` covers the 16-bit dedicated kernel.
All new x86 `#[test]` functions include `is_x86_feature_detected!` early-return guards (per the 5a CI fallout — without them, ASAN sanitizer gets SIGILL and Miri reports UB).
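
The guard pattern can be sketched as follows. This is an assumed shape, not the PR's tests: `add_one_16` is a toy AVX2 kernel and `avx2_matches_scalar` is a hypothetical check that returns `None` when it skips, where a real `#[test]` would simply `return`.

```rust
#[cfg(target_arch = "x86_64")]
mod avx2 {
    use std::arch::x86_64::*;

    /// Toy kernel: adds 1 to sixteen u16 lanes.
    /// Safety: the caller must have verified AVX2 support at runtime.
    #[target_feature(enable = "avx2")]
    pub unsafe fn add_one_16(src: &[u16; 16], dst: &mut [u16; 16]) {
        unsafe {
            let v = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
            let r = _mm256_add_epi16(v, _mm256_set1_epi16(1));
            _mm256_storeu_si256(dst.as_mut_ptr() as *mut __m256i, r);
        }
    }
}

/// Returns `None` if the check was skipped, `Some(equal)` otherwise.
#[cfg(target_arch = "x86_64")]
fn avx2_matches_scalar() -> Option<bool> {
    // Early-return guard: never execute AVX2 instructions on a CPU
    // (or emulated environment) that does not report them.
    if !std::arch::is_x86_feature_detected!("avx2") {
        return None;
    }
    let src: [u16; 16] = std::array::from_fn(|i| i as u16 * 100);
    let mut simd = [0u16; 16];
    // SAFETY: AVX2 availability was checked just above.
    unsafe { avx2::add_one_16(&src, &mut simd) };
    let scalar = src.map(|v| v.wrapping_add(1));
    Some(simd == scalar)
}

/// On other architectures the check is always skipped.
#[cfg(not(target_arch = "x86_64"))]
fn avx2_matches_scalar() -> Option<bool> {
    None
}
```

Skipping instead of failing keeps the test suite green on runners without the feature, at the cost of silently not exercising that backend there.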

Test plan

`cargo test --lib` on host (aarch64-darwin / NEON path): 499 pass, 0 fail
`cargo check --tests --lib` clean across host / x86_64-unknown-freebsd / wasm32-unknown-unknown
`RUSTFLAGS="-Dwarnings" cargo clippy --lib --tests` clean
`cargo test --doc` clean (the updated `compile_fail` example correctly rejects `Yuv422p10`)
CI: ASAN sanitizer on x86_64-linux (should pass — guards in place)
CI: Miri on x86_64-linux (should pass — guards in place)
On-device equivalence run for AVX2 / AVX-512 / SSE4.1 hardware (deferred to CI)

Follow-ups (out of scope)

4:2:2 (`Yuv422p9/10/12/14/16`, `P210/P212/P216`) and 4:4:4 (`Yuv444p9/10/12/14/16`, `P410/P412/P416`) high-bit sinkers still lack `with_rgba` / `with_rgba_u16` — symmetric gaps closable in a future tranche.
Cleanup PR to split inline `mod tests` blocks out of large source files (per the `project_colconv_cleanup_split_tests` memory note).

🤖 Generated with Claude Code

Copilot

Pull request overview

Extends the “Ship 8” RGBA pipeline to high-bit-depth 4:2:0 formats by adding native-depth u16 RGBA support end-to-end (MixedSinker wiring + row dispatchers + per-arch SIMD implementations), plus targeted correctness/equivalence tests.

Changes:

Add with_rgba/with_rgba_u16 (and setters) and Strategy-A RGB→RGBA fanout wiring for high-bit-depth 4:2:0 MixedSinker formats (Yuv420p9/10/12/14/16, P010/P012/P016).
Enable SIMD dispatch for native-depth u16 RGBA row conversions across NEON, SSE4.1, AVX2, AVX-512, and wasm simd128 backends; add a shared x86 u16 RGBA interleave writer.
Add new tests covering MixedSinker RGBA behavior (subset) and per-arch SIMD equivalence vs scalar for native-depth u16 RGBA kernels.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.


| File | Description |
| --- | --- |
| src/sinker/mixed/tests.rs | Adds MixedSinker-level RGBA tests for selected high-bit 4:2:0 formats (incl. alpha correctness + Strategy A equivalence checks). |
| src/sinker/mixed/subsampled_4_2_0_high_bit.rs | Wires RGBA/RGBA-u16 buffers into high-bit 4:2:0 sinkers and implements Strategy-A fanout/fast-path logic. |
| src/sinker/mixed/planar_8bit.rs | Updates compile-fail doc example to reference a still-unwired format for RGBA. |
| src/sinker/mixed/mod.rs | Adds `rgba_u16_plane_row_slice` helper for slicing `u16` RGBA rows safely. |
| src/row/mod.rs | Exposes `expand_rgb_u16_to_rgba_u16_row` and updates high-bit 4:2:0 `u16` RGBA dispatchers to actually SIMD-dispatch. |
| src/row/arch/x86_common.rs | Adds `write_rgba_u16_8` helper to interleave/store packed RGBA-u16 for 8 pixels on x86. |
| src/row/arch/x86_sse41.rs | Implements SSE4.1 native-depth `u16` RGBA kernels via shared RGB/RGBA core. |
| src/row/arch/x86_sse41/tests.rs | Adds SSE4.1 equivalence tests for native-depth `u16` RGBA kernels. |
| src/row/arch/x86_avx2.rs | Implements AVX2 native-depth `u16` RGBA kernels via shared RGB/RGBA core. |
| src/row/arch/x86_avx2/tests.rs | Adds AVX2 equivalence tests for native-depth `u16` RGBA kernels. |
| src/row/arch/x86_avx512.rs | Implements AVX-512 native-depth `u16` RGBA kernels (incl. RGBA stores) via shared RGB/RGBA core. |
| src/row/arch/x86_avx512/tests.rs | Adds AVX-512 equivalence tests for native-depth `u16` RGBA kernels. |
| src/row/arch/wasm_simd128.rs | Implements wasm simd128 native-depth `u16` RGBA kernels via shared RGB/RGBA core + RGBA store helper. |
| src/row/arch/wasm_simd128/tests.rs | Adds wasm simd128 equivalence tests for native-depth `u16` RGBA kernels. |
| src/row/arch/neon.rs | Implements NEON native-depth `u16` RGBA kernels via shared RGB/RGBA core. |
| src/row/arch/neon/tests.rs | Adds NEON equivalence tests for native-depth `u16` RGBA kernels. |


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.


Copilot · 2026-04-26T13:38:33Z

 /// Routes through the dedicated 16-bit u16-output scalar kernel
 /// (`scalar::yuv_420p16_to_rgba_u16_row`) — uses i64 chroma multiply
-/// for the wider `coeff × u_d` product at 16 → 16-bit scaling. SIMD
-/// per-arch routes land in the follow-up Ship 8 Tranche 5b PR.
+/// for the wider `coeff × u_d` product at 16 → 16-bit scaling.
+/// `use_simd = false` forces the scalar reference path.


The doc comment claims this dispatcher "routes through the dedicated ... scalar kernel", but the implementation now conditionally dispatches to per-arch SIMD backends when use_simd is true. Please update the docs to describe SIMD-first dispatch with scalar fallback (and keep the note that use_simd = false forces the scalar reference).

Copilot · 2026-04-26T13:38:33Z

@@ -3518,8 +3856,8 @@ pub fn p016_to_rgba_row(
 /// `0xFFFF`.
 ///
 /// Routes through the dedicated 16-bit u16-output P016 scalar kernel


The docs say this dispatcher routes through the scalar kernel, but the implementation now attempts SIMD backends when use_simd is true and falls back to scalar otherwise. Please update the comment to reflect SIMD-first dispatch + scalar fallback (while keeping the note that use_simd = false forces scalar).

Suggested change

-/// Routes through the dedicated 16-bit u16-output P016 scalar kernel
+/// Dispatches to the best available backend for the current target and
+/// falls back to the dedicated 16-bit u16-output P016 scalar kernel

uqio added 2 commits April 27, 2026 01:00

update

0798cf2

update

ad52ef6

al8n requested a review from Copilot April 26, 2026 13:21

Copilot started reviewing on behalf of al8n April 26, 2026 13:22

Copilot AI reviewed Apr 26, 2026

Comment thread src/sinker/mixed/mod.rs Outdated

al8n changed the title ~~Feat/ship8 rgba high bit 420 u16 simd~~ Ship 8 Tranche 5b: high-bit 4:2:0 RGBA u16 SIMD + sinker integration Apr 26, 2026

Update src/sinker/mixed/mod.rs

2da756a

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

al8n requested a review from Copilot April 26, 2026 13:31

Copilot started reviewing on behalf of al8n April 26, 2026 13:33

Copilot AI reviewed Apr 26, 2026

uqio merged commit cc09cbb into main Apr 26, 2026
47 checks passed

uqio deleted the feat/ship8-rgba-high-bit-420-u16-simd branch April 26, 2026 14:01

This was referenced Apr 26, 2026

feat(sinker): Ship 8 — high-bit 4:2:2 RGBA sinker integration #28

Merged

feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (SIMD lands in 6b/6c) #29

Merged

uqio merged 3 commits into main from feat/ship8-rgba-high-bit-420-u16-simd

uqio commented Apr 26, 2026 • edited by al8n
