[pull] master from libretro:master #982

Merged
pull[bot] merged 21 commits into Alexandre1er:master from libretro:master
May 1, 2026

@pull pull Bot commented May 1, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


LibretroAdmin and others added 21 commits May 1, 2026 00:06
Small follow-up cleanup, no behavioural change.

  - Combine the empty-msg / missing-font-data / missing-font-tex
    guards into a single early return at the top.  The previous
    arrangement ran SetTextureColorMod and the position math even
    when the loop body would never execute.

  - Drop the off_x / off_y / tex_x / tex_y temporaries - they were
    one-shot copies of gly-> fields used immediately afterwards.

  - SDL_RenderCopyEx with rotation=0, no center, SDL_FLIP_NONE is
    just SDL_RenderCopy.  The legacy bitmap OSD font is never
    rotated or flipped (any rotation in the new backend is applied
    via per-vertex math in the gfx_display draw path) so the simpler
    call is correct.

  - Comment up top noting the function's role - it is a legitimate
    fallback path (HAVE_GFX_WIDGETS undefined, widgets disabled in
    settings, or pre-widget-init OSD frames) so future readers
    don't mistake it for dead code now that set_osd_msg dispatches
    real text through the sdl2_raster_font render_msg.
The Windows winraw input driver does not work alongside the SDL2
video driver because SDL2 internally registers for raw input and
drains the WM_INPUT stream via GetRawInputBuffer() to feed its own
keyboard / mouse APIs.  winraw's hidden HWND_MESSAGE window then
never sees any input events, so keyboard / mouse polling silently
stops working.

This is a fundamental architectural mismatch and not something the
SDL2 video driver can fix from RA's side - SDL owns its window's
WndProc.  Unlike the d3d8 / d3d9 / d3d11 / d3d12 drivers, which
create their main window themselves with a winraw-aware WndProc
(wnd_proc_d3d_winraw, gfx/common/win32_common.c) that forwards the
relevant focus / input messages, SDL_CreateWindow installs SDL's
own WndProc and we have no clean way to subclass or replace it.

The bug has existed since both drivers existed, but went largely
unnoticed before the recent SDL2 backend work because the SDL2
driver was missing menu rendering and rarely used as a primary
choice.  Now that it is feature-complete, more users will hit the
combination.

Detect the broken combo at sdl2_gfx_init time and emit two
RARCH_WARN lines with concrete actionable guidance ("switch to
dinput in Settings -> Drivers -> Input") so the user has a clear
explanation instead of a black-hole input experience.  Does not
refuse to init - gamepad-only users still get a working setup.

Gated on _WIN32 (and not _XBOX / __WINRT__) since winraw is
Windows-only.  See input/drivers/winraw_input.c for the input side.
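
As a rough illustration of the init-time check described above (not the actual sdl2_gfx_init code): the sketch assumes the active input driver's ident is available as a string, and uses a hypothetical warn() helper in place of RARCH_WARN. The name sdl2_warn_if_winraw and the warning wording are made up for this example. The real check is additionally compiled only on _WIN32, which is left out here so the sketch builds anywhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for RARCH_WARN; counts calls so the
 * sketch can be checked. */
static int warn_count = 0;

static void warn(const char *msg)
{
   assert(msg != NULL);
   fprintf(stderr, "[WARN] %s\n", msg);
   warn_count++;
}

/* Returns 1 if the broken sdl2 + winraw combination was detected,
 * 0 otherwise.  It only warns and never fails init, so gamepad-only
 * setups keep working. */
static int sdl2_warn_if_winraw(const char *input_ident)
{
   if (!input_ident || strcmp(input_ident, "winraw") != 0)
      return 0;

   warn("SDL2 drains WM_INPUT itself; winraw will not receive "
        "keyboard / mouse events.");
   warn("Switch to dinput in Settings -> Drivers -> Input.");
   return 1;
}
```

Returning a flag instead of an error keeps the decision to continue with the caller, which matches the "does not refuse to init" behaviour above.
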
metal_common.h was a 420-line header included from exactly three
translation units: metal.m itself, ui/drivers/ui_cocoa.m, and
ui/drivers/ui_cocoatouch.m.  Of those, only metal.m used more than a
single declaration -- both UI files only needed the empty
@interface MetalView : MTKView @end forward declaration to type the
result of [MetalView new] and a (MetalView *) cast on _renderView.
The other ~415 lines (Context, FrameView, MetalDriver, MenuDisplay,
HDR machinery, RPixelFormat enum, the slang_process.h transitive
include, etc.) were dead weight on the two UI translation units.

Move the header's content into metal.m where it was already used as
a single TU via griffin_objc.m's umbrella, and replace the includes
in the two UI files with the three-line MetalView forward decl plus
<MetalKit/MetalKit.h>.  No public Metal driver C/Obj-C symbols are
referenced outside metal.m, so nothing else needs to change.

Optimisations enabled by the fold:

  - The four file-scope C functions exposed only for "external"
    consumers that did not exist (RPixelFormatToBPP, matrix_proj_ortho,
    glslang_format_to_metal, SelectOptimalPixelFormat) become static.
    glslang_format_to_metal and SelectOptimalPixelFormat get matching
    static forward decls near the top of the file because their
    definitions sit below their first call sites.

  - Four properties that were declared in the public header surface
    for downstream consumers that never materialised are dead in the
    closed system and have been removed:

      Context.drawableFormat       (getter returned _layer.pixelFormat)
      Context.hdrOffscreenFormat   (getter returned _hdrOffscreenFormat)
      FrameView.shaderEmitsHDR10
      FrameView.shaderEmitsHDR16

    The backing ivars _hdrOffscreenFormat, _shaderEmitsHDR10, and
    _shaderEmitsHDR16 are kept -- they are still read inside the
    owning class.  Only the public Obj-C accessors and their @property
    declarations are gone.

  - RETRO_BEGIN_DECLS / RETRO_END_DECLS dropped along with the
    retro_common_api.h include they required (no public C API left).

  - Foundation/Metal/MetalKit/QuartzCore umbrella imports collapsed to
    the set already present at the top of metal.m.

The two pkg/apple/RetroArch_Metal.xcodeproj entries for metal_common.h
are removed.  The header had two PBXFileReference IDs in the project
(one declared, one dangling under the metal/ group); both group
children entries and the lone declaration are dropped.

Diff: +391 / -446 metal.m, -420 metal_common.h, -8/-9 ui_cocoa{,touch}.m,
      -3 project.pbxproj.  Net -61 lines.

griffin/griffin_objc.m only #imports metal.m and is unaffected.  The
Makefile.common metal.o rule and the per-file -fobjc-arc override in
the top-level Makefile (metal.m only) are unchanged; MRC-compiled
ui_cocoa{,touch}.m gain a tiny @interface MetalView : MTKView @end
that has no methods or properties and so compiles identically under
either memory model.
menu_state_get_ptr() is a trivial getter that returns &menu_driver_state
(file-scope static).  It cannot be inlined across TUs without LTO, so
every call is a real function call returning a constant pointer.  Four
menu drivers had accumulated redundant calls that can be folded into
the existing function-top menu_st caches without any behavioural change.

Two patterns addressed:

1. Inner-block shadows -- a function caches menu_st at the top, then a
   nested block re-declares its own menu_st = menu_state_get_ptr().
   Since the outer cache is in scope and menu_st is never reassigned
   anywhere in these files, the inner shadow always binds the same
   pointer.  Drop the inner declaration and let the block use the
   outer one.

2. Direct chained calls -- menu_state_get_ptr()->field used in a
   function that already has menu_st cached.  Replace with menu_st->field.

   Plus one site (rgui_populate_entries) where three direct calls
   coexist with no cache; add a function-top cache and replace the
   three calls with menu_st-> accesses.
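
The two patterns can be sketched in miniature as follows. This is an illustration only: struct menu_state is cut down to two made-up fields, getter_calls is instrumentation added for the sketch, and populate_before / populate_after are hypothetical functions rather than any of the real menu driver entry points.

```c
#include <assert.h>

/* Hypothetical miniature of the menu state; the real struct is
 * much larger. */
struct menu_state
{
   int selection;
   int entries;
};

static struct menu_state menu_driver_state;
static unsigned getter_calls = 0; /* instrumentation, sketch only */

struct menu_state *menu_state_get_ptr(void)
{
   getter_calls++;
   return &menu_driver_state;
}

/* Before: function-top cache, plus an inner shadow (pattern 1)
 * and a direct chained call (pattern 2). */
static int populate_before(void)
{
   struct menu_state *menu_st = menu_state_get_ptr();
   int sum                    = menu_st->selection;
   {
      struct menu_state *menu_st = menu_state_get_ptr(); /* shadow */
      sum += menu_st->entries;
   }
   sum += menu_state_get_ptr()->selection;               /* chained */
   return sum;
}

/* After: every access goes through the single function-top cache. */
static int populate_after(void)
{
   struct menu_state *menu_st = menu_state_get_ptr();
   int sum                    = menu_st->selection;
   assert(menu_st == &menu_driver_state);
   sum += menu_st->entries;
   sum += menu_st->selection;
   return sum;
}
```

Both versions compute the same value; the second makes one getter call instead of three.
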

Per-function changes:

  materialui.c:
    materialui_frame                     -1 (inner shadow)
    materialui_populate_entries          -6 (1 inner shadow + 5 direct)
    materialui_parse_menu_entry_action   -6 (6 inner shadows in switch arms)
    materialui_pointer_up                -1 (inner shadow)

  xmb.c:
    xmb_parse_menu_entry_action          -1 (inner shadow)
    xmb_render                           -2 (2 inner shadows in drag arms)

  rgui.c:
    rgui_render                          -1 (inner shadow)
    rgui_populate_entries                -2 net (3 direct -> cache, +1 cache)
    rgui_pointer_up                      -1 (inner shadow)

  ozone.c:
    ozone_parse_menu_entry_action        -1 (inner shadow)
    ozone_frame                          -1 (inner shadow)

Total: 23 fewer calls (122 -> 99 across the four files), -16 source
lines net.  No semantic change -- menu_state_get_ptr() returns the
same pointer every time, and no caller reassigns the cached menu_st
anywhere in the touched files (verified by grep).
Follow-up to 40ad5bc, applying the same intra-function deduplication
pattern to three more files where a function-top menu_st cache
already exists but inner blocks needlessly re-fetched it (or used
menu_state_get_ptr()-> chained access).

retroarch.c command_event:
   - Drop 3 inner shadows of menu_st (CMD_EVENT_ADD_TO_PLAYLIST,
     CMD_EVENT_RESET_CORE_ASSOCIATION, CMD_EVENT_DISK_EJECT_TOGGLE);
     each is gated by #ifdef HAVE_MENU and the function-top cache
     at the start of command_event is gated by the same condition,
     so the inner shadows always bound the same pointer.
   - Replace 5 direct menu_state_get_ptr()-> chained accesses with
     menu_st-> (3 in CMD_EVENT_CLOSE_CONTENT, 1 each in
     CMD_EVENT_AUDIO_STOP and CMD_EVENT_AUDIO_START).  All inside
     HAVE_MENU paths, so the outer cache is reachable.
   - Also drop a redundant `settings` shadow inside
     CMD_EVENT_ADD_TO_PLAYLIST -- the outer command_event-scope
     `settings = config_get_ptr()` is unconditional and reaches
     into the HAVE_MENU block.

menu/cbs/menu_cbs_ok.c generic_action_ok_displaylist_push:
   - Drop 1 inner shadow in ACTION_OK_DL_DATABASE_MANAGER_LIST.

input/input_driver.c input_keyboard_event:
   - Drop 1 inner shadow in the screensaver-disable block.

Total: 10 fewer calls, no semantic change (menu_state_get_ptr()
returns the same file-scope-static pointer every time, and none
of the touched files reassign menu_st anywhere).

Other multi-call sites surveyed but intentionally skipped:
general_write_handler, menu_action_handle_setting, and
runloop_environment_cb each have several sibling case-block caches
in giant switches.  Promoting any of these to a function-top cache
would penalize every dispatch (most cases don't touch menu_st), so
the per-case pattern is correct there.
rgui_set_pixel_format_function selects the 16bpp menu framebuffer
conversion function and the transparency-support boolean based on
the active video driver ident. The selection has grown over time:
ps2, gx, psp1 individually, then rsx, then a four-driver d3d10/11/12/
metal block, then sdl_dingux, sdl_rs90, xvideo, and most recently
d3d8/d3d9_hlsl/d3d9_cg in commits 71aede9 and 345edcb. As of master
the selection covers fourteen idents in a chain of nested if / else if
string_is_equal blocks, with embedded comments tying conv-function
selections to platform groups.

Replace the chain with a const lookup table.  Each row is one
driver: ident, conversion function, transparency flag.  The
selection function reduces to a linear scan returning on first
match; the previously-special-cased empty / unknown / NULL ident
paths collapse into a single fallback after the loop.

This:
  - makes adding a new driver a one-line table entry instead of
    finding the right grouped block to extend;
  - removes the multiple early-return paths that diverged on
    the transparency flag (some inside the if-chain, some after
    it);
  - keeps the chain's per-row comments for platform context.
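
A minimal sketch of the table-driven shape, under stated assumptions: the conversion functions, the three rows, the transparency values and the fallback are placeholders chosen for illustration, not the real rgui table. strcmp stands in for string_is_equal, and ARRAY_SIZE is redefined locally so the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical conversion signature and placeholder converters. */
typedef unsigned short (*pixel_conv_fn)(unsigned int argb);

static unsigned short conv_default(unsigned int c)
{ return (unsigned short)(c >> 16); }
static unsigned short conv_ps2(unsigned int c)
{ return (unsigned short)(c & 0xffff); }
static unsigned short conv_d3d(unsigned int c)
{ return (unsigned short)(c >> 8); }

struct pixel_format_entry
{
   const char   *ident;        /* video driver ident          */
   pixel_conv_fn conv;         /* 16bpp conversion function   */
   int           transparency; /* nonzero if supported        */
};

/* One row per driver; values here are placeholders. */
static const struct pixel_format_entry pixel_format_table[] = {
   { "ps2",   conv_ps2, 0 },
   { "d3d11", conv_d3d, 1 },
   { "d3d12", conv_d3d, 1 }
};

/* Linear scan returning on the first exact match.  NULL, empty and
 * unknown idents all fall through to the single fallback. */
static pixel_conv_fn select_pixel_format(const char *ident,
      int *transparency)
{
   size_t i;
   assert(transparency != NULL);

   if (ident)
   {
      for (i = 0; i < ARRAY_SIZE(pixel_format_table); i++)
      {
         if (strcmp(ident, pixel_format_table[i].ident) == 0)
         {
            *transparency = pixel_format_table[i].transparency;
            return pixel_format_table[i].conv;
         }
      }
   }

   *transparency = 1; /* placeholder fallback */
   return conv_default;
}
```

Because the comparison is an exact string match, prefixes such as "ps" and mixed-case variants such as "PS2" land on the fallback, as in the harness cases listed below.
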

Behavioral equivalence verified against a standalone test harness
covering all 14 currently-recognized idents, NULL, empty string,
mixed-case ('PS2'), whitespace variants (' ps2', 'ps2 '), substring
prefixes that must NOT match ('p', 'ps', 'd3d', 'd3d1'), and six
unknown-but-real driver idents (vulkan, gl2, gl3, wayland, sdl2,
gdi).  29 cases compared, zero diffs.  Sabotage diagnostic: swapping
ps2's transparency flag in the table reproduces the expected one-line
DIFF, confirming the test path actually exercises the new code.

Compiles clean at -std=c89 -pedantic -Wall -Wextra against an inlined
copy of the new code.  Uses no constructs not already present
elsewhere in this file (ARRAY_SIZE is already used at line 4589;
bool / string_is_equal / static const struct arrays are all
established patterns in the codebase).

No new code paths.  No driver behavior change.  Refactor only.
The four `video_*` fields on gl1_t hold the *core's emulated frame
size* — the dimensions of the pixel buffer the libretro core hands
to gl1_frame, e.g. 256x224 for SNES.  They are not the window size,
not the backbuffer size, and not the menu size.  gl1_t already has
separate `screen_width`/`screen_height` for the actual window
dimensions and `menu_width`/`menu_height`/`menu_pitch`/`menu_bits`
for the menu surface.

The naming is misleading because gl2_t and gl3_t each have their own
`video_width`/`video_height` fields that hold the *backbuffer* size
— the same name for a different quantity.  The semantic mismatch
caused a bug during the audit fix for the screenshot readback
helper (commit d197d89): the gl1 site of that fix could not use
`gl1->video_width` like the gl2 / gl3 sites did, because in gl1
that field is the core frame's width, not the readable region of
the framebuffer.  d197d89 worked around the issue by calling
`video_driver_get_size()` instead.  Two existing comments in this
file (lines 850-855 and 985-993 prior to this commit) were defensive
notes warning future contributors not to misread these fields.

Rename the four `video_*` fields to `frame_*`:

  video_width  -> frame_width
  video_height -> frame_height
  video_pitch  -> frame_pitch
  video_bits   -> frame_bits

This:

  - matches the actual semantic (these are the libretro core
    *frame* properties);
  - parallels the existing `menu_width`/`menu_height`/`menu_pitch`/
    `menu_bits` fields, which name a sibling concept;
  - causes future cross-driver copy-paste from gl2 or gl3 to fail
    at compile time (`gl1->video_width` no longer exists) instead
    of silently producing a different result.

The two defensive comments are simplified — the field name now
says what it is, so the explanation no longer needs to warn the
reader off the wrong reading.

21 reference sites updated across the file, plus the four field
declarations and two comment rewrites.  No behavior change.  No
external callers: `gl1_t` is file-private (declared at the top of
gl1.c with no header counterpart), so the rename is contained to
this single file.

Verified that no stale `gl1->video_(width|height|pitch|bits)` or
`gl->video_(width|height|pitch|bits)` references remain by grep,
and that the renamed-field types still compile clean at -std=c89
-pedantic against a stub harness.
Same rename as gfx/gl1 in d654940, applied to the FPGA and GDI
drivers for consistency.  The four `video_*` fields on each driver
struct hold the libretro core's emulated frame size — assigned from
`video->width` / `video->height` at init and updated from
`frame_width` / `frame_height` per-call — not the window or
backbuffer size.  Both drivers already have separate fields for the
distinct concepts they track:

  fpga_t:  menu_width / menu_height / menu_pitch / menu_bits
           (the menu surface, sibling to the core frame group)

  gdi_t:   screen_width / screen_height (actual surface size)
           full_width / full_height     (window size last published
                                         via video_driver_set_size)
           bmp_width / bmp_height       (DDB's own size)
           menu_width / menu_height / menu_pitch / menu_bits

For gdi, the semantic mismatch was already partially documented in
two defensive comments in gdi_defines.h (lines 114-116 and 119-124
before this commit) and one in gdi_gfx.c (around line 2853).
Renaming the fields lets those comments shrink: the field name now
says what it is, so the explanations no longer need to warn the
reader off the wrong reading.

Rename in both drivers:

  video_width  -> frame_width
  video_height -> frame_height
  video_pitch  -> frame_pitch
  video_bits   -> frame_bits

fpga_t is file-private (declared at the top of fpga_gfx.c with no
header counterpart): the rename is contained to that one file, 11
reference sites plus the four declarations.

gdi_t lives in the shared header gfx/common/gdi_defines.h, used by
both gfx/drivers/gdi_gfx.c and gfx/common/win32_common.c (the
shared WM_PAINT body for the three GDI window procs).  20 reference
sites updated in gdi_gfx.c, 3 in win32_common.c, plus the four
declarations and three comment rewrites.  No callers outside these
three files (verified by grep across the source tree).

Same risk profile as gl1: pure rename, no behavior change, no new
code paths.  Causes future cross-driver copy-paste from gl2 / gl3
to fail at compile time (`fpga->video_width` / `gdi->video_width`
no longer exist) instead of silently producing a different result.

Verified that no stale `fpga->video_(...)` or `gdi->video_(...)`
field references remain, and that the renamed fields compile clean
at -std=c89 -pedantic against stub harnesses.
frame_width/frame_height and backbuffer width/height
video_driver_get_size and video_driver_set_size publish the size of
the area where the active video driver presents pixels to the user.
The old name says "size" without saying "size of what," and the
ambiguity has a cost: the caca driver bug fixed in ad986a6 was hidden
behind exactly this confusion -- caca was passing the libretro
core's frame size to set_size when the API contract was the canvas
(surface) size, and the call site looked plausible until the field
was renamed to expose what it actually held.

Rename the public API to make the contract explicit at the call
site, and add the documentation comment that should have been on
these declarations from the start:

  video_driver_get_size  -> video_driver_get_output_size
  video_driver_set_size  -> video_driver_set_output_size

"output_size" was chosen over the alternatives:

  - window_size is wrong on terminal drivers (caca, sixel) and
    DRM-fullscreen-no-window
  - surface_size collides with SDL_Surface and D3D surface concepts
  - display_size implies the physical monitor, not the output area
  - present_size is jargon and reads oddly as a noun
  - viewport_size collides with the existing vp.{width,height}
    aspect-corrected sub-rect

"output_size" describes purpose (size of the visible output area)
without implying mechanism, doesn't collide with existing terms,
and maps cleanly onto every driver: desktop window output, DRM
scanout output, libcaca canvas output, sixel terminal output.

The doc comment in gfx/video_driver.h spells out the per-driver
mapping (window client area for desktop GL/D3D/Vulkan/Metal,
framebuffer scanout for DRM/KMS, canvas grid for libcaca,
terminal output region for sixel), the consumer list (menu
drivers, CRT switcher, shader subsystem, input drivers, other
video drivers), threading (protected by video_st->display_lock),
and explicitly contrasts with the libretro core's frame size that
lives on each driver's local state (frame_width / frame_height
after the gl1 / fpga / gdi / network / sixel / caca renames; gl2
/ gl3 still keep their canonical video_width / video_height which
mean the backbuffer size).

74 call sites across 33 files updated by word-boundary sed; no
collisions (no other identifiers contain video_driver_get_size or
_set_size as a substring).

Validation: the file with the function definitions
(gfx/video_driver.o) and a representative sample of caller files
(menu/drivers/{xmb,ozone,materialui}.o, gfx/drivers/gl{1,2,3}.o,
gfx/video_crt_switch.o, gfx/video_shader_parse.o) all build clean
in this sandbox.  The remaining 24 files hit unrelated
dependency errors (windows.h not on Linux, libcaca / libvg
headers not installed) before reaching anything rename-related;
none produced a rename-related diagnostic.  Maintainer review on
a full build environment recommended before merge.

No behavior change.  Pure rename + documentation.
…o set_output_size

Two related changes to the VGA framebuffer driver, applying the
same pattern as the gl1 / fpga / gdi / network / sixel / caca
sequence (d654940 / 61ce292 / e675181 / 4668a2f / ad986a6) plus
the bug fix that the caca work surfaced.

Rename
------

The four vga_video_* fields on vga_t hold the libretro core's
emulated frame size: assigned from video->width / video->height
at init (lines 201-202), updated from frame_width / frame_height
per-call (lines 248-258), and consumed by the per-pixel scaling
loops at lines 303-304 / 327-328 which scale the core's frame
down to fit the actual VGA framebuffer (VGA_WIDTH x VGA_HEIGHT =
320x200).

Rename, dropping the redundant vga_ prefix while we're at it
(the struct is already called vga_t):

  vga_video_width  -> frame_width
  vga_video_height -> frame_height
  vga_video_pitch  -> frame_pitch
  vga_video_bits   -> frame_bits

The renamed fields parallel vga_t's existing vga_menu_* group:
both are "input frame from $source", just one is the menu
surface and the other is the running core.

Bug fix
-------

vga_gfx_alive called video_driver_set_output_size with the
renamed frame_width / frame_height -- the libretro core's frame
size, e.g. 256x224 for SNES.  Per the docstring added in cecd19f,
set_output_size publishes the area where the active video driver
presents pixels.  For this driver that area is the VGA hardware
framebuffer at VGA_WIDTH x VGA_HEIGHT (320x200, mode 13h).  Lines
303-304 and 327-328 already scale the core's frame down to fit
it; those constants are the actual output area.  Pass them to
set_output_size instead.

The bug was hidden behind the misleading old field name: when
the field was called vga_video_width and held the core's frame
size, the video_driver_set_output_size(vga->vga_video_width, ...)
call read like it made sense.  After the rename, the same call
reads as video_driver_set_output_size(vga->frame_width, ...) --
visibly wrong at the call site, exactly the class of confusion
the rename pattern was meant to surface.

The existing TODO/FIXME comment immediately above the call --
"check if this is valid" -- was correctly placed; the original
author knew something was off.  Comment dropped, it's resolved.

Same shape as the caca fix in ad986a6.

Risk: if any caller of video_driver_get_output_size on vga was
relying on seeing the core's frame size (the old buggy value),
they will now see VGA_WIDTH x VGA_HEIGHT instead.  No such
caller is plausible: the API contract is the output area, and
the menu drivers / input drivers were getting nonsense input
under the old behaviour (a 256x224 output area on a 320x200
framebuffer would draw widgets clipped or off-screen).  This is
a strict bug fix in the direction of the documented contract.

Incidentally fixes a stale comment in sdl2_gfx.c at line 258
that listed vga in "Most other drivers (vga, gx2, d3d8, d3d9
common) make this call too" -- before this commit, vga was
making the call but not correctly; after, the comment is true.

Same risk profile as the rest of the rename pattern: vga_t is
file-private (declared at the top of vga_gfx.c with no header
counterpart, verified by grep across the source tree).  No
external callers besides the comment fix in sdl2_gfx.c.
The SSE2 fast path of conv_rgb565_abgr8888 produced ARGB byte order
in memory ([b g r a] per pixel), not ABGR ([r g b a]).  The scalar
fallback (lines 387-396) and the NEON fast path (lines 369-385)
both produce the correct ABGR byte order.  On x86 hosts with SSE2
the function silently returned a different byte layout than its
name promised and than its scalar fallback delivered.

Root cause: the SSE2 kernel at lines 358-365 was a literal copy of
the conv_rgb565_argb8888 SSE2 kernel at lines 247-255 (the two
were byte-for-byte identical).  Both pair (b,g) into the low half
of each pixel and (r,a) into the high half shifted up 2 bytes,
producing [b g r a] byte order which is ARGB uint32 LE.  That's
correct for argb8888 but wrong for abgr8888.

Fix: swap the unpack pairings so (r,g) goes into the low half and
(b,a) into the high half shifted up 2 bytes, producing [r g b a]
byte order == ABGR uint32 LE.  Same instruction count (4 unpacks +
1 shift + 2 ors), same vector width (8 pixels per iteration), no
perf change.

Verified bit-exact against the scalar fallback for every possible
RGB565 input.  Brute-force test of all 65,536 16-bit values:
  before fix: 63,488 of 65,536 outputs differ from scalar (97%)
  after fix:  0 of 65,536 differ
For pure red input 0xF800:
  scalar:     0xFF0000FF (ABGR uint32, R in bits 0-7)
  SSE before: 0xFFFF0000 (ARGB uint32, R in bits 16-23)
  SSE after:  0xFF0000FF (matches scalar)
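
The figures above can be reproduced with a scalar-only sketch. It assumes the usual bit-replication expansion of the 5- and 6-bit channels (an assumption about the scalar path, not a copy of pixconv.c), and it models the pre-fix SSE2 output simply as the ARGB layout rather than with intrinsics. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Expand RGB565 channels to 8 bits by bit replication (assumed). */
static void expand_rgb565(uint16_t col,
      uint32_t *r, uint32_t *g, uint32_t *b)
{
   uint32_t r5 = (col >> 11) & 0x1f;
   uint32_t g6 = (col >>  5) & 0x3f;
   uint32_t b5 = (col      ) & 0x1f;
   *r = (r5 << 3) | (r5 >> 2);
   *g = (g6 << 2) | (g6 >> 4);
   *b = (b5 << 3) | (b5 >> 2);
}

/* ABGR uint32: bytes [r g b a] in little-endian memory. */
static uint32_t rgb565_to_abgr8888(uint16_t col)
{
   uint32_t r, g, b;
   expand_rgb565(col, &r, &g, &b);
   return (0xffu << 24) | (b << 16) | (g << 8) | r;
}

/* ARGB uint32: bytes [b g r a] in little-endian memory.  This is
 * the layout the SSE2 path produced before the fix. */
static uint32_t rgb565_to_argb8888(uint16_t col)
{
   uint32_t r, g, b;
   expand_rgb565(col, &r, &g, &b);
   return (0xffu << 24) | (r << 16) | (g << 8) | b;
}

/* Count the inputs for which the two layouts disagree. */
static unsigned count_layout_mismatches(void)
{
   uint32_t col;
   unsigned diff = 0;
   for (col = 0; col <= 0xffff; col++)
      if (rgb565_to_abgr8888((uint16_t)col)
            != rgb565_to_argb8888((uint16_t)col))
         diff++;
   return diff;
}
```

The two layouts agree only when the red and blue channels are equal, which covers 32 x 64 = 2,048 inputs and leaves 63,488 mismatches.
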

Caller audit: of the production sites that route to pixconv via
the SCALER_FMT_ABGR8888 dispatch in scaler.c:126, none are
currently observed to hit the RGB565 -> ABGR8888 pair on an x86
host running SSE2.  vulkan.c and gl3.c readback paths use ABGR8888
-> BGR24 (different pair).  switch_nx_gfx is ARM64 (NEON, not
affected).  pipewire camera input could in principle hit it if a
camera produces RGB565 output, but modern webcams typically
produce YUYV / MJPEG.  The bug has been hidden by absence of
production callers; this fix prevents future contributors from
inheriting it.
dxgi_copy's scalar loop for the B5G6R5_UNORM -> R8G8B8A8_UNORM
format pair is exactly conv_rgb565_abgr8888 in pixconv.c.  The
output uint32 (a << 24) | (b << 16) | (g << 8) | r matches DXGI
R8G8B8A8_UNORM byte order [r g b a] and matches pixconv's abgr8888
output verbatim.

Replace the open-coded scalar loop with a call into pixconv,
removing one of dxgi_copy's many hand-rolled per-pixel converters
in favor of the canonical SIMD-aware implementation.

Code-consolidation patch with measured speedup as a side benefit:

  Resolution    scalar (ms/frame)  pixconv SSE2 (ms/frame)  speedup
  256x224       0.03               0.02                     2.3x
  640x480       0.19               0.09                     2.1x
  1280x720      0.59               0.27                     2.2x
  1920x1080     1.37               0.69                     2.0x
  3840x2160     6.5                4.7                      1.4x

(x86_64 Linux, gcc -O3 -msse2, average of 3 runs.)  Absolute
savings sub-millisecond at typical emulator resolutions; modestly
helpful at 1080p/4K.  ARM/NEON path produces equivalent output.

Other dxgi_copy format-pair branches have less clean pixconv
matches (e.g. B5G6R5 -> B8G8R8X8 nearly matches conv_rgb565_argb8888
but pixconv writes 0xff in the X byte where the existing scalar
writes 0; preserving the existing behavior bit-for-bit would
require a new pixconv variant, so leaving those alone for now).
Companion to 7e44a5e (which covered B5G6R5 -> R8G8B8A8 with
conv_rgb565_abgr8888).  The B8G8R8A8_UNORM destination has byte
order [b g r a] in memory, which is exactly what pixconv's
conv_rgb565_argb8888 produces (uint32 LE
(a << 24) | (r << 16) | (g << 8) | b).  Bit-identical output to
the existing scalar loop's
(r << 16) | (g << 8) | b | (255 << 24).

Same speedup characteristics as the abgr8888 case: ~2x at sub-4K
resolutions, ~1.4x at 4K.  Sub-millisecond absolute savings at
typical emulator resolutions.

Other dxgi_copy pairs considered but skipped:
  - B5G6R5 -> B8G8R8X8: pixconv conv_rgb565_argb8888 puts 0xff
    in the X byte where the existing scalar writes 0; per-spec
    'X' means undefined, but preserving bit-for-bit existing
    behavior avoids a subtle change.
  - R8G8B8A8 <-> B8G8R8A8 byte-swap pairs: pixconv's
    conv_argb8888_abgr8888 is scalar only (3-op bit-twiddle).
    Benchmarked at parity with the dxgi loop after compiler
    auto-vectorization (~0.9x to 1.1x, within noise).  No perf
    win; code consolidation alone insufficient justification.
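
For reference, the B8G8R8A8 match and the skipped B8G8R8X8 case can be sketched as below. Assumptions: channels are expanded by bit replication, the X8 variant is modelled as the A8 result with the top byte cleared (per the note above that the existing scalar writes 0 there), and all names are hypothetical. le_byte reads byte i as it would sit in little-endian memory.

```c
#include <assert.h>
#include <stdint.h>

/* Byte i of v as laid out in memory on a little-endian host. */
static uint8_t le_byte(uint32_t v, unsigned i)
{
   assert(i < 4);
   return (uint8_t)(v >> (8 * i));
}

static uint32_t expand5(uint32_t c) { return (c << 3) | (c >> 2); }
static uint32_t expand6(uint32_t c) { return (c << 2) | (c >> 4); }

/* Same expression as the scalar loop quoted above:
 * (r << 16) | (g << 8) | b | (255 << 24), bytes [b g r a]. */
static uint32_t b5g6r5_to_b8g8r8a8(uint16_t col)
{
   uint32_t r = expand5((col >> 11) & 0x1f);
   uint32_t g = expand6((col >>  5) & 0x3f);
   uint32_t b = expand5(col & 0x1f);
   return (r << 16) | (g << 8) | b | (255u << 24);
}

/* X8 variant as modelled here: identical colour bytes, X byte 0. */
static uint32_t b5g6r5_to_b8g8r8x8(uint16_t col)
{
   return b5g6r5_to_b8g8r8a8(col) & 0x00ffffffu;
}
```

The two results differ only in the top byte, which is why reusing conv_rgb565_argb8888 for the X8 destination would not be bit-identical to the existing loop.
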
@pull pull Bot locked and limited conversation to collaborators May 1, 2026
@pull pull Bot added the ⤵️ pull label May 1, 2026
@pull pull Bot merged commit 8e9131b into Alexandre1er:master May 1, 2026
43 checks passed