Add ARM64 NEON SIMD support #87

Open
slhck wants to merge 1 commit into cd-athena:stable from slhck:arm-simd

slhck commented Dec 28, 2025

Implement NEON-optimized DCT transforms for ARM64 architecture:

  • DCT 8x8, 16x16, 32x32 for 8/10/12-bit video
  • Runtime CPU detection for ARM (NEON is mandatory on ARMv8)
  • Proper CMake integration for ARM architecture detection
  • Fallback to C reference when SIMD unavailable
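A minimal sketch of the select-with-fallback pattern described in the list above. It is not the code from this PR: the kernel is a trivial stand-in (an 8-lane add, not a DCT), and the names `AddRow8Fn`, `addRow8_c`, `addRow8_neon`, `cpuHasNeon`, and `selectAddRow8` are hypothetical. It assumes that on AArch64 the NEON check can be a compile-time one, since NEON is treated as always present there.

```cpp
#include <cstdint>

// Compile-time architecture check; on AArch64, NEON is assumed to be present.
#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

// Hypothetical kernel signature: adds two rows of 8 coefficients.
using AddRow8Fn = void (*)(const int16_t* a, const int16_t* b, int16_t* out);

// Portable C reference, always available.
static void addRow8_c(const int16_t* a, const int16_t* b, int16_t* out) {
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<int16_t>(a[i] + b[i]);
}

#if SKETCH_HAVE_NEON
// NEON variant: one 128-bit load per row, one vector add, one store.
static void addRow8_neon(const int16_t* a, const int16_t* b, int16_t* out) {
    vst1q_s16(out, vaddq_s16(vld1q_s16(a), vld1q_s16(b)));
}
#endif

// Stand-in for runtime detection; here it only reflects the build target.
static bool cpuHasNeon() { return SKETCH_HAVE_NEON != 0; }

// Pick the fastest available kernel, falling back to the C reference.
static AddRow8Fn selectAddRow8() {
    if (cpuHasNeon()) {
#if SKETCH_HAVE_NEON
        return addRow8_neon;
#endif
    }
    return addRow8_c;
}
```

Callers would fetch the pointer once (for example `AddRow8Fn add = selectAddRow8();`) and call it per block, so the choice of implementation stays out of the hot loop.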

New files:

  • arm/dct-neon.cpp: NEON DCT implementations
  • arm/dct-neon.h: Function declarations
  • arm/neon-utils.h: Transpose and helper functions
  • arm/entropy-neon.cpp: Placeholder for future optimization

Output is bit-exact with C reference implementation. Tested on Apple Silicon (M-series).
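For reviewers who want to reproduce the bit-exactness claim at the kernel level, a check along these lines could work. This is only a sketch: the `BlockFn` signature and the `bitExact` helper are hypothetical, and the PR does not say how the comparison was actually run. It feeds identical pseudo-random residual blocks to both implementations and compares the outputs byte for byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

// Hypothetical transform signature: n x n input block, n x n output block.
using BlockFn = void (*)(const int16_t* src, int32_t* dst, int n);

// Returns true if `opt` matches `ref` on every random trial block.
static bool bitExact(BlockFn ref, BlockFn opt, int n, int bitDepth,
                     int trials, uint32_t seed = 1234) {
    std::mt19937 rng(seed);
    // Residuals for a given bit depth lie in [-(2^depth - 1), 2^depth - 1].
    const int maxVal = (1 << bitDepth) - 1;
    std::uniform_int_distribution<int> dist(-maxVal, maxVal);

    const std::size_t count = static_cast<std::size_t>(n) * n;
    std::vector<int16_t> src(count);
    std::vector<int32_t> outRef(count), outOpt(count);

    for (int t = 0; t < trials; ++t) {
        for (auto& s : src) s = static_cast<int16_t>(dist(rng));
        ref(src.data(), outRef.data(), n);
        opt(src.data(), outOpt.data(), n);
        // Any differing byte means the two paths are not bit-exact.
        if (std::memcmp(outRef.data(), outOpt.data(),
                        count * sizeof(int32_t)) != 0)
            return false;
    }
    return true;
}
```

Running it for each block size (8, 16, 32) and each bit depth (8, 10, 12) would cover the combinations listed in this PR.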

Disclaimer: The majority of this code was generated with Claude Opus 4.5 and manually reviewed by me. I verified that the output matches that of #86; I could not get the stable branch to run on my Mac otherwise. I realize there may be concerns about using LLMs in open-source projects like this one, because they shift the reviewing burden to the maintainers. One could also argue that I do not fully understand what every line of code does. That is true: I did not dive into source/lib/analyzer/simd/arm/dct-neon.cpp much. But I think that in this case there are real benefits to making this application work on Apple Silicon, and I would not have been able to write and test this myself as easily.

Also note: the code is partly inspired by x265's DCT implementation, which is licensed under GPL v2.0 or later, so it is compatible with this repo.
