-
Notifications
You must be signed in to change notification settings - Fork 11
Optimize performance for smaller sizes #38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pass CrcParams by reference to avoid copying relatively large data structs (200+ bytes). Avoid using get_calculator_params() for checksum() calls, using hard-coded consts instead.
Not entirely sure why #[inline(always)] is faulting here on x86_64, but not aarch64, but removing it certainly helps and solves.
These are the most popular CRC algorithms, so add helpers for them. These functions are measurably faster since they avoid a match statement overhead.
Using the current optimized build.
Makes benchmarking variants slightly easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes CRC calculation performance for small input sizes (64 bytes to 1 KiB) by avoiding unnecessary struct copies and using native CRC intrinsics. The changes result in 8-200% throughput improvements on modern x86 and ARM systems.
Key changes include:
- Passing
CrcParamsby reference instead of by value to avoid copying the 200+ byte struct - Adding fast paths for small buffers (<= 256 bytes on x86, < 128 bytes on aarch64) using native CRC instructions
- Providing optimized public API functions for the three most popular CRC variants to bypass generic overhead
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/traits.rs | Updated CrcCalculator trait to pass CrcParams by reference |
| src/structs.rs | Updated Calculator implementation and CrcParams methods to use references |
| src/lib.rs | Inlined all algorithm cases in checksum(), added optimized crc32_iscsi/crc32_iso_hdlc/crc64_nvme functions, updated references throughout |
| src/ffi.rs | Added FFI exports for the three new optimized CRC functions |
| src/crc64/algorithm.rs | Updated process_0_to_15 to accept keys by reference |
| src/crc32/fusion/x86/mod.rs | Added crc32_iscsi_small_fast for buffers <= 256 bytes using native CRC32 instructions, added small buffer fast path |
| src/crc32/fusion/aarch64/mod.rs | Added crc32_iscsi_small_fast and crc32_iso_hdlc_small_fast for buffers < 128 bytes, updated wrappers to pass data_len |
| src/crc32/algorithm.rs | Updated process_0_to_15 to accept keys by reference |
| src/combine.rs | Updated checksums function to accept CrcParams by reference |
| src/arch/software.rs | Updated update function to accept CrcParams by reference |
| src/arch/mod.rs | Updated all update functions to accept CrcParams by reference, fixed test calls |
| src/algorithm.rs | Updated all algorithm functions to use CrcParams references, optimized key extraction |
| libcrc_fast.h | Added C header declarations for the three new optimized CRC functions |
| benches/benchmark.rs | Removed redundant array size annotations |
| README.md | Updated performance benchmarks with new numbers and corrected target names from eor3 to sha3 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Suggested by GitHub Copilot on PR review.
The Problem
So far, much of the effort on this library has been around maximizing throughput for "large" input values (>1MiB).
As a result, performance for smaller input values (1KiB, 256 bytes, etc), while good, hasn't been as exceptional as it could be, particularly for the most popular CRC variants (CRC-32/ISCSI, CRC-32/ISO-HDLC, CRC-64/NVME).
The Solution
Optimize the code paths to maximize throughput even for small input sizes.
This results in a 34% to 200% increase (1KiB to 64 bytes) on modern Intel systems, and a 8% to 200% increase (1KiB to 64 bytes) on modern AWS Graviton4 systems.
Changes
CrcParamsby referencecrcintrinsics for very small inputsPlanned version bump
MINORNotes
There's probably more optimization work like this to be done, but this moved the needle a lot, so I'm shipping it as-is.