Skip to content

Conversation

@onethumb
Copy link
Contributor

The Problem

So far, much of the effort on this library has been around maximizing throughput for "large" input values (>1MiB).

As a result, performance for smaller input values (1KiB, 256 bytes, etc), while good, hasn't been as exceptional as it could be, particularly for the most popular CRC variants (CRC-32/ISCSI, CRC-32/ISO-HDLC, CRC-64/NVME).

The Solution

Optimize the code paths to maximize throughput even for small input sizes.

This results in a 34% to 200% increase (1KiB to 64 bytes) on modern Intel systems, and a 8% to 200% increase (1KiB to 64 bytes) on modern AWS Graviton4 systems.

Changes

  • Pass CrcParams by reference
  • Use native crc intrinsics for very small inputs
  • Provide optimized helper functions for the three most popular CRC variants (CRC-32/ISCSI, CRC-32/ISO-HDLC, CRC-64/NVME) which avoid some overhead in the generic path

Planned version bump

  • Which: MINOR
  • Why: non-breaking new functionality and performance improvements

Notes

There's probably more optimization work like this to be done, but this moved the needle a lot, so I'm shipping it as-is.

Pass CrcParams by reference to avoid copying relatively large data
structs (200+ bytes).

Avoid using get_calculator_params() for checksum() calls, using
hard-coded consts instead.
Not entirely sure why #[inline(always)] is faulting here on x86_64, but
not aarch64, but removing it certainly helps and solves.
These are the most popular CRC algorithms, so add helpers for them.

These functions are measurably faster since they avoid a match statement
overhead.
Using the current optimized build.
Makes benchmarking variants slightly easier
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes CRC calculation performance for small input sizes (64 bytes to 1 KiB) by avoiding unnecessary struct copies and using native CRC intrinsics. The changes result in 8-200% throughput improvements on modern x86 and ARM systems.

Key changes include:

  • Passing CrcParams by reference instead of by value to avoid copying the 200+ byte struct
  • Adding fast paths for small buffers (<= 256 bytes on x86, < 128 bytes on aarch64) using native CRC instructions
  • Providing optimized public API functions for the three most popular CRC variants to bypass generic overhead

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/traits.rs Updated CrcCalculator trait to pass CrcParams by reference
src/structs.rs Updated Calculator implementation and CrcParams methods to use references
src/lib.rs Inlined all algorithm cases in checksum(), added optimized crc32_iscsi/crc32_iso_hdlc/crc64_nvme functions, updated references throughout
src/ffi.rs Added FFI exports for the three new optimized CRC functions
src/crc64/algorithm.rs Updated process_0_to_15 to accept keys by reference
src/crc32/fusion/x86/mod.rs Added crc32_iscsi_small_fast for buffers <= 256 bytes using native CRC32 instructions, added small buffer fast path
src/crc32/fusion/aarch64/mod.rs Added crc32_iscsi_small_fast and crc32_iso_hdlc_small_fast for buffers < 128 bytes, updated wrappers to pass data_len
src/crc32/algorithm.rs Updated process_0_to_15 to accept keys by reference
src/combine.rs Updated checksums function to accept CrcParams by reference
src/arch/software.rs Updated update function to accept CrcParams by reference
src/arch/mod.rs Updated all update functions to accept CrcParams by reference, fixed test calls
src/algorithm.rs Updated all algorithm functions to use CrcParams references, optimized key extraction
libcrc_fast.h Added C header declarations for the three new optimized CRC functions
benches/benchmark.rs Removed redundant array size annotations
README.md Updated performance benchmarks with new numbers and corrected target names from eor3 to sha3

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Suggested by GitHub Copilot on PR review.
@onethumb onethumb merged commit f4ad50d into awesomized:main Dec 22, 2025
107 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant