[PWCI] "[v13] net: optimize __rte_raw_cksum and add tests"#618
__rte_raw_cksum uses a loop with memcpy on each iteration.
GCC 15+ is able to vectorize the loop but Clang 18.1 is not.
Replace memcpy with direct pointer access using a packed struct with
__rte_may_alias attribute (same pattern as rte_memcpy.h). This enables
both GCC and Clang to vectorize with SSE/AVX/AVX-512 while avoiding
GCC strict-aliasing bugs without pragma workarounds.
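
The packed, may-alias access pattern described above can be sketched as follows. This is a minimal illustration, not the patch itself: `struct u16_alias` and `sum_words_alias` are hypothetical names, the GCC/Clang attribute syntax is spelled out directly where DPDK would use its `__rte_packed` and `__rte_may_alias` macros, and a trailing odd byte is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical alias type (name is illustrative only).
 * packed:    alignment drops to 1, so unaligned reads are permitted.
 * may_alias: accesses through this type are exempt from
 *            strict-aliasing assumptions.
 */
struct u16_alias {
	uint16_t val;
} __attribute__((packed, may_alias));

/*
 * Add every complete 16-bit word of buf, in host byte order, to sum.
 * A trailing odd byte, if present, is ignored in this sketch.
 */
static inline uint32_t
sum_words_alias(const void *buf, size_t len, uint32_t sum)
{
	const struct u16_alias *w = (const struct u16_alias *)buf;
	size_t n = len / 2;
	size_t i;

	/* Plain indexed loop with no per-iteration memcpy call. */
	for (i = 0; i < n; i++)
		sum += w[i].val;

	return sum;
}
```

Whether a given compiler actually vectorizes this loop depends on its version and flags, so the generated assembly is worth checking rather than assuming.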
This patch adds comprehensive fuzz testing and updates the performance
test to measure the optimization impact.
Performance results from cksum_perf_autotest on Intel Xeon
(Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte):
Block size   Before   After   Improvement
100          0.40     0.24    ~40%
1500         0.50     0.06    ~8x
9000         0.49     0.06    ~8x
Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
📝 Walkthrough
This PR introduces fuzz testing infrastructure for checksum validation and optimizes the core checksum implementation. It adds a new fuzz test module (240 lines) with edge-case and random-data coverage, expands the performance test data sizes, refactors the 16-bit processing loop in the checksum library to use direct pointer access that compilers can vectorize, and updates the build configuration to include the test dependencies.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 0
🧹 Nitpick comments (1)
app/test/test_cksum_fuzz.c (1)
182-202: Solid random testing approach. The random length generation correctly covers the full range [0, 65536]. The 1000-iteration default balances coverage with test execution time.
For reproducibility when debugging failures, you might consider logging the random seed or providing a mechanism to replay specific seeds. However, this is optional since `rte_rand()` may have its own seeding infrastructure.
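
One way to make a failing random run replayable is to log the seed before any lengths are drawn. The sketch below is an assumption-heavy illustration: it uses a small xorshift64 generator as a stand-in for the test's real random source, and `struct fuzz_rng`, `fuzz_rng_seed`, `fuzz_rng_next`, and `fuzz_rng_len` are hypothetical names that do not appear in the patch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the test's random source (xorshift64). */
struct fuzz_rng {
	uint64_t state;
};

static void
fuzz_rng_seed(struct fuzz_rng *rng, uint64_t seed)
{
	/* xorshift must not start from zero; substitute a fixed nonzero value. */
	rng->state = seed != 0 ? seed : UINT64_C(0x9E3779B97F4A7C15);
	/* Log the effective seed so a failing run can be replayed exactly. */
	printf("fuzz seed: 0x%016" PRIx64 "\n", rng->state);
}

static uint64_t
fuzz_rng_next(struct fuzz_rng *rng)
{
	uint64_t x = rng->state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	rng->state = x;
	return x;
}

/* Random length in [0, max_len], inclusive. Modulo bias is ignored here. */
static size_t
fuzz_rng_len(struct fuzz_rng *rng, size_t max_len)
{
	return (size_t)(fuzz_rng_next(rng) % ((uint64_t)max_len + 1));
}
```

Passing the logged value back to `fuzz_rng_seed` reproduces the same sequence of lengths, which is the replay mechanism the comment asks for.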
📒 Files selected for processing (4)
- app/test/meson.build
- app/test/test_cksum_fuzz.c
- app/test/test_cksum_perf.c
- lib/net/rte_cksum.h
🧰 Additional context used
🧬 Code graph analysis (1)
app/test/test_cksum_fuzz.c (1)
lib/net/rte_cksum.h (1)
__rte_raw_cksum(42-68)
🔇 Additional comments (8)
app/test/meson.build (1)
40-42: LGTM! The new fuzz test dependency entry correctly follows the existing pattern and maintains alphabetical ordering with the other checksum test files.
app/test/test_cksum_perf.c (1)
18-18: LGTM! The expanded test sizes appropriately cover jumbo frame boundaries (9000/9001) and 64K GRO limits (65536/65537), aligning with the fuzz test coverage. The paired even/odd values ensure both even-length and odd-length paths are exercised.
lib/net/rte_cksum.h (2)
59-65: Odd-byte handling correctly preserved. The byte-order independent approach using `memcpy` for the trailing byte maintains correctness across different endianness, consistent with the reference implementation.
45-57: Well-designed optimization using the packed alias pattern. The approach correctly addresses:
- Strict aliasing: `__rte_may_alias` prevents UB from type-punning through `uint16_t*`
- Unaligned access: `__rte_packed` ensures safe reads regardless of buffer alignment
- Vectorization: Simple loop structure allows GCC/Clang to auto-vectorize with SSE/AVX
The local struct definition within function scope is unconventional but valid, and keeps the alias type private to this implementation.
app/test/test_cksum_fuzz.c (4)
26-52: Good choice to retain the reference implementation for comparison. The reference implementation correctly mirrors the original DPDK v23.11 `__rte_raw_cksum` using `memcpy` for safe unaligned access, providing a reliable baseline for validating the optimization.
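
For readers without the file open, a `memcpy`-based baseline of the kind described could look roughly like this. It is a sketch under stated assumptions, not the code in `test_cksum_fuzz.c`: `ref_raw_cksum` is a hypothetical name, and the function returns the unfolded 32-bit sum in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reference-style raw sum (name is illustrative only).
 * Each 16-bit word is loaded with memcpy, which is valid for any
 * alignment and does not violate strict aliasing.
 */
static uint32_t
ref_raw_cksum(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = (const uint8_t *)buf;
	const uint8_t *end = p + (len & ~(size_t)1);

	/* Complete 16-bit words. */
	for (; p != end; p += 2) {
		uint16_t v;

		memcpy(&v, p, sizeof(v));
		sum += v;
	}

	/*
	 * Trailing odd byte: copy it into the first byte of a zeroed word,
	 * which equals the word formed by padding the buffer with one zero
	 * byte, regardless of host endianness.
	 */
	if (len & 1) {
		uint16_t v = 0;

		memcpy(&v, end, 1);
		sum += v;
	}

	return sum;
}
```

A fuzz test can then run the optimized function and this baseline on the same buffer, length, and initial sum, and report any mismatch.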
72-114: Comprehensive test function with good diagnostics. The test properly handles edge cases (zero length, allocation constraints) and provides helpful hexdump output on failures for debugging. Memory management is correct on both success and failure paths.
138-177: Excellent edge case coverage. The test array strategically targets boundaries that commonly expose bugs: powers of 2 (vectorization boundaries), MTU sizes (1500/1501), and 64K GRO limits. Testing each length with both zero and random initial sums strengthens the validation.
204-240: Well-organized test harness. The test progression from edge cases to random testing is logical: edge cases run quickly and catch common issues first. The progress output clearly indicates which phase is executing or has failed.
NOTE: This is an auto submission for "[v13] net: optimize __rte_raw_cksum and add tests".
See "http://patchwork.dpdk.org/project/dpdk/list/?series=37013" for details.