
Commit dd3c796

Improve runtime feature detection
Uses more runtime feature detection, rather than compile-time feature detection, for improved reliability, maintainability, graceful degradation across CPU families, and performance. Should help minimize bugs such as #14 in the future.
1 parent a341b52 commit dd3c796
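
For context, the shift this commit describes looks roughly like this in Rust: probe the CPU at runtime with `is_x86_feature_detected!` / `is_aarch64_feature_detected!` and fall back gracefully, rather than fixing the choice at compile time with `cfg(target_feature = ...)`. A minimal sketch only — the function and target strings below are illustrative, not the crate's actual code:

```rust
// Illustrative sketch of runtime dispatch with graceful degradation.
// The returned strings are made-up labels, not the crate's real target names.
fn select_target() -> &'static str {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("vpclmulqdq") && is_x86_feature_detected!("avx512f") {
            return "x86_64-avx512-vpclmulqdq"; // widest carry-less-multiply path
        }
        if is_x86_feature_detected!("pclmulqdq") {
            return "x86_64-sse-pclmulqdq"; // 128-bit carry-less-multiply path
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("aes") {
            return "aarch64-neon-pmull"; // PMULL is part of the AArch64 crypto extension
        }
    }
    "portable-table-fallback" // software fallback keeps every CPU working, just slower
}
```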


44 files changed (+5990, -2255 lines)

.github/workflows/tests.yml

Lines changed: 38 additions & 7 deletions
```diff
@@ -6,14 +6,45 @@ on:
   workflow_dispatch:
 
 jobs:
-  test-accelerated:
-    name: Test accelerated (aarch64, x86_64)
+  test-aarch64:
+    name: Test aarch64
     strategy:
       matrix:
-        os: [ubuntu-latest, ubuntu-22.04-arm, ubuntu-24.04-arm, macos-latest]
+        os: [ubuntu-22.04-arm, ubuntu-24.04-arm, macos-14, macos-15, macos-26, macos-latest, windows-11-arm]
         rust-toolchain:
           - "1.81" # minimum for this crate
-          - "1.89" # when VPCLMULQDQ was stabilized
+          - "1.89" # when AVX-512 VPCLMULQDQ was stabilized
+          - "stable"
+          - "nightly"
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v4 # not pinning to commit hash since this is a GitHub action, which we trust
+      - uses: actions-rust-lang/setup-rust-toolchain@9d7e65c320fdb52dcd45ffaa68deb6c02c8754d9 # v1.12.0
+        with:
+          toolchain: ${{ matrix.rust-toolchain }}
+          components: rustfmt, clippy
+          cache-key: ${{ matrix.os }}-${{ matrix.rust-toolchain }}
+      - name: Check
+        run: cargo check
+      - name: Architecture check
+        run: cargo run --bin arch-check
+      - if: ${{ matrix.rust-toolchain != 'nightly' }}
+        name: Format
+        run: cargo fmt -- --check
+      - if: ${{ matrix.rust-toolchain != 'nightly' }}
+        name: Clippy
+        run: cargo clippy
+      - name: Test
+        run: cargo test
+
+  test-x86_64:
+    name: Test x86_64
+    strategy:
+      matrix:
+        os: [ ubuntu-latest, ubuntu-22.04, ubuntu-24.04, macos-13, macos-15-intel, windows-2022, windows-2025, windows-latest ]
+        rust-toolchain:
+          - "1.81" # minimum for this crate
+          - "1.89" # when AVX-512 VPCLMULQDQ was stabilized
           - "stable"
           - "nightly"
     runs-on: ${{ matrix.os }}
@@ -38,14 +69,14 @@ jobs:
         run: cargo test
 
   test-x86:
-    name: Test accelerated (x86)
+    name: Test x86
     runs-on: ubuntu-latest
     strategy:
       matrix:
         target: [i586-unknown-linux-gnu, i686-unknown-linux-gnu]
         rust-toolchain:
           - "1.81" # minimum for this crate
-          - "1.89" # when VPCLMULQDQ was stabilized
+          - "1.89" # when AVX-512 VPCLMULQDQ was stabilized
           - "stable"
           - "nightly"
     steps:
@@ -71,7 +102,7 @@ jobs:
         target: [powerpc-unknown-linux-gnu, powerpc64-unknown-linux-gnu]
         rust-toolchain:
           - "1.81" # minimum for this crate
-          - "1.89" # when VPCLMULQDQ was stabilized
+          - "1.89" # when AVX-512 VPCLMULQDQ was stabilized
           - "stable"
           - "nightly"
     steps:
```

.gitignore

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,4 +3,5 @@
 /test/test_*.bin
 .idea
 .DS_Store
-.git
+.git
+.vscode
```
Lines changed: 202 additions & 0 deletions

# Design Document

## Overview

This design extends the existing `bin/checksum.rs` tool with benchmark functionality through a new `-b` flag. The benchmark mode will measure CRC performance using either user-provided data (files/strings) or randomly generated data, reporting throughput in GiB/s along with the acceleration target used.

The design maintains backward compatibility while adding a clean benchmark interface that leverages existing patterns from the `benches/benchmark.rs` implementation.

## Architecture

### Command Line Interface

The tool will extend the existing argument parsing to support:

- `-b`: Enable benchmark mode
- `--size <bytes>`: Specify data size for random generation (when no file/string provided)
- `--duration <seconds>`: Benchmark duration as floating-point seconds (default: 10.0)
- Existing `-a <algorithm>`: CRC algorithm (required in benchmark mode)
- Existing `-f <file>` or `-s <string>`: Optional data source for benchmarking

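The existing tool's parser is not shown here, so the following is only a hypothetical shape for how the new flags could be folded into a hand-rolled parsing loop; `parse_benchmark_args` and its error strings are illustrative, not taken from the current code:

```rust
// Hypothetical sketch: collect the benchmark-related flags out of the argument list.
// Existing flags (-a, -f, -s) are assumed to be handled elsewhere.
fn parse_benchmark_args(args: &[String]) -> Result<(bool, Option<usize>, f64), String> {
    let mut benchmark = false;
    let mut size: Option<usize> = None;
    let mut duration = 10.0_f64; // documented default

    let mut i = 0;
    while i < args.len() {
        match args[i].as_str() {
            "-b" => benchmark = true,
            "--size" => {
                i += 1;
                let v = args.get(i).ok_or("--size requires a value")?;
                size = Some(v.parse().map_err(|_| "invalid --size value".to_string())?);
            }
            "--duration" => {
                i += 1;
                let v = args.get(i).ok_or("--duration requires a value")?;
                duration = v.parse().map_err(|_| "invalid --duration value".to_string())?;
            }
            _ => {} // existing flags handled by the current parser
        }
        i += 1;
    }
    Ok((benchmark, size, duration))
}
```
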
### Data Flow

```
User Input → Argument Parsing → Mode Detection → Benchmark Execution → Results Display

                              [Normal Checksum Mode]

                              [Existing Functionality]
```

In benchmark mode:

1. Parse and validate benchmark parameters
2. Determine data source (file, string, or generated)
3. For string or generated data, load/generate the test data once; for file data, use the file path directly
4. Run the benchmark loop for the specified duration using the appropriate checksum function
5. Calculate and display results

## Components and Interfaces

### Enhanced Config Structure

```rust
#[derive(Debug)]
struct Config {
    algorithm: String,
    file: Option<String>,
    string: Option<String>,
    format: OutputFormat,
    benchmark: Option<BenchmarkConfig>,
}

#[derive(Debug)]
struct BenchmarkConfig {
    size: Option<usize>,
    duration: f64,
}
```

### Benchmark Execution Module

```rust
enum BenchmarkData {
    InMemory(Vec<u8>),
    File(String),
}

struct BenchmarkRunner {
    algorithm: CrcAlgorithm,
    data: BenchmarkData,
    duration: f64,
}

impl BenchmarkRunner {
    fn new(algorithm: CrcAlgorithm, data: BenchmarkData, duration: f64) -> Self { todo!() }
    fn run(&self) -> BenchmarkResult { todo!() }
}

struct BenchmarkResult {
    iterations: u64,
    elapsed_seconds: f64,
    throughput_gibs: f64,
    time_per_iteration_nanos: f64,
    acceleration_target: String,
    data_size: u64,
}
```

### Data Generation

The benchmark will reuse the random data generation pattern from `benches/benchmark.rs`:

```rust
use rand::RngCore;

fn generate_random_data(size: usize) -> Vec<u8> {
    let mut rng = rand::rng();
    let mut buf = vec![0u8; size];
    rng.fill_bytes(&mut buf);
    buf
}
```

## Data Models

### Input Data Sources

1. **File Input**: Use `checksum_file()` function to benchmark the entire file I/O and checksum stack
2. **String Input**: Use string bytes directly with in-memory `checksum()` function
3. **Generated Data**: Create random data of specified size using `rand::RngCore::fill_bytes()` and use in-memory `checksum()` function

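Tying the three sources to the `BenchmarkData` enum above might look like the following sketch, which assumes a simple precedence of file, then string, then generated data; `select_data` and `DEFAULT_SIZE` are illustrative names, not existing items:

```rust
const DEFAULT_SIZE: usize = 1_048_576; // 1 MiB, matching the documented default

// Illustrative only: map the user's inputs onto a BenchmarkData variant,
// reusing generate_random_data() from the Data Generation section above.
fn select_data(file: Option<String>, string: Option<String>, size: Option<usize>) -> BenchmarkData {
    match (file, string) {
        (Some(path), _) => BenchmarkData::File(path),               // benchmark file I/O + checksum
        (None, Some(s)) => BenchmarkData::InMemory(s.into_bytes()), // benchmark in-memory checksum
        (None, None) => BenchmarkData::InMemory(generate_random_data(size.unwrap_or(DEFAULT_SIZE))),
    }
}
```
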
### Benchmark Metrics

- **Iterations**: Number of checksum calculations performed
- **Elapsed Time**: Actual benchmark duration in seconds
- **Throughput**: Calculated as `(data_size * iterations) / elapsed_time / (1024^3)` GiB/s
- **Acceleration Target**: Result from `crc_fast::get_calculator_target(algorithm)`

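In code, the derived metrics reduce to a few lines; this is an illustrative helper whose names follow the `BenchmarkResult` fields above rather than any existing function:

```rust
/// Compute throughput (GiB/s) and time per iteration (ns) from the raw counters.
fn compute_metrics(data_size: u64, iterations: u64, elapsed_seconds: f64) -> (f64, f64) {
    let bytes_processed = data_size as f64 * iterations as f64;
    let throughput_gibs = bytes_processed / elapsed_seconds / (1024.0 * 1024.0 * 1024.0);
    let time_per_iteration_nanos = (elapsed_seconds / iterations as f64) * 1_000_000_000.0;
    (throughput_gibs, time_per_iteration_nanos)
}
```
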
## Error Handling

### Validation Errors

- Invalid algorithm names (reuse existing validation)
- Invalid size parameters (non-positive values)
- Invalid duration parameters (non-positive values)
- File read errors (reuse existing error handling)

### Runtime Errors

- Memory allocation failures for large data sizes
- Timer precision issues (fallback to alternative timing methods)

### Error Messages

All errors will follow the existing pattern of displaying the error message followed by usage information.

## Testing Strategy

### Unit Tests

- Argument parsing validation for benchmark flags
- `BenchmarkConfig` creation and validation
- Data generation with various sizes
- Throughput calculation accuracy

### Integration Tests

- End-to-end benchmark execution with different algorithms
- File and string input handling in benchmark mode
- Error handling for invalid parameters
- Backward compatibility verification

### Performance Validation

- Verify benchmark results are reasonable (within expected ranges)
- Compare with existing `benches/benchmark.rs` results for consistency
- Test with various data sizes to ensure linear scaling

## Implementation Notes

### Timing Mechanism

Use `std::time::Instant` for high-precision timing, with different approaches for different data sources:

```rust
// Inside BenchmarkRunner::run(&self):
let start = std::time::Instant::now();
let mut iterations = 0u64;

while start.elapsed().as_secs_f64() < self.duration {
    match &self.data {
        BenchmarkData::InMemory(data) => {
            std::hint::black_box(checksum(self.algorithm, data));
        }
        BenchmarkData::File(filename) => {
            std::hint::black_box(checksum_file(self.algorithm, filename, None).unwrap());
        }
    }
    iterations += 1;
}

let elapsed = start.elapsed().as_secs_f64();
```

### Memory Considerations

- Pre-allocate test data once before the benchmark loop
- Use `std::hint::black_box()` to prevent compiler optimizations
- Consider memory alignment for optimal performance (optional enhancement)

### Output Format

```
Algorithm: CRC-32/ISCSI
Acceleration Target: aarch64-neon-sha3
Data Size: 1,048,576 bytes (1.0 MiB)
Duration: 10.00 seconds
Iterations: 12,345
Throughput: 45.67 GiB/s
Time per iteration: 810.2 μs
```

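For the time-per-iteration line, a small helper can pick a readable unit from the raw nanosecond value; this is a hypothetical helper, not part of the existing tool:

```rust
/// Format a duration given in nanoseconds with an appropriate unit (ns, μs, ms, s).
fn format_duration_ns(nanos: f64) -> String {
    if nanos < 1_000.0 {
        format!("{nanos:.1} ns")
    } else if nanos < 1_000_000.0 {
        format!("{:.1} μs", nanos / 1_000.0)
    } else if nanos < 1_000_000_000.0 {
        format!("{:.1} ms", nanos / 1_000_000.0)
    } else {
        format!("{:.2} s", nanos / 1_000_000_000.0)
    }
}
```
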
### Default Values

- **Size**: 1,048,576 bytes (1 MiB)
- **Duration**: 10.0 seconds
- **Algorithm**: Must be specified via `-a` flag (no default)

Lines changed: 52 additions & 0 deletions

# Requirements Document

## Introduction

This feature adds a simple benchmark option to the existing `bin/checksum.rs` tool via a command-line flag. The benchmark will allow users to test performance across different platforms using a single binary, reporting throughput in GiB/s and the acceleration target used. This enables cross-platform performance comparison without requiring a full development environment checkout.

## Glossary

- **Checksum_Tool**: The existing `bin/checksum.rs` binary application
- **Benchmark_Mode**: A new operational mode that measures and reports performance metrics
- **Acceleration_Target**: The hardware-specific optimization path returned by `get_calculator_target()`
- **Throughput_Metric**: Performance measurement expressed in GiB/s (gibibytes per second)
- **Test_Data**: Randomly generated byte array used for benchmark measurements
- **Black_Box**: Rust's `std::hint::black_box()` function that prevents compiler optimizations during benchmarking

## Requirements

### Requirement 1

**User Story:** As a developer, I want to run performance benchmarks from the checksum tool, so that I can compare CRC performance across different hardware platforms without setting up a full development environment.

#### Acceptance Criteria

1. WHEN the user provides a `-b` flag, THE Checksum_Tool SHALL enter Benchmark_Mode
2. WHILE in Benchmark_Mode, THE Checksum_Tool SHALL generate Test_Data of the specified size once and reuse it for all iterations
3. THE Checksum_Tool SHALL report Throughput_Metric in GiB/s format
4. THE Checksum_Tool SHALL display the Acceleration_Target used for the benchmark
5. THE Checksum_Tool SHALL use Black_Box to prevent compiler optimizations during measurement

### Requirement 2

**User Story:** As a developer, I want to specify benchmark parameters, so that I can control the test conditions for consistent cross-platform comparisons.

#### Acceptance Criteria

1. WHEN the user provides `-a` parameter with `-b` flag, THE Checksum_Tool SHALL use the specified CRC algorithm for benchmarking
2. WHEN the user provides `--size` parameter, THE Checksum_Tool SHALL generate Test_Data of the specified byte size
3. WHEN the user provides `--duration` parameter, THE Checksum_Tool SHALL run the benchmark for the specified number of seconds
4. WHERE no benchmark parameters are provided, THE Checksum_Tool SHALL use default values of 1 MiB for size and 10 seconds for duration
5. THE Checksum_Tool SHALL validate all benchmark parameter values before starting the benchmark

### Requirement 3

**User Story:** As a developer, I want the benchmark to support both file/string input and generated data, so that I can benchmark with specific data or use random data for consistent testing.

#### Acceptance Criteria

1. WHEN the user provides `-b` with `-f` or `-s` flags, THE Checksum_Tool SHALL use the file or string content as Test_Data for benchmarking
2. WHEN the user provides `-b` with `--size` parameter but no `-f` or `-s` flags, THE Checksum_Tool SHALL generate random Test_Data of the specified size
3. IF the user provides `-b` without any data source or size specification, THEN THE Checksum_Tool SHALL generate random Test_Data using the default size
4. THE Checksum_Tool SHALL display appropriate usage information when benchmark parameters are invalid
5. THE Checksum_Tool SHALL maintain backward compatibility with existing checksum functionality

Lines changed: 49 additions & 0 deletions

# Implementation Plan

- [x] 1. Extend command line argument parsing for benchmark options
  - Add `-b` flag to enable benchmark mode in the argument parser
  - Add `--size` parameter for specifying random data size
  - Add `--duration` parameter for benchmark duration (floating-point seconds)
  - Update the `Config` struct to include optional `BenchmarkConfig`
  - Update usage/help text to include new benchmark options
  - _Requirements: 1.1, 2.1, 2.2, 2.3, 2.4_

- [x] 2. Implement benchmark data structures and validation
  - Create `BenchmarkConfig` struct with size and duration fields
  - Create `BenchmarkData` enum to handle in-memory vs file data sources
  - Create `BenchmarkRunner` struct with algorithm, data, and duration
  - Create `BenchmarkResult` struct with all metrics including time per iteration
  - Add validation logic for benchmark parameters (positive values)
  - _Requirements: 2.5, 3.4_

- [x] 3. Implement benchmark execution logic
  - Create benchmark runner with timing loop using `std::time::Instant`
  - Implement separate execution paths for in-memory data vs file data
  - Use `std::hint::black_box()` to prevent compiler optimizations
  - Calculate throughput in GiB/s and time per iteration with appropriate units
  - Integrate `get_calculator_target()` for acceleration target reporting
  - _Requirements: 1.2, 1.3, 1.4, 1.5_

- [x] 4. Implement data source handling
  - Add random data generation function using `rand::RngCore::fill_bytes()`
  - Implement logic to determine data source (file, string, or generated)
  - Handle file size detection for throughput calculations
  - Create `BenchmarkData` instances based on user input
  - _Requirements: 3.1, 3.2, 3.3_

- [x] 5. Integrate benchmark mode into main application flow
  - Modify main function to detect benchmark mode and route accordingly
  - Ensure mutual exclusivity validation between benchmark and normal modes
  - Add benchmark result formatting and display
  - Update error handling to include benchmark-specific errors
  - Maintain backward compatibility with existing functionality
  - _Requirements: 3.4, 3.5_

- [x] 6. Add comprehensive testing for benchmark functionality
  - Write unit tests for argument parsing with benchmark flags
  - Test benchmark parameter validation (invalid sizes, durations)
  - Test data source selection logic (file vs string vs generated)
  - Test benchmark execution with different algorithms
  - Verify throughput calculation accuracy
  - Test error handling for invalid benchmark configurations
  - _Requirements: All requirements_
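
As a rough illustration of the routing described in task 5, a sketch only: `Config` and `BenchmarkConfig` mirror the design document, while `run`, `run_benchmark`, and `run_checksum` are placeholder names, not existing functions:

```rust
// Hypothetical mode routing: presence of a BenchmarkConfig selects the new path,
// otherwise the existing checksum behavior runs unchanged.
struct BenchmarkConfig { size: Option<usize>, duration: f64 }
struct Config { algorithm: String, benchmark: Option<BenchmarkConfig> }

fn run(config: &Config) -> Result<(), String> {
    match &config.benchmark {
        Some(bench) => run_benchmark(&config.algorithm, bench), // new benchmark path
        None => run_checksum(&config.algorithm),                // existing checksum path
    }
}

fn run_benchmark(_algorithm: &str, _bench: &BenchmarkConfig) -> Result<(), String> {
    Ok(()) // would build BenchmarkData, run the timing loop, and print results
}

fn run_checksum(_algorithm: &str) -> Result<(), String> {
    Ok(()) // existing behavior stays as-is
}
```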
