SLICOT C11 Benchmark Report
System Configuration
| Component | Value |
|---|---|
| CPU | Apple M1 Pro |
| RAM | 16 GB |
| OS | Darwin 25.2.0 arm64 |
| Compiler | Apple clang 17.0.0 |
| BLAS/LAPACK | Accelerate.framework |
| Build | debugoptimized (-O2 -g) |
Methodology
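- Warmup: 3 iterations (cache priming, not recorded)
- Timed runs: 10 iterations per benchmark
- Timer: `mach_absolute_time()` (reported in nanoseconds; on Apple Silicon the counter ticks every ~41.7 ns, so the effective resolution is coarser than 1 ns)
- Statistics: min, max, mean, and standard deviation (σ) of the 10 timed runs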
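A minimal Python sketch of the measurement protocol above: untimed warmup calls, individually timed calls, then min/max/mean/σ in microseconds. The real harness is C and uses `mach_absolute_time()`; here `time.perf_counter_ns` stands in for it, and the helper name `run_benchmark` is hypothetical. The report does not say whether σ is the sample or the population standard deviation; this sketch assumes the sample form.

```python
import statistics
import time
from typing import Callable, Dict


def run_benchmark(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> Dict[str, float]:
    """Hypothetical Python analogue of the C benchmark loop."""
    for _ in range(warmup):
        fn()  # cache priming; these calls are not timed

    samples_us = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        fn()
        t1 = time.perf_counter_ns()
        samples_us.append((t1 - t0) / 1000.0)  # ns -> μs

    return {
        "min": min(samples_us),
        "max": max(samples_us),
        "mean": statistics.fmean(samples_us),
        # Assumption: sample standard deviation (statistics.pstdev would give
        # the population form instead).
        "stddev": statistics.stdev(samples_us),
    }


if __name__ == "__main__":
    print(run_benchmark(lambda: sum(range(10_000))))
```

Timing each call separately (rather than timing the whole loop once) is what makes per-run min, max, and σ available.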
SB02MD — Continuous-time Algebraic Riccati Equation Solver
Solves the continuous-time algebraic Riccati equation `Q + A'X + XA - XGX = 0` (A' is the transpose of A; Q and G are symmetric) using Laub's Schur vector method.
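For reference, the textbook form of the Schur vector method, written with the same A, G, Q, X as the equation above. The graph of X is an invariant subspace of the 2n × 2n Hamiltonian matrix H, so an orthogonal real Schur form of H, reordered so that the leading n × n block S₁₁ holds the eigenvalues in the open left half-plane ℂ⁻, yields X from the leading Schur vectors. The reordering step is where info=3 can arise, and info=4 corresponds to H having fewer than n eigenvalues in ℂ⁻.

$$
H = \begin{bmatrix} A & -G \\ -Q & -A' \end{bmatrix}, \qquad
H \begin{bmatrix} I \\ X \end{bmatrix} = \begin{bmatrix} I \\ X \end{bmatrix} (A - GX)
\iff Q + A'X + XA - XGX = 0
$$

$$
U' H U = \begin{bmatrix} S_{11} & S_{12} \\ 0 & S_{22} \end{bmatrix}, \quad
U = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix}, \quad
\Lambda(S_{11}) \subset \mathbb{C}^{-}
\;\Longrightarrow\; X = U_{21} U_{11}^{-1}
$$

The resulting X is the stabilizing solution: the eigenvalues of A - GX are those of S₁₁.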
| Dataset | N | Mean (μs) | Min (μs) | Max (μs) | σ (μs) | Info |
|---|---|---|---|---|---|---|
| BB01103 | 4 | 9.66 | 9.58 | 10.04 | 0.14 | ✓ |
| BB01104 | 8 | 32.73 | 32.17 | 36.21 | 1.24 | ✓ |
| BB01105 | 9 | 20.37 | 20.00 | 21.38 | 0.49 | ✓ |
| BB01404 | 21 | 164.11 | 162.67 | 166.46 | 1.49 | ✓ |
| BB01106 | 30 | 210.43 | 208.92 | 216.00 | 2.17 | ✓ |
| BB02107 | 4 | 4.68 | 4.62 | 4.79 | 0.05 | ✓ |
| BB02108 | 4 | 6.53 | 6.46 | 6.67 | 0.06 | ✓ |
| BB02110 | 4 | 10.17 | 10.08 | 10.33 | 0.08 | info=3 |
| BB02111 | 4 | 0.54 | 0.50 | 0.58 | 0.02 | info=4 |
| BB02113 | 4 | 4.65 | 4.58 | 4.79 | 0.07 | info=3 |
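Notes:
- ✓: info=0 (successful solve)
- info=3: Schur reordering failed (ill-conditioned problem)
- info=4: fewer than N stable eigenvalues (expected for some benchmark cases); the very short BB02111 time likely reflects the routine exiting before the solve step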
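To make the info=4 case concrete, here is the n = 1 version of the Schur-vector idea as a Python sketch. `scalar_care` and `residual` are hypothetical helpers, not SLICOT routines, and the sketch assumes g > 0. For scalars the equation becomes q + 2ax - gx² = 0, the Hamiltonian [[a, -g], [-q, -a]] has eigenvalues ±√(a² + gq), and the stabilizing X is read off the eigenvector of the negative eigenvalue. When a² + gq ≤ 0, both eigenvalues lie on the imaginary axis, so there is no stable eigenvalue to select: the one-dimensional counterpart of info=4.

```python
import math
from typing import Optional


def scalar_care(a: float, g: float, q: float) -> Optional[float]:
    """Stabilizing solution of q + 2*a*x - g*x**2 = 0 (the n = 1 CARE), or None.

    Assumes g > 0. H = [[a, -g], [-q, -a]] has eigenvalues +/- sqrt(a**2 + g*q);
    an eigenvector [u1, u2] of the stable eigenvalue gives X = u2 / u1.
    """
    if g <= 0.0:
        raise ValueError("this sketch assumes g > 0")
    disc = a * a + g * q
    if disc <= 0.0:
        # Eigenvalues on the imaginary axis: no stable eigenvalue,
        # analogous to SB02MD's info=4.
        return None
    lam = -math.sqrt(disc)  # the stable eigenvalue of H
    # First row of (H - lam*I) u = 0 reads (a - lam)*u1 - g*u2 = 0; take u1 = g.
    u1, u2 = g, a - lam
    return u2 / u1


def residual(a: float, g: float, q: float, x: float) -> float:
    """Left-hand side Q + A'X + XA - XGX evaluated for n = 1."""
    return q + 2.0 * a * x - g * x * x


if __name__ == "__main__":
    x = scalar_care(1.0, 1.0, 1.0)
    print(x, residual(1.0, 1.0, 1.0, x), 1.0 - 1.0 * x)  # X, residual, A - GX
```

For a = g = q = 1 this returns X = 1 + √2, and the closed-loop value A - GX = -√2 equals the stable eigenvalue, as the invariant-subspace relation predicts.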
Scaling Analysis
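Measured on the BB01 datasets (mean times from the table above):
- n=4: ~10 μs
- n=8: ~33 μs (3.3x for 2x n; O(n³) would predict 8x)
- n=9: ~20 μs
- n=21: ~164 μs
- n=30: ~210 μs (21x for 7.5x n; O(n³) would predict ~422x)

Observed scaling is sub-cubic over this range. The Hamiltonian matrix is at most 60 × 60 here, so per-call overhead and lower-order terms likely account for much of the run time, with the efficiency of Accelerate's BLAS Level 3 routines on the M1 also contributing. The n=9 case running faster than n=8 indicates that run time also depends on the particular problem, not only on n.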
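Instead of comparing pairs of points, a least-squares fit of log(time) against log(n) gives one empirical exponent for the BB01 rows. This is an illustrative Python sketch, not part of the benchmark suite; `loglog_slope` is a hypothetical helper, and with only five heterogeneous problems the fitted value is a rough indicator rather than a complexity measurement.

```python
import math
from typing import Sequence


def loglog_slope(ns: Sequence[float], times_us: Sequence[float]) -> float:
    """Least-squares slope p in log(time) = log(c) + p * log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in times_us]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Mean SB02MD times (μs) for the BB01 datasets in the table above.
N = [4, 8, 9, 21, 30]
MEAN_US = [9.66, 32.73, 20.37, 164.11, 210.43]

if __name__ == "__main__":
    print(f"empirical exponent: {loglog_slope(N, MEAN_US):.2f}")
```

On these five points the fitted exponent comes out at about 1.6, well below the asymptotic 3.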
BB01AD — CAREX Benchmark Generator
Generates continuous-time algebraic Riccati equation test problems.
Group 1: Fixed-Size Examples (Literature Problems)
| Example | N | Mean (μs) | Description | Info |
|---|---|---|---|---|
| 1.1 | 2 | 0.15 | Laub 1979, Ex.1 | ✓ |
| 1.2 | 2 | 0.16 | Laub 1979, Ex.2 (uncontrollable) | ✓ |
| 1.3 | — | 0.05 | L-1011 aircraft model | needs data |
| 1.4 | — | 0.09 | Binary distillation column | needs data |
| 1.5 | — | 0.12 | Tubular ammonia reactor | needs data |
| 1.6 | — | 0.42 | J-100 jet engine | needs data |
Group 2: Parameter-Dependent Examples
| Example | N | Mean (μs) | Description |
|---|---|---|---|
| 2.1 | 2 | 0.17 | Arnold/Laub Ex.1 (stabilizability limit) |
| 2.2 | 2 | 0.35 | Arnold/Laub Ex.3 (singular R) |
| 2.3 | 2 | 0.17 | Kenney/Laub/Wette Ex.2 |
| 2.4 | 2 | 0.14 | Bai/Qian (ill-conditioned H) |
| 2.5 | 2 | 0.16 | H∞ problem |
| 2.6 | 3 | 0.77 | Petkov (badly scaled) |
| 2.7 | 4 | 0.30 | Magnetic tape control |
| 2.8 | 4 | 0.25 | Arnold/Laub Ex.2 |
| 2.9 | — | 1.21 | Boeing B-767 flutter |
Group 3: Scalable Examples
| Example | N | Mean (μs) | σ (μs) | Description |
|---|---|---|---|---|
| 3.1 | 39 | 18.51 | 0.36 | String of high-speed vehicles |
| 3.2 | 64 | 32.82 | 0.27 | Circulant matrices |
BD01AD — CTDSX Descriptor System Generator
Generates continuous-time dynamical system benchmark examples.
Group 1 & 2: Fixed/Parameter-Dependent
| Example | N | Mean (μs) | Info |
|---|---|---|---|
| 1.1 | 2 | 0.02 | ✓ |
| 1.2 | 2 | 0.02 | ✓ |
| 2.1 | 4 | 0.05 | ✓ |
| 2.2 | 4 | 0.05 | ✓ |
| 2.4 | 3 | 0.05 | ✓ |
Group 3: Scalable
| Example | N | Mean (μs) | σ (μs) | Throughput (N / mean time) |
|---|---|---|---|---|
| 3.1 | 39 | 1.67 | 0.02 | 23.4 M/s |
| 3.2 | 100 | 9.10 | 0.03 | 11.0 M/s |
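The throughput figures in this table are reproduced exactly by dividing N by the mean time, so "elements" here counts the system order N, not matrix entries. A quick Python check (the helper name `throughput_per_s` is hypothetical):

```python
def throughput_per_s(n: int, mean_us: float) -> float:
    """System order N divided by the mean run time, per second."""
    return n / (mean_us * 1e-6)


for n, mean_us in [(39, 1.67), (100, 9.10)]:
    print(f"N={n}: {throughput_per_s(n, mean_us) / 1e6:.1f} M/s")
# N=39: 23.4 M/s
# N=100: 11.0 M/s
```

Counting the N² entries of an N × N matrix instead would give figures roughly N times larger.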
Performance Summary
| Routine | Best Case | Worst Case | Typical |
|---|---|---|---|
| SB02MD (Riccati) | 4.7 μs (n=4) | 210 μs (n=30) | ~30 μs (n=8) |
| BB01AD (CAREX gen) | 0.15 μs (n=2) | 33 μs (n=64) | ~0.3 μs |
| BD01AD (CTDSX gen) | 0.01 μs (n=2) | 9.1 μs (n=100) | ~0.05 μs |
Throughput Estimates
For the Riccati solver (2n × 2n Schur decomposition of the Hamiltonian matrix):
- n=30: 210 μs → ~4,762 solves/s
- Matrix ops: ~54,000 (equal to 2n³ for n=30; the 60 × 60 matrix itself has only 3,600 entries) → ~257 M ops/s
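The arithmetic behind these estimates, as a Python sketch. It assumes the "~54,000" work figure is 2n³, which matches n = 30 exactly but is not stated in the report; `riccati_rates` is a hypothetical helper.

```python
from typing import Tuple


def riccati_rates(n: int, mean_us: float) -> Tuple[float, float]:
    """Return (solves per second, work units per second).

    Assumption: the work estimate is 2 * n**3 units per solve.
    """
    seconds = mean_us * 1e-6
    solves_per_s = 1.0 / seconds
    work_per_s = 2 * n**3 / seconds
    return solves_per_s, work_per_s


if __name__ == "__main__":
    solves, work = riccati_rates(30, 210.0)
    print(f"{solves:,.0f} solves/s, {work / 1e6:.0f} M ops/s")  # 4,762 solves/s, 257 M ops/s
```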
How to Run
```sh
# Build
meson setup build && ninja -C build

# Meson benchmark suite
meson test -C build --benchmark

# Individual runs
./build/benchmarks/bench_sb02md SLICOT-Reference/benchmark_data/BB01*.dat
./build/benchmarks/bench_bb01ad
./build/benchmarks/bench_bd01ad

# Python runner
python scripts/run_benchmarks.py
```
Observations
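- Sub-cubic scaling: SB02MD scales better than O(n³) over the measured range (n ≤ 30), likely due to Accelerate's optimized BLAS Level 3 routines and to per-call overhead that dominates at small n.
- Generator routines are fast: BB01AD/BD01AD run times are dominated by setup cost; the matrix generation itself is memory-bound.
- Timer resolution: on Apple Silicon `mach_absolute_time()` ticks every ~41.7 ns, so the minimum measurable interval is about 40 ns; the BD01AD means of 0.02 to 0.05 μs are at or below one tick and show quantization.
- Ill-conditioned cases: BB02110, BB02111, and BB02113 correctly report numerical difficulties (info=3 or info=4).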
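Why a mean below one tick (such as 0.02 μs) can still appear: each individual call reads either 0 or 1 tick, and only the average lands in between. The Python sketch below models an ideal tick counter. It assumes the common Apple Silicon timebase of 125/3 (a 24 MHz counter); real code should read `numer`/`denom` from `mach_timebase_info()` at run time. All names are hypothetical.

```python
import math
from typing import List

# Assumed Apple Silicon timebase: mach_timebase_info numer/denom = 125/3,
# i.e. ~41.7 ns per mach_absolute_time() tick.
NUMER, DENOM = 125, 3
TICK_NS = NUMER / DENOM


def ticks_to_ns(ticks: int) -> float:
    """Convert a tick delta to nanoseconds (ticks * numer / denom)."""
    return ticks * NUMER / DENOM


def observed_ns(start_ns: float, duration_ns: float) -> float:
    """Duration of an event as seen by an ideal counter ticking every TICK_NS."""
    start_tick = math.floor(start_ns / TICK_NS)
    end_tick = math.floor((start_ns + duration_ns) / TICK_NS)
    return ticks_to_ns(end_tick - start_tick)


def phase_sweep(duration_ns: float, phases: int = 1000) -> List[float]:
    """Observe the same event at evenly spaced start offsets within one tick."""
    return [observed_ns(k * TICK_NS / phases, duration_ns) for k in range(phases)]


if __name__ == "__main__":
    obs = phase_sweep(20.0)
    print(sorted(set(obs)))      # each reading is 0 ns or exactly one tick
    print(sum(obs) / len(obs))   # the average recovers roughly 20 ns
```

Averages of sub-tick events are therefore meaningful only in aggregate, and their σ mostly reflects which side of a tick boundary each run started on.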
Benchmark infrastructure added in commit dd608f3.