Problem
The benchmark README (bench/spline_many/README.md) references historical single runs but lacks current, verified performance numbers.
Current GPU Performance (RTX 5060 Ti, nvfortran 25.11)
| Spline | CPU (pts/s) | GPU OpenACC (pts/s) | Speedup |
|---|---|---|---|
| 1D | 48.7M | 807M | ~17× |
| 2D | 8.4M | 539M | ~64× |
| 3D | 1.4M | 102M | ~73× |
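The speedup column is simply the GPU throughput divided by the CPU throughput. A minimal sketch for re-deriving it from the rounded figures in the table follows; the helper names `parse_rate` and `speedup` are illustrative only and are not part of the benchmark code.

```python
# Re-derive the speedup column from the rounded throughput figures above.
# parse_rate and speedup are hypothetical helpers, not benchmark functions.

UNITS = {"K": 1e3, "M": 1e6, "G": 1e9}


def parse_rate(text):
    """Convert a string such as '48.7M' into points per second."""
    return float(text[:-1]) * UNITS[text[-1]]


def speedup(cpu, gpu):
    """Ratio of GPU throughput to CPU throughput."""
    return parse_rate(gpu) / parse_rate(cpu)


# (CPU pts/s, GPU OpenACC pts/s) as listed in the table.
results = {
    "1D": ("48.7M", "807M"),
    "2D": ("8.4M", "539M"),
    "3D": ("1.4M", "102M"),
}

for name, (cpu, gpu) in results.items():
    # Round to the nearest integer, matching the "~N×" style of the table.
    print(f"{name}: ~{speedup(cpu, gpu):.0f}x")
```

Because the inputs are already rounded, the ratios are only good to about two significant figures.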
Detailed Results
1D Benchmark (order=5, num_points=2048, nq=8, npts=2M, niter=20, periodic=T):
- CPU: 48.7M pts/s
- OpenACC GPU: 807M pts/s
2D Benchmark (order=[5,5], num_points=[256,256], nq=8, npts=500K, niter=10, periodic=[T,T]):
- CPU: 8.4M pts/s
- OpenACC GPU: 539M pts/s
3D Benchmark (order=[5,5,5], num_points=[48,32,32], nq=8, npts=200K, niter=6, periodic=[T,T,T]):
- CPU: 1.4M pts/s
- OpenACC GPU: 102M pts/s
Proposed Changes
- Add a "Performance Results" section with tabular data
- Document test environment (GPU model, driver version, compiler version)
- Note that nvfortran is the recommended compiler for GPU acceleration
- Document GCC16 GPU status (currently has runtime issues; see #209, "Fix GCC16 OpenACC GPU offloading runtime error")
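One possible way to capture the test environment for the README is a small script that queries `nvidia-smi` and `nvfortran --version`. This is a sketch only: it assumes both tools are on `PATH`, returns `None` for any field it cannot determine, and the function names (`parse_gpu_line`, `run`, `collect_environment`) are hypothetical rather than part of the repository.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Split one CSV line of the form 'name, driver_version' into a dict."""
    name, driver = [field.strip() for field in line.split(",", 1)]
    return {"gpu": name, "driver": driver}


def run(cmd):
    """Return the stripped stdout of cmd, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError:
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_environment():
    """Gather GPU model, driver version, and compiler version where available."""
    env = {"gpu": None, "driver": None, "compiler": None}

    gpu_out = run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
    )
    if gpu_out:
        # Use the first GPU only; multi-GPU hosts print one line per device.
        env.update(parse_gpu_line(gpu_out.splitlines()[0]))

    compiler_out = run(["nvfortran", "--version"])
    if compiler_out:
        # Keep the first non-empty line, which carries the version string.
        lines = [ln.strip() for ln in compiler_out.splitlines() if ln.strip()]
        env["compiler"] = lines[0] if lines else None

    return env


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value if value is not None else 'unknown'}")
```

The output can be pasted next to the results table so that each set of numbers is tied to a specific GPU, driver, and compiler.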