Clone this repository into a separate directory.
The run_all.sh script runs the LU and EP benchmarks in both their CUDA and CUDA Graphs versions and collects the results.
chmod +x run_all.sh
./run_all.sh
Results, including timing and profiling data, will be saved in the $(pwd)/profiling_outputs/ directory.
The profiling data itself will be saved to $(pwd)/nsys_reports/.
Compare the total execution time and the number of kernel launches for each benchmark (LU, EP) between the CUDA and CUDA Graphs versions.
You can extract this information from the .txt output files or with nsys stats.
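As a rough sketch of the nsys stats route, the helper below prints a per-kernel summary for every report in a directory. It rests on assumptions that are not confirmed by this repository: that the reports carry the .nsys-rep extension, and that the installed Nsight Systems offers the cuda_gpu_kern_sum report (older releases name it gpukernsum). The function name kernel_summary is made up for illustration.

```shell
# Print a per-kernel summary for every nsys report in a directory.
# Assumptions (not taken from this repository): reports end in .nsys-rep
# and the installed nsys provides the cuda_gpu_kern_sum report.
kernel_summary() {
    report_dir="${1:-$(pwd)/nsys_reports}"
    if [ ! -d "$report_dir" ]; then
        echo "no such directory: $report_dir" >&2
        return 1
    fi
    for report in "$report_dir"/*.nsys-rep; do
        [ -e "$report" ] || continue   # the glob matched nothing
        echo "== $report =="
        nsys stats --report cuda_gpu_kern_sum "$report"
    done
}
```

Calling kernel_summary with no argument reads $(pwd)/nsys_reports/. In the resulting table, the Instances column gives the number of launches per kernel and the Total Time column the accumulated GPU time.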
Which implementation is faster?
Which one has fewer kernel launches?
Are the performance differences consistent across benchmarks?
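To put a number on these comparisons, one option is to compute the speedup as the CUDA time divided by the CUDA Graphs time. The helper below is only a sketch: the name speedup is hypothetical, and the two times must be read by hand from your own output files.

```shell
# speedup BASELINE_SECONDS CANDIDATE_SECONDS
# Prints baseline / candidate with two decimals. A value above 1 means
# the candidate (for example the CUDA Graphs version) is faster.
speedup() {
    awk -v base="$1" -v cand="$2" 'BEGIN {
        if (cand <= 0) {
            print "candidate time must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", base / cand
    }'
}
```

For example, speedup "$cuda_seconds" "$graphs_seconds" prints the ratio for one benchmark, where both variables are placeholders for times taken from your run. Repeating this for LU and EP shows whether the difference is consistent.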
For more details about the implementation, benchmark evolution, and task descriptions, visit the original NPB-GPU repository.