feat: Implement CUDA acceleration for Ricci observable #7

Open
JorenB wants to merge 1 commit into main from feat/ricci-cuda-observable

@JorenB JorenB commented May 26, 2025

This commit introduces a CUDA-based implementation for the average sphere distance calculation within the Ricci observable. The goal is to leverage GPU parallelism to speed up this computationally intensive part of the simulation.

Key changes include:

  1. Makefile Updates: The Makefile now compiles .cu files with NVCC, tracks dependencies for the CUDA sources, and links against the CUDA libraries.

  2. CUDA Kernels (observables/ricci_cuda_kernels.cu, .hpp):

    • Introduced pairwise_bfs_kernel, a CUDA kernel that computes
      distances between all pairs of vertices from two spheres (s1, s2).
    • Each pair's distance is found using a BFS, implemented in the
      __device__ function calculate_distance_bfs_device. The BFS
      is depth-limited to 3*epsilon and keeps its queue and visited set in
      thread-local fixed-size arrays, which bounds per-thread memory use.
    • A C++ wrapper function,
      RicciCUDATask::calculate_sum_and_count_distances_cuda,
      manages GPU memory allocation, data conversion (adjacency list to
      CSR format), H2D/D2H transfers, kernel launch, and cleanup.

  3. Ricci Observable Modification (observables/ricci.cpp):

    • The Ricci::averageSphereDistance method now calls the
      CUDA wrapper function instead of performing the BFS calculations
      on the CPU.
    • The original logic for sphere generation and the specific averaging
      formula (sum_distances / (epsilon * count_distances)) are
      preserved.
    • Added safety checks for empty data structures and zero epsilon.
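
To make the data layout in item 2 concrete, here is a minimal host-side C++ sketch of the adjacency-list-to-CSR conversion and a depth-limited BFS over fixed-size local arrays. It is not the code in this PR: `CsrGraph`, `to_csr`, `bounded_bfs_distance`, and the capacity `kMaxQueue` are hypothetical names and values chosen for illustration, and the real `calculate_distance_bfs_device` may track visited vertices differently. In this sketch the queue is never overwritten, so it doubles as the visited set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR container; the layout used by the PR is not shown here.
struct CsrGraph {
    std::vector<int> row_offsets;  // size = num_vertices + 1
    std::vector<int> col_indices;  // all neighbor lists, concatenated
};

// Flatten an adjacency list into CSR: the neighbors of vertex v end up in
// col_indices[row_offsets[v] .. row_offsets[v + 1]).
CsrGraph to_csr(const std::vector<std::vector<int>>& adj) {
    CsrGraph g;
    g.row_offsets.reserve(adj.size() + 1);
    g.row_offsets.push_back(0);
    for (const auto& nbrs : adj) {
        g.col_indices.insert(g.col_indices.end(), nbrs.begin(), nbrs.end());
        g.row_offsets.push_back(static_cast<int>(g.col_indices.size()));
    }
    return g;
}

constexpr int kMaxQueue = 1024;  // assumed capacity, not taken from the PR

// Depth-limited BFS from src to dst using only fixed-size local arrays.
// Returns the distance, or -1 if dst is not reached within max_depth or
// the fixed-size queue would overflow.
int bounded_bfs_distance(const CsrGraph& g, int src, int dst, int max_depth) {
    const int n = static_cast<int>(g.row_offsets.size()) - 1;
    assert(src >= 0 && src < n && dst >= 0 && dst < n);
    if (src == dst) return 0;

    int queue[kMaxQueue];  // every vertex ever enqueued (also the visited set)
    int depth[kMaxQueue];  // BFS depth of queue[i]
    int head = 0, tail = 0;
    queue[tail] = src;
    depth[tail] = 0;
    ++tail;

    while (head < tail) {
        const int v = queue[head];
        const int d = depth[head];
        ++head;
        if (d >= max_depth) continue;  // do not expand beyond the depth limit
        for (int e = g.row_offsets[v]; e < g.row_offsets[v + 1]; ++e) {
            const int w = g.col_indices[e];
            if (w == dst) return d + 1;
            bool seen = false;  // linear scan over everything enqueued so far
            for (int i = 0; i < tail; ++i) {
                if (queue[i] == w) { seen = true; break; }
            }
            if (seen) continue;
            if (tail == kMaxQueue) return -1;  // capacity exhausted
            queue[tail] = w;
            depth[tail] = d + 1;
            ++tail;
        }
    }
    return -1;
}
```

Returning -1 on overflow makes the fixed-size limit visible to the caller instead of silently truncating the search; whether the actual kernel reports overflow this way is an assumption.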

This implementation is intended to replace the previous CPU-bound calculation with a parallel GPU version. It still needs to be built and tested to verify correctness, performance, and robustness, particularly around the fixed-size queue and visited arrays used by the per-thread BFS.
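
One way to validate the GPU path is to compare it against a plain CPU reference on small graphs. The sketch below is a hypothetical harness, not code from this PR: `reference_distance`, `reference_sum_and_count`, `SumCount`, and `average_sphere_distance` are illustrative names. It uses an unbounded BFS, applies the averaging formula `sum_distances / (epsilon * count_distances)` quoted above, and assumes that unreachable pairs are skipped and that degenerate inputs yield 0.0; the PR may handle both cases differently.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Unbounded BFS distance on an adjacency list; returns -1 if unreachable.
int reference_distance(const std::vector<std::vector<int>>& adj, int src, int dst) {
    const int n = static_cast<int>(adj.size());
    assert(src >= 0 && src < n && dst >= 0 && dst < n);
    std::vector<int> dist(adj.size(), -1);
    std::queue<int> q;
    dist[src] = 0;
    q.push(src);
    while (!q.empty()) {
        const int v = q.front();
        q.pop();
        if (v == dst) return dist[v];
        for (int w : adj[v]) {
            if (dist[w] < 0) {
                dist[w] = dist[v] + 1;
                q.push(w);
            }
        }
    }
    return -1;
}

struct SumCount {
    long long sum = 0;    // sum of pairwise distances
    long long count = 0;  // number of pairs that contributed
};

// Sum and count of distances over all pairs (a, b) with a in s1, b in s2.
// Assumption: unreachable pairs are skipped rather than counted.
SumCount reference_sum_and_count(const std::vector<std::vector<int>>& adj,
                                 const std::vector<int>& s1,
                                 const std::vector<int>& s2) {
    SumCount sc;
    for (int a : s1) {
        for (int b : s2) {
            const int d = reference_distance(adj, a, b);
            if (d >= 0) {
                sc.sum += d;
                ++sc.count;
            }
        }
    }
    return sc;
}

// sum_distances / (epsilon * count_distances), guarded against zero epsilon
// and empty input. The 0.0 fallback is an assumption for this sketch.
double average_sphere_distance(const SumCount& sc, int epsilon) {
    if (epsilon <= 0 || sc.count == 0) return 0.0;
    return static_cast<double>(sc.sum) /
           (static_cast<double>(epsilon) * static_cast<double>(sc.count));
}
```

On a five-vertex path graph with s1 = {0, 1}, s2 = {3, 4}, and epsilon = 2, the pairwise distances are 3, 4, 2, and 3, so the reference gives 12 / (2 * 4) = 1.5. Any mismatch between this reference and the GPU sum or count on such small cases would point at the depth limit or the fixed-size arrays.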

@JorenB JorenB self-assigned this May 26, 2025