Investigate GPU Offloading #4

@huttered40

Description
Although algorithm (static) class templates should not care where computation is performed (CPU or GPU), I think a few design choices motivate parameterizing the algorithm itself rather than the matrix class. That said, there are still reasons to parameterize the matrix class: polymorphic data containers (Kokkos, for example) take that approach, and such data types should be able to plug into distributed-memory algorithms without any pain.

Think about the pros and cons here.

Three policy classes for offloading (just gemm for now) include:

  1. NoOffload (default)
  2. OffloadKeepDataResident (keep data on the GPU as much as possible. Communicating data that lives on the GPU is not a problem, but remember that it still must cross the PCIe bus; exploit pinned memory via a buffer allocated once at the beginning of the program and reused repeatedly).
  3. OffloadTransferData (make no attempt to keep data resident on the GPU; offload for each gemm invocation. Mainly a sanity-check policy class)
  • Don't forget to incorporate this into the validate class templates as well.
  • Modify all test.cpp files to allocate memory on the device up front.
  • Modify all Makefiles to use the nvcc compiler and corresponding flags. Note that anything compiled with nvcc must be kept separate from anything compiled with MPI.
  • Update the blas directory to allow cuBLAS headers.
  • Update the bench/ files to include the Offload policy class.
  • Replace any syrk or trmm calls with gemm? Will that interfere with algorithm-specific policies (via non-orthogonal policy classes)?
