Conversation
frostedoyster left a comment
This all looks good to me. Is there a way we can test whether this is working (perhaps from torch)? I remember that it wasn't that easy with sphericart. Perhaps we could launch 10 small mops operations on 10 different CUDA streams and see if we get a speed-up. Would that make any sense @nickjbrowning?
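For reference, a rough sketch of what such a timing test could look like (shown in C++/libtorch to match the snippets below; torch::matmul is only a stand-in for a small mops operation, and whether a speed-up actually shows up depends on the kernels being small enough to overlap):

```cpp
// Rough sketch of a multi-stream timing test. torch::matmul stands in for a
// small mops operation; the point is only the structure: queue ten
// independent launches, each on its own stream from PyTorch's pool, and
// compare the wall time against the same loop without the stream guard.
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#include <torch/torch.h>

#include <chrono>
#include <iostream>
#include <vector>

int main() {
    auto opts = torch::TensorOptions().device(torch::kCUDA);
    std::vector<torch::Tensor> inputs;
    for (int i = 0; i < 10; i++) {
        inputs.push_back(torch::randn({256, 256}, opts));
    }

    std::vector<torch::Tensor> outputs;
    torch::cuda::synchronize();
    auto start = std::chrono::steady_clock::now();

    for (int i = 0; i < 10; i++) {
        // Each iteration gets its own stream from the pool; the guard makes
        // it the current stream, so the launch below is enqueued on it.
        c10::cuda::CUDAStream stream = c10::cuda::getStreamFromPool();
        c10::cuda::CUDAStreamGuard guard(stream);
        outputs.push_back(torch::matmul(inputs[i], inputs[i]));
    }

    torch::cuda::synchronize();
    auto stop = std::chrono::steady_clock::now();
    std::cout << "10 streams: "
              << std::chrono::duration<double, std::milli>(stop - start).count()
              << " ms" << std::endl;
    return 0;
}
```

The same structure works from Python with torch.cuda.Stream and torch.cuda.stream(); this only checks timings, not which stream an individual kernel actually ran on.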
#ifndef MOPS_CUDA_ENABLED
    C10_THROW_ERROR(ValueError, "MOPS was not compiled with CUDA support " + A.device().str());
#else
    c10::cuda::CUDAGuard deviceGuard{A.device()};
What does this deviceGuard do? I see that it's not being used explicitly.
It sets the current CUDA device to be the same one as A.device().
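For context, a minimal sketch of the RAII pattern (the kernel name is a hypothetical placeholder, not the actual mops-torch code):

```cpp
// Minimal sketch of the c10::cuda::CUDAGuard pattern (not the mops-torch source).
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#include <torch/torch.h>

void launch_on_a_device(torch::Tensor A) {
    // Constructing the guard switches the current CUDA device to A.device();
    // the previous device is restored when the guard goes out of scope.
    c10::cuda::CUDAGuard deviceGuard{A.device()};

    // Because the guard set the current device, this returns the stream that
    // PyTorch considers "current" for A's device.
    c10::cuda::CUDAStream stream = c10::cuda::getCurrentCUDAStream();

    // `my_kernel` is a hypothetical placeholder; launching it on
    // stream.stream() keeps it ordered with the other work PyTorch has
    // already queued on that stream.
    // my_kernel<<<blocks, threads, 0, stream.stream()>>>(...);
}
```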
There's no easy way that I can see for us to test whether a kernel has launched on a specific stream from PyTorch. We can probably do this with the CUDA API but that seems a bit overkill.
#ifdef MOPS_CUDA_ENABLED
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#endif
I see that mops-torch/src/sap.cpp has not been modified past the headers (i.e. the stream is not actually taken into account), and the same is true for opsaw and sasaw. Is that correct?
I've fixed this for SAP. OPSAW and SASAW aren't implemented yet (SASAW is in a different branch), so when I get back to that I'll make it consistent.
OPSA is supported.