Conversation
frostedoyster left a comment
This all looks good to me. Is there a way we can test whether this is working (perhaps from torch)? I remember that it wasn't that easy with sphericart. Perhaps we could launch 10 small mops operations on 10 different CUDA streams and see if we get a speed-up. Would that make any sense @nickjbrowning?
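For reference, a rough sketch of what such a timing test could look like (shown in C++/libtorch to match the snippets below; torch::matmul is only a stand-in for a small mops operation, and whether a speed-up actually shows up depends on the kernels being small enough to overlap):

```cpp
// Rough sketch of a multi-stream timing test. torch::matmul stands in for a
// small mops operation; the point is only the structure: queue ten
// independent launches, each on its own stream from PyTorch's pool, and
// compare the wall time against the same loop without the stream guard.
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#include <torch/torch.h>

#include <chrono>
#include <iostream>
#include <vector>

int main() {
    auto opts = torch::TensorOptions().device(torch::kCUDA);
    std::vector<torch::Tensor> inputs;
    for (int i = 0; i < 10; i++) {
        inputs.push_back(torch::randn({256, 256}, opts));
    }

    std::vector<torch::Tensor> outputs;
    torch::cuda::synchronize();
    auto start = std::chrono::steady_clock::now();

    for (int i = 0; i < 10; i++) {
        // Each iteration gets its own stream from the pool; the guard makes
        // it the current stream, so the launch below is enqueued on it.
        c10::cuda::CUDAStream stream = c10::cuda::getStreamFromPool();
        c10::cuda::CUDAStreamGuard guard(stream);
        outputs.push_back(torch::matmul(inputs[i], inputs[i]));
    }

    torch::cuda::synchronize();
    auto stop = std::chrono::steady_clock::now();
    std::cout << "10 streams: "
              << std::chrono::duration<double, std::milli>(stop - start).count()
              << " ms" << std::endl;
    return 0;
}
```

The same structure works from Python with torch.cuda.Stream and torch.cuda.stream(); this only checks timings, not which stream an individual kernel actually ran on.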
#ifndef MOPS_CUDA_ENABLED
    C10_THROW_ERROR(ValueError, "MOPS was not compiled with CUDA support " + A.device().str());
#else
    c10::cuda::CUDAGuard deviceGuard{A.device()};
What does this deviceGuard do? I see that it's not being used explicitly.
It sets the current CUDA device to be the same one as A.device().
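For context, a minimal sketch of the RAII pattern (the kernel name is a hypothetical placeholder, not the actual mops-torch code):

```cpp
// Minimal sketch of the c10::cuda::CUDAGuard pattern (not the mops-torch source).
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#include <torch/torch.h>

void launch_on_a_device(torch::Tensor A) {
    // Constructing the guard switches the current CUDA device to A.device();
    // the previous device is restored when the guard goes out of scope.
    c10::cuda::CUDAGuard deviceGuard{A.device()};

    // Because the guard set the current device, this returns the stream that
    // PyTorch considers "current" for A's device.
    c10::cuda::CUDAStream stream = c10::cuda::getCurrentCUDAStream();

    // `my_kernel` is a hypothetical placeholder; launching it on
    // stream.stream() keeps it ordered with the other work PyTorch has
    // already queued on that stream.
    // my_kernel<<<blocks, threads, 0, stream.stream()>>>(...);
}
```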
There's no easy way that I can see for us to test whether a kernel has launched on a specific stream from PyTorch. We can probably do this with the CUDA API but that seems a bit overkill.
#ifdef MOPS_CUDA_ENABLED
#include <c10/cuda/CUDAGuard.h>
#include <c10/cuda/CUDAStream.h>
#endif
I see that mops-torch/src/sap.cpp has not been modified past the headers (i.e. the stream is not actually taken into account), and the same is true for opsaw and sasaw. Is that correct?
I've fixed this for SAP. OPSAW and SASAW aren't implemented yet (SASAW is in a different branch), so when I get back to that I'll make it consistent.
OPSA is supported.