Address RMM setup in bootstrap_dask_cluster #779

rjzamora wants to merge 4 commits into rapidsai:main

Conversation
Copying is something I despise, because the copies will eventually diverge and we'll have multiple places to update. However, I understand our alternatives are very limited: the only real alternative to copying is to depend on Dask-CUDA, and I think we don't want to do that. So unfortunately I think we have no other options.
Actually, we don't depend on Dask-CUDA because we don't want to make it a hard dependency. On the other hand, if we're running Dask it becomes a dependency that the user needs to satisfy, right? So I think we could import the function from Dask-CUDA.
Thanks for the feedback @pentschev. My "general" goal here is just to add the necessary logic to configure an RMM resource on a pre-existing worker. We currently bootstrap a Dask cluster in cudf-polars. However, we eventually need to support the case where Dask is not installed at all. The RMM-setup logic will need to live somewhere that does not depend on Dask at all.
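For illustration, here is a minimal sketch of what Dask-free RMM setup logic could look like using only the public rmm.mr API; the function name and the single pool-size knob are hypothetical, not the PR's actual implementation:

```python
# Hypothetical sketch: configure an RMM pool without importing Dask at all.
import rmm


def configure_rmm_pool(initial_pool_size: int | None = None) -> None:
    """Install a pool memory resource on the current device, if requested."""
    if initial_pool_size is None:
        # Nothing was requested; keep the existing device resource.
        return
    upstream = rmm.mr.CudaMemoryResource()
    pool = rmm.mr.PoolMemoryResource(upstream, initial_pool_size=initial_pool_size)
    rmm.mr.set_current_device_resource(pool)
```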
That's fair; however, we're not going to be setting up RMM from Python then, are we? I mean, generally we set up RMM directly in
I'm currently thinking of cudf-polars deployment without Dask (e.g. with Ray). So, I DO think we would be setting up and/or reconfiguring our worker from Python.
As discussed offline, I think the Ray use case is legit; in that case we won't have Dask-CUDA installed and may not be able to do everything directly in
| """ | ||
| # Insert RMM resource adaptor on top of the current RMM resource stack. | ||
| # Set up RMM pool if configured (otherwise uses existing resource) | ||
| setup_rmm_pool(option_prefix, options, worker) |
There's one thing that concerns me here: how do we handle the case where Dask-CUDA has already set up RMM? In Dask-CUDA, specifically in the benchmarks, we recently introduced a safeguard in https://github.com/rapidsai/dask-cuda/blob/669fbc76a1c29357e572646fb3f7f5cacc69f935/dask_cuda/benchmarks/utils.py#L549-L556 that checks whether a cluster has already set up RMM, which happens when the cluster was set up externally. However, in the Dask integration in RapidsMPF, my understanding is that LocalCUDACluster will run the internal RMMSetup and then here we'll run it again, no? Setting it up twice may have downsides, particularly if memory has already been registered, for example for use with CUDA IPC (e.g., via UCX). Ideally, we would prevent at all costs setting RMM up more than once.
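For reference, a rough sketch of the kind of guard described above, assuming the existing configuration would have been applied by Dask-CUDA's RMMSetup worker plugin (the helper name is hypothetical):

```python
# Hypothetical guard: skip RMM reconfiguration if Dask-CUDA's RMMSetup
# worker plugin has already run on this worker.
def rmm_already_configured(dask_worker) -> bool:
    try:
        from dask_cuda.plugins import RMMSetup
    except ImportError:
        # Without Dask-CUDA installed, its plugin cannot have configured RMM.
        return False
    return any(isinstance(p, RMMSetup) for p in dask_worker.plugins.values())
```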
Yeah, this is a good question.
If this PR were to be merged right now, I don't think the behavior of the cudf-polars pdsh benchmarks would change. We would still be passing in the rmm_* arguments to LocalCUDACluster, and we would not be passing in the rmm_* Options to bootstrap_dask_cluster. Therefore, this call to setup_rmm_pool would be a no-op.
In a follow-up cudf-polars PR, we will need to move the rmm_* arguments from the LocalCUDACluster call to the bootstrap_dask_cluster call.
With that said, we still don't have a bullet-proof plan for detecting when RMM has already been configured on a rapidsmpf worker. We can only check for the RMMSetup plugin when we are running on top of Dask and didn't use setup_rmm_pool.
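To make the two configuration paths concrete, here is a hypothetical before/after sketch. Only LocalCUDACluster's rmm_pool_size keyword is established Dask-CUDA API; the option key passed to bootstrap_dask_cluster is an invented placeholder, not the real rapidsmpf Options spelling:

```python
# Today: Dask-CUDA configures RMM when the cluster is created.
from dask_cuda import LocalCUDACluster
from distributed import Client

cluster = LocalCUDACluster(rmm_pool_size="24GB")
client = Client(cluster)

# Follow-up sketch: create the cluster without rmm_* arguments and let
# bootstrap_dask_cluster configure RMM on each worker instead. The option
# name below is a placeholder, not the real rapidsmpf key.
# cluster = LocalCUDACluster()
# client = Client(cluster)
# bootstrap_dask_cluster(client, options={"dask.rmm_pool_size": "24GB"})
```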
> If this PR were to be merged right now, I don't think the behavior of the cudf-polars pdsh benchmarks would change. We would still be passing in the rmm_* arguments to LocalCUDACluster, and we would not be passing in the rmm_* Options to bootstrap_dask_cluster. Therefore, this call to setup_rmm_pool would be a no-op.
>
> In a follow-up cudf-polars PR, we will need to move the rmm_* arguments from the LocalCUDACluster call to the bootstrap_dask_cluster call.
I agree, this is probably fine for the single-node cluster. However, when setting up a multi-node cluster you need to provide an already-set-up cluster, in which case rmpf_worker_setup will run and set up RMM a second time, no?
> With that said, we still don't have a bullet-proof plan for detecting when RMM has already been configured on a rapidsmpf worker. We can only check for the RMMSetup plugin when we are running on top of Dask and didn't use setup_rmm_pool.
I don't think we have a bullet-proof way to prevent it, and maybe never will, but perhaps we should implement a check for RMMSetup similar to the one I linked in Dask-CUDA. I think when we run the multi-node setup we will end up calling setup_rmm_pool on top of the already-executed RMMSetup on the cluster, but correct me if I'm still overlooking something.
> when setting up a multi-node cluster you need to provide an already-set-up cluster, in which case rmpf_worker_setup will run and set up RMM a second time, no?
When you set up a multi-node cluster, you can pass in the --rmm-* options when you create your workers, or you can set up the RMM pool within rmpf_worker_setup. Either way, rmpf_worker_setup (and therefore setup_rmm_pool) will run on each worker when you call bootstrap_dask_cluster on the client.
The important detail is that setup_rmm_pool will not actually do anything if the Options argument is empty (or doesn't contain any rmm-related options). By default, all of these arguments will be None. The user (or utils.py script) needs to manually add the rmm-related options to the bootstrap_dask_cluster Options argument.
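A minimal sketch of that gating behavior; the option name, prefix handling, and plain-dict options are all assumptions for illustration rather than the PR's actual setup_rmm_pool:

```python
# Hypothetical illustration of the "no rmm options means no-op" gating.
import rmm


def setup_rmm_pool_sketch(option_prefix: str, options: dict) -> None:
    pool_size = options.get(f"{option_prefix}rmm_pool_size")  # None by default
    if pool_size is None:
        # No rmm-related options were provided: leave the worker's current
        # memory resource untouched.
        return
    mr = rmm.mr.PoolMemoryResource(
        rmm.mr.CudaMemoryResource(), initial_pool_size=pool_size
    )
    rmm.mr.set_current_device_resource(mr)
```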
> I think when we run the multi-node setup we will end up calling setup_rmm_pool on top of the already-executed RMMSetup on the cluster, but correct me if I'm still overlooking something.
Yes. However, we will not actually do anything in the setup_rmm_pool call unless we have updated the bootstrap_dask_cluster Options argument to contain rmm options.
But that trusts the user NOT to specify --rmm-* options to both dask cuda worker and the client simultaneously, right? So while this is functional, it could still be dangerous. I don't want to hold this PR up much longer; I just want to ensure we're aware we may be compromising on one end or another.
pentschev left a comment
I'm preemptively approving this PR after pointing out some pitfalls, for example in https://github.com/rapidsai/rapidsmpf/pull/779/files#r2687445927. I don't want to block this in case it becomes important, but it's important to keep in mind that users can easily shoot themselves in the foot. I'm also happy to review again if you want to try to address my suggestions and make this a bit less error-prone.
Proposed Changes:
- A rapidsmpf.integrations.core.setup_rmm_pool function to set up the RMM pool for a single "worker".
- A call to the new setup_rmm_pool function in rapidsmpf.integrations.core.rmpf_worker_setup.
- Tests in rapidsmpf/python/rapidsmpf/rapidsmpf/tests/test_dask.py.

@pentschev - Let me know what you think about "copying" the dask-cuda logic like this. Feel free to suggest something different.
@quasiben - Perhaps this makes multi-node deployment a bit easier? (i.e., we rely less on logic within LocalCUDACluster)