Address RMM setup in bootstrap_dask_cluster#779

Open
rjzamora wants to merge 4 commits into rapidsai:main from rjzamora:rmm-setup

Conversation

@rjzamora
Member

Proposed Changes:

  • Adds a new rapidsmpf.integrations.core.setup_rmm_pool function to set up the RMM pool for a single "worker".
    • The logic is essentially copied from dask-cuda (see the sketch after this list).
    • Motivation: We do not want to rely on Dask/Dask-CUDA for this logic as we branch out to other deployment strategies.
  • Adds a call to the new setup_rmm_pool function in rapidsmpf.integrations.core.rmpf_worker_setup
  • Adds test coverage to rapidsmpf/python/rapidsmpf/rapidsmpf/tests/test_dask.py
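
For illustration, the copied dask-cuda logic boils down to roughly the sketch below. This is a simplified, hypothetical version; the actual setup_rmm_pool reads its configuration from the rapidsmpf Options object and presumably covers more modes than a plain pool:

```python
# Simplified sketch of the dask-cuda-style pool setup that setup_rmm_pool
# mirrors. Illustrative only: not the exact implementation in this PR.
import rmm


def _setup_rmm_pool_sketch(
    pool_size: int | None,
    maximum_pool_size: int | None = None,
) -> None:
    """Configure the current device to allocate from an RMM pool."""
    if pool_size is None:
        # Nothing requested: keep whatever memory resource is already active.
        return
    pool = rmm.mr.PoolMemoryResource(
        rmm.mr.CudaMemoryResource(),
        initial_pool_size=pool_size,
        maximum_pool_size=maximum_pool_size,
    )
    # All subsequent RMM allocations on this device come from the pool.
    rmm.mr.set_current_device_resource(pool)
```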

@pentschev - Let me know what you think about "copying" the dask-cuda logic like this. Feel free to suggest something different.

@quasiben - Perhaps this makes multi-node deployment a bit easier? (i.e. we rely less on logic within LocalCUDACluster)

@rjzamora rjzamora self-assigned this Jan 12, 2026
@rjzamora rjzamora requested a review from a team as a code owner January 12, 2026 19:55
@rjzamora rjzamora added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jan 12, 2026
@pentschev
Member

Copying is something I despise because the copies are going to diverge eventually and we'll have multiple places to update. However, I understand our alternatives are very limited:

  1. We need a separate repository to place these functions (I know this isn't going to happen, so we can rule it out immediately); or
  2. Depend directly on Dask-CUDA.

Therefore, our only alternative to copying is really to depend on Dask-CUDA, and I don't think we want to do that. Unfortunately, that leaves us with no other option.

@pentschev
Member

Therefore, our only alternative to copying is really to depend on Dask-CUDA, and I don't think we want to do that.

Actually, we don't depend on Dask-CUDA because we don't want to make it a hard dependency. On the other hand, if we're running Dask, it becomes a dependency that the user needs to satisfy, right? So I think we could import the function from dask_cuda, couldn't we? If we need to make adaptations in Dask-CUDA to satisfy some requirements from RapidsMPF, that should be fine if it means we can keep the logic in a single place. Is there a reason why we couldn't do what I'm suggesting, @rjzamora?

@rjzamora
Member Author

Thanks for the feedback @pentschev

My "general" goal here is just to add the necessary logic to configure an rmm resource on a pre-existing worker. We currently bootstrap a dask cluster in cudf-polars. However, we eventually need to support the case that Dask is not installed at all. The rmm-setup logic will need to live somewhere that does not depend on Dask at all.

@pentschev
Member

My "general" goal here is just to add the necessary logic to configure an rmm resource on a pre-existing worker. We currently bootstrap a dask cluster in cudf-polars. However, we eventually need to support the case that Dask is not installed at all. The rmm-setup logic will need to live somewhere that does not depend on Dask at all.

That's fair. However, we're not gonna be setting up RMM from Python then, are we? I mean, generally we set up RMM directly in librapidsmpf (i.e., in the C++ layer), so that RMM setup logic would not reuse the copied Python function, right? If I'm mistaken, what exactly is the case for which we need RMM setup logic in the Python layer?

@rjzamora
Member Author

we're not gonna be setting up RMM from Python then, are we?

I'm currently thinking of cudf-polars deployment without Dask (e.g. with Ray). So, I DO think we would be setting up and/or reconfiguring our worker from Python.

@pentschev
Member

we're not gonna be setting up RMM from Python then, are we?

I'm currently thinking of cudf-polars deployment without Dask (e.g. with Ray). So, I DO think we would be setting up and/or reconfiguring our worker from Python.

As discussed offline, I think the Ray use case is legit; in that case we won't have Dask-CUDA installed and may not be able to do everything directly in librapidsmpf. Therefore, let's go ahead with your approach and make a copy here, even though that's an unfortunate solution from a maintenance perspective.

"""
# Insert RMM resource adaptor on top of the current RMM resource stack.
# Set up RMM pool if configured (otherwise uses existing resource)
setup_rmm_pool(option_prefix, options, worker)
Member

There's one thing that concerns me here: how do we handle the case where Dask-CUDA has already set up RMM? In Dask-CUDA, specifically the benchmarks, we recently introduced a safeguard in https://github.com/rapidsai/dask-cuda/blob/669fbc76a1c29357e572646fb3f7f5cacc69f935/dask_cuda/benchmarks/utils.py#L549-L556 that checks whether a cluster has already set up RMM, which happens if the cluster was set up externally. However, in the Dask integration in RapidsMPF, my understanding is that LocalCUDACluster will run the internal RMMSetup and then here we'll run it again, no? Setting it up twice may have downsides, particularly if memory has already been registered, for example for use with CUDA IPC (e.g., via UCX). Ideally, we would prevent setting RMM up more than once at all costs.
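
For concreteness, one possible shape for such a safeguard is sketched below (an illustrative sketch, not the actual dask-cuda code at the linked lines):

```python
# Illustrative guard only: treat any non-default current device resource as a
# sign that RMM has already been configured, e.g. by dask-cuda's RMMSetup.
import rmm


def rmm_already_configured() -> bool:
    """Return True if the current RMM resource is not the plain default."""
    current = rmm.mr.get_current_device_resource()
    return not isinstance(current, rmm.mr.CudaMemoryResource)
```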

Member Author

Yeah, this is a good question.

If this PR were to be merged right now, I don't think the behavior of the cudf-polars pdsh benchmarks would change. We would still be passing in the rmm_* arguments to LocalCUDACluster, and we would not be passing in the rmm_* Options to bootstrap_dask_cluster. Therefore, this call to setup_rmm_pool would be a no-op.

In a follow-up cudf-polars PR, we will need to move the rmm_* arguments from the LocalCUDACluster call to the bootstrap_dask_cluster call.

With that said, we still don't have a bullet-proof plan for detecting when RMM has already been configured on a rapidsmpf worker. We can only check for the RMMSetup plugin when we are running on top of Dask and didn't use setup_rmm_pool.
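
To make the no-op behavior above concrete, here is a hypothetical sketch of the guard. The option keys, the "dask_" prefix, and the use of a plain dict in place of the rapidsmpf Options object are assumptions for illustration, not the PR's actual API:

```python
# Hypothetical guard illustrating the no-op path: if the caller supplied no
# rmm-related options, setup_rmm_pool leaves the existing resource untouched.
def has_rmm_options(options: dict[str, str | None], option_prefix: str) -> bool:
    """Return False when setup_rmm_pool should do nothing."""
    rmm_keys = ("rmm_pool_size", "rmm_maximum_pool_size", "rmm_async")
    return any(options.get(option_prefix + key) is not None for key in rmm_keys)


# With an empty options mapping, the setup is a no-op:
assert not has_rmm_options({}, option_prefix="dask_")
```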

Member

If this PR were to be merged right now, I don't think the behavior of the cudf-polars pdsh benchmarks would change. We would still be passing in the rmm_* arguments to LocalCUDACluster, and we would not be passing in the rmm_* Options to bootstrap_dask_cluster. Therefore, this call to setup_rmm_pool would be a no-op.

In a follow-up cudf-polars PR, we will need to move the rmm_* arguments from the LocalCUDACluster call to the bootstrap_dask_cluster call.

I agree, this is probably fine for the single-node cluster. However, when setting up a multi-node cluster you need to provide an already set-up cluster, in which case rmpf_worker_setup will run and set up RMM a second time, no?

With that said, we still don't have a bullet-proof plan for detecting when RMM has already been configured on a rapidsmpf worker. We can only check for the RMMSetup plugin when we are running on top of Dask and didn't use setup_rmm_pool.

I don't think we have a bullet-proof way to prevent it, and maybe never will have one, but perhaps we should implement a check for RMMSetup similar to the one I linked in Dask-CUDA. I think when we run multi-node setup we will end up calling setup_rmm_pool on top of the already executed RMMSetup on the cluster, but correct me if I'm still overlooking something.
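
A check along those lines could look roughly like this (a sketch; it assumes we are running inside a Dask worker and that the dask-cuda plugin is identifiable by its class name, which may not hold exactly):

```python
# Sketch of the RMMSetup check being suggested: look for a dask-cuda RMMSetup
# plugin among the plugins registered on the current Dask worker.
from distributed import get_worker


def dask_cuda_rmm_setup_present() -> bool:
    """Return True if an RMMSetup plugin is registered on this worker."""
    try:
        worker = get_worker()
    except ValueError:
        # Not running inside a Dask worker at all.
        return False
    return any(
        type(plugin).__name__ == "RMMSetup" for plugin in worker.plugins.values()
    )
```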

Member Author

when setting up a multi-node cluster you need to provide an already set-up cluster, in which case rmpf_worker_setup will run and set up RMM a second time, no?

When you set up a multi-node cluster, you can pass in the --rmm-* options when you create your workers, or you can set up the rmm pool within rmpf_worker_setup. Either way, rmpf_worker_setup (and therefore setup_rmm_pool) will run on each worker when you call bootstrap_dask_cluster on the client.

The important detail is that setup_rmm_pool will not actually do anything if the Options argument is empty (or doesn't contain any rmm-related options). By default, all of these arguments will be None. The user (or utils.py script) needs to manually add the rmm-related options to the bootstrap_dask_cluster Options argument.

I think when we run multi-node setup we will end up calling setup_rmm_pool on top of the already executed RMMSetup on the cluster, but correct me if I'm still overlooking something.

Yes. However, we will not actually do anything in the setup_rmm_pool call unless we have updated the bootstrap_dask_cluster Options argument to contain rmm options.
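
For reference, the intended usage after the follow-up cudf-polars change might look roughly like the following. The option key, the Options constructor, and the bootstrap_dask_cluster signature/import path are assumptions for illustration, not the PR's actual API:

```python
# Hypothetical end-to-end usage of the workflow described above: the rmm
# options travel through bootstrap_dask_cluster rather than LocalCUDACluster.
from dask_cuda import LocalCUDACluster
from distributed import Client

from rapidsmpf.config import Options
from rapidsmpf.integrations.dask import bootstrap_dask_cluster

if __name__ == "__main__":
    # No rmm_* arguments here; RMM is configured by rapidsmpf instead.
    with LocalCUDACluster() as cluster, Client(cluster) as client:
        options = Options({"dask_rmm_pool_size": "24GiB"})  # hypothetical key
        bootstrap_dask_cluster(client, options=options)
```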

Member

But that trusts the user NOT to specify --rmm-* to both dask cuda worker and the client simultaneously, right? So while this is functional, it could still be dangerous. I don't want to hold this PR up much longer; I just want to ensure we're aware we may be compromising on one end or another.

@pentschev pentschev left a comment

I'm preemptively approving this PR after pointing out some pitfalls, for example in https://github.com/rapidsai/rapidsmpf/pull/779/files#r2687445927. I don't want to block this in case it becomes important, but it's important to keep in mind that users can easily shoot themselves in the foot. I'm also happy to review again if you want to try to address my suggestions and make this a bit less error-prone.
