Allow for scheduler file and existing dask cluster when using pdsh #21024

Merged
rapids-bot[bot] merged 15 commits into rapidsai:main from quasiben:pdsh-with-cluster
Jan 20, 2026

Conversation

@quasiben
Member

@quasiben quasiben requested a review from a team as a code owner January 13, 2026 01:20
@quasiben quasiben requested review from mroeschke and vyasr January 13, 2026 01:20
@copy-pr-bot

copy-pr-bot bot commented Jan 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Python (Affects Python cuDF API.) and cudf-polars (Issues specific to cudf-polars) labels Jan 13, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python Jan 13, 2026
Co-authored-by: Lawrence Mitchell <wence@gmx.li>
@rjzamora rjzamora added the 3 - Ready for Review (Ready for review by team), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels and removed the Python (Affects Python cuDF API.) label Jan 13, 2026
@github-actions github-actions bot added the Python (Affects Python cuDF API.) label Jan 13, 2026
Member

@rjzamora rjzamora left a comment

I think this change makes sense.

In follow-up work, I'd like to move more of the worker-setup logic (e.g. rmm resource creation) into rapidsmpf. For example, something like rapidsai/rapidsmpf#779 will allow us to set up the rmm memory resource at bootstrapping time.

@rjzamora rjzamora requested a review from wence- January 13, 2026 15:10
@rjzamora rjzamora changed the title [WIP] Allow for scheduler file and existing dask cluster when using pdsh Allow for scheduler file and existing dask cluster when using pdsh Jan 13, 2026
Contributor

@wence- wence- left a comment

Two small cleanup suggestions

@rjzamora rjzamora added the 5 - Ready to Merge (Testing and reviews complete, ready to merge) label and removed the 3 - Ready for Review (Ready for review by team) label Jan 13, 2026
@wence-
Contributor

wence- commented Jan 14, 2026

/ok to test 7fa5a2c

@wence-
Contributor

wence- commented Jan 14, 2026

/merge

@rjzamora
Member

/ok to test 9c83470

    if scheduler_address is not None:
        # Connect to existing cluster via scheduler address
        client = Client(address=scheduler_address)
        n_workers = len(client.scheduler_info().get("workers", {}))
Contributor

Just a note: we serialize run_config.n_workers in the JSON output. When a scheduler file is provided, run_config.n_workers won't be accurate, because the real worker count comes from the existing cluster.

I wonder if we can mutate run_config.n_workers here? It's not ideal, but I think it's an OK tradeoff.
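
To make the suggestion above concrete, here is a minimal sketch of syncing the recorded worker count with what the scheduler reports. It is not the code from this PR: `RunConfig`, `count_workers`, `sync_n_workers`, and `FakeClient` are hypothetical names used only for illustration, and the stub client stands in for a `distributed.Client` so the example runs without a cluster. The only behavior taken from the diff is reading the `"workers"` mapping from `scheduler_info()`.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical stand-in for the benchmark's run configuration."""

    n_workers: int
    scheduler_address: Optional[str] = None


def count_workers(client: Any) -> int:
    """Count workers the same way the diff does: len of the "workers" mapping."""
    return len(client.scheduler_info().get("workers", {}))


def sync_n_workers(run_config: RunConfig, client: Any) -> RunConfig:
    """Return a copy of run_config whose n_workers matches the live cluster."""
    return replace(run_config, n_workers=count_workers(client))


class FakeClient:
    """Stub exposing scheduler_info(); an assumption used for illustration only."""

    def __init__(self, worker_addresses: List[str]) -> None:
        self._worker_addresses = worker_addresses

    def scheduler_info(self) -> Dict[str, Any]:
        # Mimic the shape used in the diff: a "workers" dict keyed by address.
        return {"workers": {addr: {} for addr in self._worker_addresses}}
```

With a config created as `RunConfig(n_workers=1, scheduler_address="tcp://127.0.0.1:8786")` and a stub reporting three workers, `sync_n_workers` yields a config with `n_workers == 3`, so the serialized JSON would reflect the existing cluster. The sketch returns an updated copy rather than mutating in place; in-place mutation, as floated in the comment, would work equally well if the config object is not frozen.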

Member Author

fixed in c8933c3

@rjzamora
Member

/ok to test f422439

@quasiben
Member Author

Merged with main, which should hopefully resolve the [doc errors we're seeing](https://github.com/rapidsai/cudf/actions/runs/21073573023/job/60614010671?pr=21024#step:13:5132).

@rjzamora
Member

/ok to test 7481a1e

@quasiben
Member Author

I think this is failing now because 26.4 packages aren't built yet:
https://github.com/rapidsai/cudf/actions/runs/21077373670/job/60622819769?pr=21024

@pentschev
Member

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Jan 20, 2026

/ok to test

@pentschev, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@pentschev
Member

/ok to test fb91090

@rapids-bot rapids-bot bot merged commit d63d978 into rapidsai:main Jan 20, 2026
143 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF Python Jan 20, 2026
@quasiben quasiben deleted the pdsh-with-cluster branch January 20, 2026 14:15

Labels

5 - Ready to Merge: Testing and reviews complete, ready to merge
cudf-polars: Issues specific to cudf-polars
improvement: Improvement / enhancement to an existing function
non-breaking: Non-breaking change
Python: Affects Python cuDF API.

Projects

Status: Done

5 participants