WIP: Adaptive selection of join algorithm#818

Draft
wence- wants to merge 14 commits into rapidsai:main from wence-:wence/fea/adaptive-join

wence- commented Jan 23, 2026

No description provided.

wence- added 14 commits January 23, 2026 17:51
Sample a small number of chunks from each side to estimate input size.

The goal is to make an early join-strategy choice without stalling
the pipeline or buffering whole inputs. This adds an allgather so
the estimate reflects all ranks while keeping the local sample size
minimal.
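
The estimate described in this commit message could be sketched roughly as follows. This is only an illustration under stated assumptions: `LocalSample`, `estimate_local_bytes`, `fake_allgather`, and the `expected_chunks` field are hypothetical names that do not come from the PR, and the allgather is simulated in-process rather than calling a real communicator.

```python
from dataclasses import dataclass


@dataclass
class LocalSample:
    """What one rank learns from inspecting a few chunks (hypothetical type)."""

    sampled_bytes: int    # total bytes in the chunks that were inspected
    sampled_chunks: int   # number of chunks inspected
    expected_chunks: int  # assumed: chunks this rank expects in total


def estimate_local_bytes(sample: LocalSample) -> int:
    """Extrapolate the mean sampled chunk size to all expected chunks."""
    if sample.sampled_chunks == 0:
        return 0
    mean_chunk_bytes = sample.sampled_bytes / sample.sampled_chunks
    return int(mean_chunk_bytes * sample.expected_chunks)


def fake_allgather(per_rank_values: list[int]) -> list[list[int]]:
    """Stand-in for an allgather: every rank receives every rank's value."""
    return [list(per_rank_values) for _ in per_rank_values]


def estimate_global_bytes(samples: list[LocalSample]) -> list[int]:
    """Return the global size estimate as seen by each rank."""
    local_estimates = [estimate_local_bytes(s) for s in samples]
    gathered = fake_allgather(local_estimates)
    # Every rank sums the same gathered list, so all ranks agree.
    return [sum(view) for view in gathered]


samples = [
    LocalSample(sampled_bytes=200, sampled_chunks=2, expected_chunks=10),
    LocalSample(sampled_bytes=300, sampled_chunks=3, expected_chunks=4),
    LocalSample(sampled_bytes=0, sampled_chunks=0, expected_chunks=5),
]
global_estimates = estimate_global_bytes(samples)
```

Because every rank sums the same gathered values, all ranks arrive at the same estimate and can therefore make the same strategy choice without further coordination.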

Exploring how to manage approximate channel message counts in an actor
network.

We need to break the outer loop, not the inner one.

Having buffered some messages to inspect for size estimation, we need to
replay the full channel to interact with the broadcast and shuffle nodes.
We do this by creating an output channel, first sending in the buffered
messages, and then consuming the remainder of the output channel.

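
One way to picture the replay is with plain `asyncio.Queue` objects standing in for channels. This is a minimal sketch, not the PR's implementation: `peek`, `replay`, and the `_DONE` sentinel are hypothetical, and it reads the commit message as meaning that the buffered messages are sent first and whatever is left on the input is forwarded afterwards.

```python
import asyncio

_DONE = object()  # hypothetical end-of-stream marker


async def peek(ch_in: asyncio.Queue, n: int):
    """Buffer up to n messages for inspection; report if the input ended."""
    buffered = []
    finished = False
    while len(buffered) < n:
        msg = await ch_in.get()
        if msg is _DONE:
            finished = True
            break
        buffered.append(msg)
    return buffered, finished


async def replay(buffered, finished, ch_in: asyncio.Queue, ch_out: asyncio.Queue):
    """Send buffered messages first, then forward the rest of the input."""
    for msg in buffered:
        await ch_out.put(msg)
    while not finished:
        msg = await ch_in.get()
        if msg is _DONE:
            break
        await ch_out.put(msg)
    await ch_out.put(_DONE)  # downstream sees a complete, ordered stream


async def demo():
    ch_in = asyncio.Queue()
    for m in ["a", "b", "c", "d"]:
        ch_in.put_nowait(m)
    ch_in.put_nowait(_DONE)

    buffered, finished = await peek(ch_in, 2)  # inspect two messages
    ch_out = asyncio.Queue()
    await replay(buffered, finished, ch_in, ch_out)

    out = []
    while (msg := await ch_out.get()) is not _DONE:
        out.append(msg)
    return buffered, out


buffered, out = asyncio.run(demo())
```

The downstream consumer reads only from the output queue, so it observes the original message order and never needs to know that some messages were inspected early.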
Using the new channel replay, and with statistics gathered from the
buffered messages, dispatch to either a broadcast join or a shuffle join
depending on the estimated sizes of the two tables.
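
The dispatch itself might look something like the sketch below. The `JoinStrategy` enum, the `choose_join_strategy` function, and the `broadcast_limit` threshold are illustrative assumptions rather than names or values taken from the PR, and the sketch assumes an inner join where either side may be broadcast.

```python
from enum import Enum


class JoinStrategy(Enum):
    BROADCAST_LEFT = "broadcast_left"
    BROADCAST_RIGHT = "broadcast_right"
    SHUFFLE = "shuffle"


def choose_join_strategy(
    left_bytes: int,
    right_bytes: int,
    broadcast_limit: int = 64 * 1024 * 1024,  # hypothetical threshold
) -> JoinStrategy:
    """Broadcast the smaller table if it is small enough, else shuffle both."""
    if min(left_bytes, right_bytes) > broadcast_limit:
        # Neither side is cheap to replicate on every rank.
        return JoinStrategy.SHUFFLE
    if left_bytes <= right_bytes:
        return JoinStrategy.BROADCAST_LEFT
    return JoinStrategy.BROADCAST_RIGHT
```

For example, `choose_join_strategy(10, 1000, broadcast_limit=100)` selects `JoinStrategy.BROADCAST_LEFT`, while two inputs that both exceed the limit fall back to `JoinStrategy.SHUFFLE`.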

Use broadcast side to drive output column ordering and
build/probe carrier selection via filtered views. Replace
the duplicate join chunk helpers and update callers.

To enable adaptive algorithms in query operator nodes, we will often need
some kind of metadata about the messages in a channel. At a minimum, we
typically need the number of messages expected to be sent.

Rather than carrying a separate object around for metadata, give each
Channel a metadata queue that can be pushed into by producers. We use a
queue so that we can push in multiple messages without suspending on the
producer side.
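
The idea of a per-channel metadata queue can be sketched with `asyncio` as follows. This `Channel` class is a hypothetical stand-in and not the library's actual class; the method names and the `expected_messages` key are assumptions made for illustration.

```python
import asyncio


class Channel:
    """Toy channel: a bounded data queue plus an unbounded metadata queue."""

    def __init__(self, maxsize: int = 1):
        self.data = asyncio.Queue(maxsize=maxsize)
        # Unbounded, so pushing metadata never suspends the producer.
        self.metadata = asyncio.Queue()

    def push_metadata(self, item) -> None:
        self.metadata.put_nowait(item)

    async def recv_metadata(self):
        return await self.metadata.get()

    async def send(self, msg) -> None:
        await self.data.put(msg)  # may suspend when the data queue is full

    async def recv(self):
        return await self.data.get()


async def producer(ch: Channel, chunks: list) -> None:
    # Announce the message count before sending any data.
    ch.push_metadata({"expected_messages": len(chunks)})
    for chunk in chunks:
        await ch.send(chunk)


async def consumer(ch: Channel):
    meta = await ch.recv_metadata()
    received = [await ch.recv() for _ in range(meta["expected_messages"])]
    return meta, received


async def demo():
    ch = Channel()
    chunks = ["c0", "c1", "c2"]
    _, result = await asyncio.gather(producer(ch, chunks), consumer(ch))
    return result


meta, received = asyncio.run(demo())
```

Keeping the metadata queue separate from the data queue means a consumer can learn the expected message count without first draining, or even touching, the data messages.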
wence- force-pushed the wence/fea/adaptive-join branch from 72438b4 to 76bdb97 on January 23, 2026 17:51
copy-pr-bot bot commented Jan 23, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rapidsai rapidsai deleted a comment from copy-pr-bot bot Jan 23, 2026