
Streaming Q4 implementation #710

Open
TomAugspurger wants to merge 85 commits into rapidsai:main from TomAugspurger:tom/streaming-q4

Conversation

@TomAugspurger
Contributor

This implements TPC-H query 4 using rapidsmpf.

The most notable thing about q4 is the left semi join / WHERE EXISTS between the two tables:

q = (
    # SQL exists translates to semi join in Polars API
    orders.join(
        (lineitem.filter(pl.col("l_commitdate") < pl.col("l_receiptdate"))),
        left_on="o_orderkey",
        right_on="l_orderkey",
        how="semi",
    )
    ...
)

I want to think a bit more about how to do this. Right now, I've implemented a version that broadcasts the smaller orders table and shuffles the larger lineitem table. We're currently unable to reuse the filtered joiner across chunks: cudf only supports probing with the left table (the smaller one in our case, orders), and the build table must be the right table (the larger one in our case, lineitem). So I think we have to shuffle lineitem; otherwise we'd get incorrect results (a key in orders matching a key in lineitem multiple times, once per partition).
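To make the correctness argument concrete, here's a minimal host-side sketch of the broadcast-plus-shuffle idea, in plain C++ with std::unordered_set rather than the rapidsmpf/cudf implementation (table contents, partition count, and names are illustrative): because lineitem is hash-partitioned on l_orderkey, every copy of a given key lands in exactly one partition, so a broadcast orders key can be emitted at most once even though the same orders table is probed against every partition.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <unordered_set>
#include <vector>

// Sketch of a broadcast left semi join over a shuffled right table:
// lineitem keys are hash-partitioned on l_orderkey, orders is broadcast to
// every partition, and each partition emits the orders keys it contains.
// Because a given l_orderkey lives in exactly one partition, each orders key
// is emitted at most once, so concatenating per-partition results never
// duplicates an orders row.
int main() {
    std::vector<long> orders_keys{1, 2, 3, 4};
    std::vector<long> lineitem_keys{2, 2, 3, 3, 3, 7};  // duplicate keys are fine
    std::size_t const num_partitions = 2;

    // Shuffle (hash-partition) lineitem on the join key.
    std::vector<std::unordered_set<long>> partitions(num_partitions);
    for (long key : lineitem_keys) {
        partitions[std::hash<long>{}(key) % num_partitions].insert(key);
    }

    // Broadcast orders: probe the same orders keys against every partition.
    std::vector<long> semi_join_result;
    for (auto const& part : partitions) {
        for (long key : orders_keys) {
            if (part.count(key) != 0) {
                semi_join_result.push_back(key);
            }
        }
    }

    for (long key : semi_join_result) {
        std::cout << key << '\n';  // 2 and 3, each printed exactly once
    }
}
```

If lineitem were instead split into arbitrary chunks without the shuffle, the same orders key could match in several chunks, and the concatenated semi-join output would contain duplicate orders rows.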

@copy-pr-bot

copy-pr-bot bot commented Dec 4, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@TomAugspurger added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels on Dec 4, 2025
@TomAugspurger marked this pull request as ready for review on December 5, 2025 at 14:47
@TomAugspurger requested review from a team as code owners on December 5, 2025 at 14:47
@TomAugspurger changed the title from "WIP: Streaming Q4 implementation" to "Streaming Q4 implementation" on Dec 5, 2025
@TomAugspurger
Contributor Author

a338046 uses cudf::ast::column_name_reference instead of column_reference so that o_orderdate, which isn't used outside of the filter, doesn't need to be included in the output; the filter is now applied in read_parquet.
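For context, a sketch of the name-based-filter pattern, assuming libcudf's parquet reader filter option; this is not the code in a338046, and the file path, output column list, and choice of q4's lineitem predicate (l_commitdate < l_receiptdate) here are illustrative. With cudf::ast::column_name_reference the filter columns are resolved by name against the file schema, so they don't have to appear among the selected output columns.

```cpp
#include <cudf/ast/expressions.hpp>
#include <cudf/io/parquet.hpp>
#include <cudf/table/table.hpp>

#include <iostream>
#include <string>

// Sketch: build an AST predicate with column_name_reference so the filter
// columns are read only for filtering and are not materialized in the output.
int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <lineitem.parquet>\n";
        return 1;
    }
    std::string const path = argv[1];

    auto commit  = cudf::ast::column_name_reference{"l_commitdate"};
    auto receipt = cudf::ast::column_name_reference{"l_receiptdate"};
    auto pred    = cudf::ast::operation{cudf::ast::ast_operator::LESS, commit, receipt};

    auto options = cudf::io::parquet_reader_options::builder(cudf::io::source_info{path})
                       .columns({"l_orderkey"})  // only the join key is materialized
                       .filter(pred)
                       .build();

    auto result = cudf::io::read_parquet(options);
    std::cout << "rows passing the filter: " << result.tbl->num_rows() << "\n";
    return 0;
}
```

With a plain column_reference the predicate is expressed in terms of the selected columns, which is why the filter column previously had to be included in the output.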

Contributor

@wence- left a comment


A few more comments; I think this looks pretty good now!

Comment on lines +159 to +160
streaming::TableChunk const& left_chunk,
streaming::TableChunk&& right_chunk,
Contributor


Why do you take left_chunk by const reference, but right_chunk by rvalue reference (i.e. the caller must move it)?

Contributor Author


I'm not really sure (I'm reading up on the semantics of the two now).

We use this streaming::TableChunk&& type for the other functions (inner_join_chunk), so I suspect I was trying to match that. But when doing a broadcast left semi join, the left_chunk is reused many times, once per right-hand chunk, which I think means it can't be moved.
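A tiny standalone sketch of that distinction, using a hypothetical Chunk type rather than rapidsmpf's streaming::TableChunk: the right-hand chunk is consumed once per call and can be moved from, while the broadcast left-hand chunk is passed in again on every iteration, so it has to remain valid and is taken by const reference.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for streaming::TableChunk, purely illustrative.
struct Chunk {
    std::vector<int> rows;
};

// right_chunk is consumed exactly once per call, so taking it by rvalue
// reference lets the callee steal its storage.  left_chunk is only read, and
// the same object is passed again for every right-hand chunk, so it must be
// taken by const reference: a moved-from chunk could not be reused.
Chunk semi_join_chunk(Chunk const& left_chunk, Chunk&& right_chunk) {
    Chunk out = std::move(right_chunk);  // take ownership of the right chunk
    (void)left_chunk;                    // ... filter `out` against left_chunk here ...
    return out;
}

int main() {
    Chunk orders{{1, 2, 3}};                       // broadcast table, reused
    std::vector<Chunk> lineitem{{{1}}, {{2, 4}}};  // streamed chunks

    for (auto& chunk : lineitem) {
        // `orders` is reused on every iteration; only `chunk` is given up.
        auto result = semi_join_chunk(orders, std::move(chunk));
        (void)result;
    }
}
```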

@TomAugspurger
Contributor Author

TomAugspurger commented Jan 30, 2026

I included some changes to our cpp/scripts/ndsh.py to make testing this easier:

python cpp/scripts/ndsh.py run-and-validate --input-dir scale-1 --output-dir validation --benchmark-dir "cpp/build/benchmarks/ndsh/" --benchmark-args='--no-pinned-host-memory' --reuse-expected -q 4


Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)
