Add a single GPU python implementation of TPCH query 3 #629
Open
beckernick wants to merge 8 commits into rapidsai:main from
beckernick
commented
Nov 5, 2025
Comment on lines +626 to +628
    customer_x_orders,  # columns 0, 1 from customer, columns 2, 3, 4, 5 from orders
    filtered_lineitem,
    customer_x_orders_x_lineitem,
Member
Author
Probably should flip these? But was hitting OOM issues.
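For context on why the argument order could affect memory, here is a minimal pure-Python sketch of a join that buffers its entire left input and streams the right input chunk by chunk. It assumes the join in this PR behaves that way (the `left_tables` list elsewhere in the diff hints at it, but that is a guess); `buffered_left_join` and the toy data are hypothetical and are not rapidsmpf APIs.

```python
from collections import defaultdict


def buffered_left_join(left_chunks, right_chunks):
    """Inner join on the first field: buffer all of left, stream right.

    Returns the joined rows and the number of rows that had to be held
    in memory at once (every row of the left input).
    """
    build = defaultdict(list)
    buffered_rows = 0
    for chunk in left_chunks:
        for key, payload in chunk:
            build[key].append(payload)
            buffered_rows += 1

    out = []
    for chunk in right_chunks:  # only one right chunk is live at a time
        for key, payload in chunk:
            for left_payload in build.get(key, ()):
                out.append((key, left_payload, payload))
    return out, buffered_rows


# Toy inputs: a small table and a larger one, each split into chunks.
small = [[(1, "a"), (2, "b")]]
large = [[(1, "x"), (1, "y")], [(2, "z"), (3, "w")]]

rows_a, held_a = buffered_left_join(small, large)  # buffers 2 rows
rows_b, held_b = buffered_left_join(large, small)  # buffers 4 rows
```

Both orders produce the same matches (with the payload columns swapped), but the number of rows held in memory follows whichever input is passed first. Under that assumption, flipping the arguments changes which table must fit on the device.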
beckernick
commented
Nov 5, 2025
    keep_keys: bool,
) -> None:
    left_tables: list[TableChunk] = []
    chunk_streams = []
Member
Author
Blindly switched from set to list as noted in the q09 PR. Have not reasoned through if this matters.
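As a reference point for that open question, the sketch below shows the general behavioral differences between the two containers in Python. The string handles are hypothetical stand-ins, not rapidsmpf stream objects, and whether any of these differences matters for `chunk_streams` is not established here.

```python
# Hypothetical stand-ins for stream handles; the same handle can recur
# when several chunks were produced on the same stream.
streams_seen = ["s0", "s1", "s0", "s2", "s1"]

as_list = []
as_set = set()
for s in streams_seen:
    as_list.append(s)  # keeps duplicates and insertion order
    as_set.add(s)      # deduplicates, requires hashable items, unordered

# If both deduplication and a stable order are wanted, dict keys
# preserve first-seen order.
unique_in_order = list(dict.fromkeys(streams_seen))
```

A list grows with the number of chunks, while a set grows with the number of distinct streams. If each stream is later synchronized once per entry, the list version would repeat work for duplicated streams, which may or may not be significant.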
This PR implements TPC-H query 3 in rapidsmpf Python (perhaps sub-optimally).
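For readers unfamiliar with the query, here is a small reference sketch of what q3 computes, run with Python's built-in `sqlite3` on a few hand-made rows. It is not the code in this PR: the SQL follows the TPC-H q3 definition with its validation parameters (`BUILDING`, `1995-03-15`), the tables carry only the columns q3 touches, and the rows are invented for illustration.

```python
import sqlite3

# TPC-H q3 (shipping priority). Dates are ISO strings, so plain string
# comparison orders them correctly in SQLite.
Q3 = """
SELECT l_orderkey,
       SUM(l_extendedprice * (1 - l_discount)) AS revenue,
       o_orderdate,
       o_shippriority
FROM customer
JOIN orders   ON c_custkey = o_custkey
JOIN lineitem ON l_orderkey = o_orderkey
WHERE c_mktsegment = 'BUILDING'
  AND o_orderdate < '1995-03-15'
  AND l_shipdate  > '1995-03-15'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10
"""

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (c_custkey INTEGER, c_mktsegment TEXT);
CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER,
                     o_orderdate TEXT, o_shippriority INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL,
                       l_discount REAL, l_shipdate TEXT);
""")
con.executemany("INSERT INTO customer VALUES (?, ?)",
                [(1, "BUILDING"), (2, "MACHINERY")])
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (10, 1, "1995-03-01", 0),  # qualifies
    (11, 1, "1995-04-01", 0),  # ordered too late
    (12, 2, "1995-03-01", 0),  # customer in the wrong segment
])
con.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)", [
    (10, 100.0, 0.5, "1995-03-20"),   # counted: 50.0
    (10, 200.0, 0.25, "1995-03-10"),  # shipped too early
    (10, 40.0, 0.0, "1995-04-01"),    # counted: 40.0
    (11, 500.0, 0.0, "1995-04-02"),
    (12, 500.0, 0.0, "1995-04-02"),
])

result = con.execute(Q3).fetchall()
print(result)
```

The filter, the two joins, the grouped sum, and the top-10 sort are the same steps a streaming GPU implementation has to express, whatever order it executes them in.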
When I run this q3 implementation at SF1K (parquet, floats not decimals, part0.parquet to partN.parquet partitioned tables) on 1x H100 of an internal DGX H100 system, I get the following performance:

This is 2x faster than my SF1K q3 run from yesterday with cuDF Polars + rapidsmpf machinery, but 4-5x slower than the most recent run by @TomAugspurger.
DuckDB on the same machine and dataset has the following performance:
The output matches DuckDB.
DuckDB SF1K q3 output:
rapidsmpf Python SF1K q3 output:
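Because the dataset stores floats rather than decimals, the revenue sums from two engines can differ in the last few bits. One way such a comparison could be scripted is sketched below. The helper `rows_match`, the tolerance, and the sample rows are hypothetical and are not the outputs from this run.

```python
import math


def rows_match(a, b, rel_tol=1e-9):
    """Compare two ordered result sets, allowing float round-off."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if isinstance(x, float) and isinstance(y, float):
                # Float columns (e.g. revenue) are compared with tolerance.
                if not math.isclose(x, y, rel_tol=rel_tol):
                    return False
            elif x != y:
                # Keys, dates, and priorities must match exactly.
                return False
    return True


# Made-up rows in q3's output shape:
# (l_orderkey, revenue, o_orderdate, o_shippriority)
reference = [(10, 90.0, "1995-03-01", 0)]
candidate = [(10, 90.00000000001, "1995-03-01", 0)]
mismatch = [(10, 91.0, "1995-03-01", 0)]

same = rows_match(reference, candidate)
different = rows_match(reference, mismatch)
```

The comparison is order-sensitive, which suits q3 because the query sorts by revenue and order date. Ties in the sort keys would need extra handling.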