f132b4f to 0dcbbe9
a338046 uses |
Only lightly tested
wence- left a comment
A few more comments, I think this looks pretty good now!
    streaming::TableChunk const& left_chunk,
    streaming::TableChunk&& right_chunk,
Why do you take left_chunk by const ref, but right_chunk by rvalue reference (i.e. the caller must move it)?
I'm not really sure (I'm reading up on the semantics of the two now).
We use this streaming::TableChunk&& type for the other functions (inner_join_chunk), so I suspect I was trying to match that. But when doing a broadcast left semi join, the left_chunk is reused many times, once per right chunk, which I think means it can't be moved.
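To make the ownership question concrete, here is a minimal sketch of the two parameter styles. `Chunk`, `semi_join_chunk`, and `count_matches` are hypothetical stand-ins written for illustration (plain integer keys in a `std::vector`), not the rapidsmpf or cudf API. The borrowed left chunk stays valid across every call, while each right chunk is consumed by the call that receives it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for streaming::TableChunk: move-only, as a type that
// owns a large buffer typically is. Not the real class.
struct Chunk {
    std::vector<int> keys;
    explicit Chunk(std::vector<int> k) : keys(std::move(k)) {}
    Chunk(Chunk const&) = delete;
    Chunk& operator=(Chunk const&) = delete;
    Chunk(Chunk&&) = default;
    Chunk& operator=(Chunk&&) = default;
};

// Hypothetical join step. `left` is borrowed (const&), so the caller keeps it.
// `right` is taken by rvalue reference, so the caller must std::move it and
// must not rely on its contents afterwards.
std::vector<int> semi_join_chunk(Chunk const& left, Chunk&& right) {
    Chunk consumed = std::move(right);  // take ownership of the right chunk
    std::vector<int> out;
    for (int k : left.keys) {
        for (int r : consumed.keys) {
            if (k == r) {
                out.push_back(k);  // semi join: emit the left key once
                break;
            }
        }
    }
    return out;
}

// Broadcast pattern: the same left chunk is probed against every right chunk,
// so it has to survive each call. Each right chunk is used exactly once.
std::size_t count_matches(Chunk const& left, std::vector<Chunk>&& right_chunks) {
    std::size_t total = 0;
    for (Chunk& right : right_chunks) {
        total += semi_join_chunk(left, std::move(right)).size();
    }
    return total;
}
```

If `left` were also taken as `Chunk&&`, the first call would be free to consume it and the second iteration of the loop would probe a moved-from object, which is why a chunk that is reused across calls is passed by const reference here.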
I included some changes to our |
This implements TPCH query 4 using rapidsmpf.
The primary notable thing about q4 is the left semi join / where exists between the two tables.

I want to think a bit more about how to do this. Right now, I've implemented a version that broadcasts the smaller orders table and shuffles the larger lineitem table. We're currently unable to reuse the filtered joiner across chunks. cudf only supports probing with the left table (the smaller one in our case, orders). The build table must be the right table (the larger one in our case, lineitem). So I think we have to shuffle lineitem; otherwise we'll get incorrect results (a key in orders matching against a key in lineitem multiple times, once per partition).
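The duplicate-match concern can be illustrated with a small host-side sketch. This is an assumption-laden toy, not the cudf or rapidsmpf implementation: tables are reduced to a single integer key column, and `left_semi_join`, `split_round_robin`, `shuffle_by_key`, and `broadcast_semi_join` are hypothetical helper names. It compares broadcasting the left keys against right partitions that were split arbitrarily with right partitions that were hash-partitioned by key.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

using Keys = std::vector<int>;

// Left semi join on one key column: keep each left key that has at least one
// match in `right`. The right side is the build table, the left side probes.
Keys left_semi_join(Keys const& left, Keys const& right) {
    std::unordered_set<int> build(right.begin(), right.end());
    Keys out;
    for (int k : left) {
        if (build.count(k) != 0) out.push_back(k);
    }
    return out;
}

// Arbitrary split (round-robin by row): the same key can land in several
// partitions.
std::vector<Keys> split_round_robin(Keys const& rows, std::size_t n) {
    assert(n > 0);
    std::vector<Keys> parts(n);
    for (std::size_t i = 0; i < rows.size(); ++i) parts[i % n].push_back(rows[i]);
    return parts;
}

// Shuffle (hash-partition by key): all rows with a given key land in exactly
// one partition.
std::vector<Keys> shuffle_by_key(Keys const& rows, std::size_t n) {
    assert(n > 0);
    std::vector<Keys> parts(n);
    for (int k : rows) parts[std::hash<int>{}(k) % n].push_back(k);
    return parts;
}

// Broadcast the left keys to every right partition and concatenate the
// per-partition semi join results.
Keys broadcast_semi_join(Keys const& left, std::vector<Keys> const& right_parts) {
    Keys out;
    for (Keys const& part : right_parts) {
        Keys matched = left_semi_join(left, part);
        out.insert(out.end(), matched.begin(), matched.end());
    }
    return out;
}
```

With left keys {10, 20, 30} and right keys {10, 10, 20, 40} split round-robin into two partitions, key 10 is present in both partitions and is emitted twice. After hash-partitioning the right side, each key lives in a single partition, so the concatenated result has the same row count as the unpartitioned semi join.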