Conversation
    );
}

rapidsmpf::streaming::Node filter_part(
I think that the filter can be done in the read_parquet with something similar to https://github.com/rapidsai/rapidsmpf/pull/710/changes#diff-9743b2e766c061cb2f29e446eb28ac8761389f601474ecfd11cac9971deb81f7R262-R343. I can try implementing that if you'd like, since I just did it for query 4.
// Select specific columns from the input table
rapidsmpf::streaming::Node select_columns(
    std::shared_ptr<rapidsmpf::streaming::Context> ctx,
    std::shared_ptr<rapidsmpf::streaming::Channel> ch_in,
    std::shared_ptr<rapidsmpf::streaming::Channel> ch_out,
    std::vector<cudf::size_type> indices
) {
Maybe this column selection should be part of the post-processing in the read_parquet node. Having a node do this immediately after a read_parquet seems a bit redundant.
Maybe we could apply the selection here
Then we might not need to do the column copy in L204
I was wondering about this too, but I wasn't sure whether cudf had a built-in way of saying "read these columns for the filter, but don't include them in the output". If it doesn't have a way of doing that, then I agree doing it after the read but before the return is probably worth doing.
@TomAugspurger I think the filter is for rows. It's more of a post-processing step in read_parquet node AFAIU.
Yep, it's a row filter. But in some cases (like this one) we only need a column for the filter. If cudf doesn't have a way to include a column for the purpose of filtering but exclude it from the result table, then we can update our wrapper to perform that selection.
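If we end up doing that selection in our wrapper, a minimal sketch of dropping the filter-only columns without extra device copies could look like this (variable names such as tbl and output_indices are illustrative, assuming the reader hands back an owning cudf::table):

// Release the owning table and move only the requested columns into a new one;
// no device copies are involved.
std::vector<std::unique_ptr<cudf::column>> cols = tbl->release();
std::vector<std::unique_ptr<cudf::column>> selected;
for (auto idx : output_indices) {
    selected.push_back(std::move(cols[idx]));
}
auto out = std::make_unique<cudf::table>(std::move(selected));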
You can do it in the read_parquet by using column_name_reference in the filter rather than column_reference.
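For reference, a rough sketch of what that could look like when building the reader options; the column names, predicate, and source here are placeholders rather than the actual q17 predicate, and the wrapper's option plumbing may differ:

// Read only the output columns, while filtering rows on a column referenced by
// name so it does not have to appear in the selected output.
auto source = cudf::io::source_info{"part.parquet"};
auto col    = cudf::ast::column_name_reference("p_size");
auto thresh = cudf::numeric_scalar<int32_t>(10);
auto lit    = cudf::ast::literal(thresh);
auto expr   = cudf::ast::operation(cudf::ast::ast_operator::LESS, col, lit);

auto opts = cudf::io::parquet_reader_options::builder(source)
                .columns({"p_partkey", "p_brand", "p_container"})
                .filter(expr)
                .build();
auto result = cudf::io::read_parquet(opts);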
std::vector<std::unique_ptr<cudf::groupby_aggregation>> sum_aggs;
sum_aggs.push_back(cudf::make_sum_aggregation<cudf::groupby_aggregation>());
std::vector<std::unique_ptr<cudf::groupby_aggregation>> count_aggs;
count_aggs.push_back(cudf::make_count_aggregation<cudf::groupby_aggregation>());
requests.push_back(
    cudf::groupby::aggregation_request(table.column(1), std::move(sum_aggs))
);
requests.push_back(
    cudf::groupby::aggregation_request(table.column(1), std::move(count_aggs))
);
I think this could be simplified as follows, since both sum and count are happening on col 1.
Suggested change:

-    std::vector<std::unique_ptr<cudf::groupby_aggregation>> sum_aggs;
-    sum_aggs.push_back(cudf::make_sum_aggregation<cudf::groupby_aggregation>());
-    std::vector<std::unique_ptr<cudf::groupby_aggregation>> count_aggs;
-    count_aggs.push_back(cudf::make_count_aggregation<cudf::groupby_aggregation>());
-    requests.push_back(
-        cudf::groupby::aggregation_request(table.column(1), std::move(sum_aggs))
-    );
-    requests.push_back(
-        cudf::groupby::aggregation_request(table.column(1), std::move(count_aggs))
-    );
+    std::vector<std::unique_ptr<cudf::groupby_aggregation>> aggs;
+    aggs.push_back(cudf::make_sum_aggregation<cudf::groupby_aggregation>());
+    aggs.push_back(cudf::make_count_aggregation<cudf::groupby_aggregation>());
+    requests.push_back(
+        cudf::groupby::aggregation_request(table.column(1), std::move(aggs))
+    );
then results will have both sum and count in a single vector
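With both aggregations on one request, the outputs come back in the order they were added, so the assembly could become something like this (sketch, following the variable names used nearby):

auto result = keys->release();
result.push_back(std::move(results[0].results[0]));  // sum of col 1
result.push_back(std::move(results[0].results[1]));  // count of col 1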
std::vector<std::unique_ptr<cudf::column>> result;
result.push_back(std::move(keys->release()[0]));
result.push_back(std::move(results[0].results[0])); // sum
result.push_back(std::move(results[1].results[0])); // count
In Q09 we push back to the vector given by keys->release():

auto result = keys->release();
for (auto&& r : results) {
    std::ranges::move(r.results, std::back_inserter(result));
}

I think this is a neater way.
auto chunk_stream = chunk.stream();
auto table = chunk.table_view();

if (!table.is_empty() && table.num_columns() >= 4) {
can there be a table.num_columns() != 4 scenario?
auto sum_scalar = cudf::make_numeric_scalar(
    cudf::data_type(cudf::type_id::FLOAT64), chunk_stream, ctx->br()->device_mr()
);
static_cast<cudf::numeric_scalar<double>*>(sum_scalar.get())
    ->set_value(local_sum, chunk_stream);

std::vector<std::unique_ptr<cudf::column>> result_cols;
result_cols.push_back(
    cudf::make_column_from_scalar(
        *sum_scalar, 1, chunk_stream, ctx->br()->device_mr()
    )
);
I think we could use an rmm::device_uvector here. This will avoid an extra allocation IINM.
Suggested change:

-    auto sum_scalar = cudf::make_numeric_scalar(
-        cudf::data_type(cudf::type_id::FLOAT64), chunk_stream, ctx->br()->device_mr()
-    );
-    static_cast<cudf::numeric_scalar<double>*>(sum_scalar.get())
-        ->set_value(local_sum, chunk_stream);
-    std::vector<std::unique_ptr<cudf::column>> result_cols;
-    result_cols.push_back(
-        cudf::make_column_from_scalar(
-            *sum_scalar, 1, chunk_stream, ctx->br()->device_mr()
-        )
-    );
+    rmm::device_uvector<double> vec(1, chunk_stream, ctx->br()->device_mr());
+    vec.set_element_async(0, local_sum, chunk_stream);
+    std::vector<std::unique_ptr<cudf::column>> result_cols {
+        std::make_unique<cudf::column>(std::move(vec), {}, 0)
+    };
auto sum_val =
    static_cast<cudf::numeric_scalar<double>*>(
        cudf::reduce(
            local_result->view().column(0),
            *cudf::make_sum_aggregation<cudf::reduce_aggregation>(),
            cudf::data_type(cudf::type_id::FLOAT64),
            chunk_stream,
            ctx->br()->device_mr()
        )
            .get()
    )
        ->value(chunk_stream);
This is the same as local_sum in the single-rank case, isn't it? Seems a bit redundant.
Also, I think we can remove the if (local_result) {...} else {...} branch and simply use the local_sum value here.
auto avg_yearly_val = sum_val / 7.0;
auto avg_yearly_scalar = cudf::make_numeric_scalar(
    cudf::data_type(cudf::type_id::FLOAT64),
    chunk_stream,
    ctx->br()->device_mr()
);
static_cast<cudf::numeric_scalar<double>*>(avg_yearly_scalar.get())
    ->set_value(avg_yearly_val, chunk_stream);

std::vector<std::unique_ptr<cudf::column>> result_cols;
result_cols.push_back(
    cudf::make_column_from_scalar(
        *avg_yearly_scalar, 1, chunk_stream, ctx->br()->device_mr()
    )
);
maybe we need a util to create a column from a single value of type T
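A minimal sketch of such a helper; the name column_from_value and the rmm::device_async_resource_ref parameter type are assumptions, not existing API:

// Build a single-row cudf column directly from a host value of type T,
// avoiding the intermediate scalar and make_column_from_scalar allocation.
template <typename T>
std::unique_ptr<cudf::column> column_from_value(
    T const& value, rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr
) {
    rmm::device_uvector<T> vec(1, stream, mr);
    // set_element synchronizes the stream, so `value` need not outlive the call
    vec.set_element(0, value, stream);
    return std::make_unique<cudf::column>(std::move(vec), rmm::device_buffer{}, 0);
}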
auto avg_yearly_scalar = cudf::make_numeric_scalar(
    cudf::data_type(cudf::type_id::FLOAT64),
    chunk_stream,
    ctx->br()->device_mr()
);
static_cast<cudf::numeric_scalar<double>*>(avg_yearly_scalar.get())
    ->set_value(avg_yearly_val, chunk_stream);

std::vector<std::unique_ptr<cudf::column>> result_cols;
result_cols.push_back(
    cudf::make_column_from_scalar(
        *avg_yearly_scalar, 1, chunk_stream, ctx->br()->device_mr()
    )
);
move to a separate util
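With a helper like the column_from_value sketch above (hypothetical name), this block could collapse to something like:

result_cols.push_back(
    column_from_value(avg_yearly_val, chunk_stream, ctx->br()->device_mr())
);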
[](
    std::shared_ptr<rapidsmpf::streaming::Context> ctx,
    std::shared_ptr<rapidsmpf::streaming::Channel> ch_in,
    std::shared_ptr<rapidsmpf::streaming::Channel> ch_out,
    rapidsmpf::OpID tag
) -> rapidsmpf::streaming::Node {
let's move this to a separate method
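Something along these lines, i.e. hoisting the lambda into a named free function with the same signature (the name is just a placeholder):

rapidsmpf::streaming::Node compute_avg_yearly(
    std::shared_ptr<rapidsmpf::streaming::Context> ctx,
    std::shared_ptr<rapidsmpf::streaming::Channel> ch_in,
    std::shared_ptr<rapidsmpf::streaming::Channel> ch_out,
    rapidsmpf::OpID tag
) {
    // ... same body as the current lambda ...
}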
Signed-off-by: niranda perera <niranda.perera@gmail.com>
Avoid including the filter columns in the output table.
@nirandaperera I pushed a couple changes
I noticed that we're getting a different (presumably incorrect) result now. I'm not sure how long it's been like this. Running this script (in the collapsed Details block) shows the different result.
I checked out an earlier commit (c21cb30) and confirmed that it gets the same (probably incorrect) 223947 as HEAD. So the recent changes didn't cause a regression there.
Shoot! I was testing against SF3K and thought I was getting the correct results. I'll double check.
It's also possible I've messed up the expected result. I haven't looked carefully.
@TomAugspurger and I went through a variety of scale factors (10, 100, 1K, 3K) and validated that all worked multi-GPU. However, with a single-GPU run the values were incorrect. We suspect something may be incorrect with reading/distributing the parquet files, or that the assumptions about distribution are only valid for N+1 ranks.
@TomAugspurger I think I resolved the issue in f67178f -- carved out a special case for single-rank computation.
Resolved conflict in cpp/benchmarks/streaming/ndsh/CMakeLists.txt by combining query lists from both branches (q01, q03, q09, q17).