[QST] SQL Performance issue #2783

@eyalhir74

Description

I'm running a TPC-H SQL query (see the bottom of this post) against a Parquet file (100 GB database).
If I understand the first screenshot correctly, then half of the time is spent decoding the Parquet data on the GPU, and for the other half the GPU sits idle (with the "local[*]" configuration).
The second screenshot shows what I believe is the compute itself (I'm not sure): the compute takes 140 ms on the GPU, and then the GPU is idle for ~500 ms.

What could be the reason for such low utilization?
Is there a way to instrument the code (cudf/blazingsql/...) with custom CUPTI markers to pinpoint what causes the idle time?
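To make the kind of instrumentation I have in mind concrete, here is a minimal sketch using NVTX ranges from Python, assuming the `nvtx` package is installed (it falls back to a no-op otherwise). The functions `load_table` and `run_query` are hypothetical stand-ins for the real Parquet read and query steps, not actual cudf or blazingsql calls. The small `gpu_utilization` helper just restates the numbers from the second screenshot (140 ms busy, ~500 ms idle).

```python
import time
from contextlib import contextmanager

try:
    # NVTX Python bindings; ranges show up in the Nsight Systems timeline.
    import nvtx
    annotate = nvtx.annotate
except ImportError:
    # Fallback so the sketch still runs without the nvtx package (assumption).
    @contextmanager
    def annotate(message=None, color=None):
        yield


def load_table(path):
    """Hypothetical stand-in for the Parquet read/decode step."""
    with annotate("read_parquet", color="green"):
        time.sleep(0.01)  # placeholder for the real I/O and decode work
        return [1, 2, 3]


def run_query(rows):
    """Hypothetical stand-in for the join/aggregate step."""
    with annotate("compute", color="blue"):
        return sum(rows)


def gpu_utilization(busy_ms, idle_ms):
    """Fraction of wall time the GPU was busy."""
    return busy_ms / (busy_ms + idle_ms)


if __name__ == "__main__":
    result = run_query(load_table("lineitem.parquet"))
    print(result)
    # Numbers from the second screenshot: 140 ms compute, ~500 ms idle.
    print(f"utilization: {gpu_utilization(140, 500):.1%}")
```

Running a script like this under Nsight Systems (for example `nsys profile python script.py`) should show the named ranges next to the CUDA kernels, which would help line up the idle gaps with specific host-side steps.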

Screenshot from 2021-06-22 21-54-26

Screenshot from 2021-06-22 22-00-39

select
    s_name,
    count(*) as numwait
from
    supplier,
    lineitem l1,
    orders,
    nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists (
        select *
        from lineitem l2
        where l2.l_orderkey = l1.l_orderkey
          and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists (
        select *
        from lineitem l3
        where l3.l_orderkey = l1.l_orderkey
          and l3.l_suppkey <> l1.l_suppkey
          and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
group by
    s_name
order by
    numwait desc,
    s_name;
