[WIP] Cache and reuse build-side hash tables during broadcast hash joins #14680

Draft
rishic3 wants to merge 12 commits into NVIDIA:main from rishic3:hash-join-reuse-main

rishic3 commented Apr 24, 2026

Fixes #12327, fixes #10413.

Description

WIP.

NDS Results

Results on NDS SF3K, mean of 4 measured runs.

Summary: Full run

| Metric | Value |
| --- | --- |
| Total baseline (sum of means) | 316,197 ms |
| Total test (sum of means) | 298,670 ms |
| Overall speedup (sum baseline / sum test) | 1.059x |
| Geomean of per-query speedups | 1.031x |
| Queries faster with feature (>1.00x) | 54 |
| Queries slower with feature (<1.00x) | 49 |
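
The two speedup figures above aggregate differently: the overall number divides summed baseline time by summed test time, so long-running queries dominate it, while the geomean weights every query equally. A minimal sketch of both aggregations, using three rows copied from the per-query table as sample input (the function names are illustrative and not part of the benchmark tooling):

```python
from math import exp, log

# (baseline mean ms, test mean ms) for three rows copied from the per-query table.
samples = {
    "query16": (10_162, 2_214),
    "query75": (12_020, 7_522),
    "query37": (813, 1_342),
}


def overall_speedup(rows):
    """Summed baseline time over summed test time (long queries weigh more)."""
    base = sum(b for b, _ in rows.values())
    test = sum(t for _, t in rows.values())
    return base / test


def geomean_speedup(rows):
    """Geometric mean of per-query speedups (every query counts equally)."""
    logs = [log(b / t) for b, t in rows.values()]
    return exp(sum(logs) / len(logs))


print(f"sample overall: {overall_speedup(samples):.3f}x")
print(f"sample geomean: {geomean_speedup(samples):.3f}x")

# The full-run totals from the summary reproduce the reported overall speedup.
print(f"full run: {316_197 / 298_670:.3f}x")  # full run: 1.059x
```

On the full run the overall figure (1.059x) sits above the geomean (1.031x), which is consistent with a few long queries such as query16 contributing most of the absolute savings.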

Summary: BHJ Op Time

| Metric | Value |
| --- | --- |
| Queries with ≥1 BHJ | 99 |
| Total BHJ instances | 615 |
| Total baseline BHJ op time (sum) | 644,052.9 ms |
| Total test BHJ op time (sum) | 575,890.7 ms |
| Delta | -68,162.2 ms (-10.58%) |
| Overall speedup (sum of baseline / sum of test) | 1.12x |
| Geomean speedup across all BHJs | 1.17x |
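
As a quick check on how the delta, the percentage, and the overall speedup relate, the arithmetic can be reproduced from the two totals alone. This is only a sketch of the calculation, with variable names chosen for illustration:

```python
# Totals copied from the BHJ op time summary above.
baseline_ms = 644_052.9
test_ms = 575_890.7

delta_ms = test_ms - baseline_ms          # negative means time saved
pct = delta_ms / baseline_ms * 100        # change relative to baseline
speedup = baseline_ms / test_ms           # same change expressed as a ratio

print(f"{delta_ms:,.1f} ms ({pct:.2f}%), {speedup:.2f}x")
# -68,162.2 ms (-10.58%), 1.12x
```

A 10.58% reduction in op time and a 1.12x speedup are the same measurement, since 1 / (1 - 0.1058) is roughly 1.12.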

Per Query

Per query runtime comparison

Sorted by Δ ascending (biggest wins first).

| Query | Baseline mean (ms) | Test mean (ms) | Δ (ms) | Speedup |
| --- | --- | --- | --- | --- |
| query16 | 10,162 | 2,214 | -7,948 | 4.590x |
| query75 | 12,020 | 7,522 | -4,498 | 1.598x |
| query23_part2 | 13,914 | 9,417 | -4,497 | 1.478x |
| query64 | 11,418 | 10,661 | -758 | 1.071x |
| query15 | 1,503 | 1,074 | -430 | 1.400x |
| query81 | 2,630 | 2,248 | -383 | 1.170x |
| query2 | 2,600 | 2,239 | -362 | 1.161x |
| query27 | 1,812 | 1,459 | -352 | 1.242x |
| query13 | 2,022 | 1,678 | -345 | 1.206x |
| query24_part1 | 9,417 | 9,108 | -308 | 1.034x |
| query69 | 1,622 | 1,332 | -290 | 1.218x |
| query39_part1 | 2,554 | 2,274 | -280 | 1.123x |
| query51 | 2,056 | 1,776 | -280 | 1.157x |
| query31 | 2,242 | 1,984 | -258 | 1.130x |
| query1 | 2,004 | 1,752 | -252 | 1.144x |
| query28 | 7,492 | 7,242 | -249 | 1.034x |
| query24_part2 | 9,017 | 8,803 | -214 | 1.024x |
| query18 | 2,246 | 2,043 | -203 | 1.099x |
| query57 | 1,914 | 1,730 | -183 | 1.106x |
| query94 | 3,238 | 3,058 | -180 | 1.059x |
| query36 | 1,812 | 1,655 | -157 | 1.095x |
| query74 | 2,730 | 2,580 | -151 | 1.058x |
| query83 | 1,060 | 910 | -150 | 1.165x |
| query88 | 4,864 | 4,718 | -146 | 1.031x |
| query5 | 2,914 | 2,784 | -129 | 1.046x |
| query65 | 4,021 | 3,892 | -129 | 1.033x |
| query33 | 1,338 | 1,212 | -126 | 1.104x |
| query78 | 11,507 | 11,397 | -110 | 1.010x |
| query58 | 1,104 | 1,000 | -104 | 1.104x |
| query7 | 3,729 | 3,628 | -102 | 1.028x |
| query26 | 1,212 | 1,116 | -96 | 1.087x |
| query53 | 921 | 825 | -96 | 1.116x |
| query32 | 1,131 | 1,046 | -84 | 1.081x |
| query55 | 560 | 486 | -75 | 1.154x |
| query44 | 1,020 | 948 | -72 | 1.075x |
| query46 | 1,509 | 1,438 | -72 | 1.050x |
| query47 | 2,350 | 2,281 | -69 | 1.030x |
| query98 | 1,648 | 1,580 | -68 | 1.043x |
| query19 | 1,275 | 1,213 | -62 | 1.051x |
| query85 | 1,757 | 1,696 | -60 | 1.036x |
| query77 | 1,276 | 1,216 | -60 | 1.049x |
| query23_part1 | 8,582 | 8,527 | -56 | 1.007x |
| query79 | 1,204 | 1,156 | -48 | 1.041x |
| query84 | 1,157 | 1,124 | -34 | 1.030x |
| query4 | 5,760 | 5,728 | -32 | 1.006x |
| query82 | 1,016 | 989 | -28 | 1.028x |
| query43 | 1,032 | 1,005 | -27 | 1.027x |
| query86 | 1,294 | 1,272 | -22 | 1.017x |
| query45 | 1,179 | 1,161 | -18 | 1.016x |
| query3 | 540 | 522 | -17 | 1.033x |
| query54 | 1,672 | 1,657 | -15 | 1.009x |
| query60 | 1,344 | 1,330 | -14 | 1.011x |
| query49 | 2,602 | 2,590 | -12 | 1.005x |
| query12 | 741 | 734 | -6 | 1.009x |
| query10 | 1,548 | 1,549 | 0 | 1.000x |
| query30 | 1,955 | 1,957 | 2 | 0.999x |
| query17 | 1,760 | 1,762 | 2 | 0.999x |
| query25 | 1,666 | 1,669 | 3 | 0.998x |
| query90 | 921 | 929 | 8 | 0.991x |
| query80 | 4,811 | 4,820 | 8 | 0.998x |
| query62 | 1,506 | 1,518 | 12 | 0.992x |
| query56 | 1,066 | 1,080 | 14 | 0.987x |
| query42 | 405 | 423 | 18 | 0.957x |
| query76 | 3,377 | 3,396 | 19 | 0.994x |
| query22 | 1,329 | 1,350 | 21 | 0.984x |
| query40 | 1,346 | 1,368 | 21 | 0.984x |
| query91 | 1,666 | 1,689 | 24 | 0.986x |
| query41 | 371 | 396 | 25 | 0.937x |
| query63 | 953 | 979 | 26 | 0.973x |
| query70 | 1,985 | 2,018 | 33 | 0.984x |
| query72 | 2,955 | 2,992 | 36 | 0.988x |
| query52 | 552 | 589 | 36 | 0.938x |
| query97 | 2,424 | 2,471 | 47 | 0.981x |
| query6 | 928 | 981 | 53 | 0.946x |
| query92 | 726 | 784 | 58 | 0.926x |
| query20 | 717 | 780 | 62 | 0.920x |
| query68 | 1,408 | 1,473 | 65 | 0.956x |
| query9 | 2,955 | 3,022 | 67 | 0.978x |
| query34 | 2,391 | 2,458 | 68 | 0.973x |
| query61 | 1,543 | 1,622 | 80 | 0.951x |
| query89 | 1,242 | 1,325 | 84 | 0.937x |
| query66 | 2,933 | 3,019 | 86 | 0.972x |
| query73 | 1,131 | 1,217 | 86 | 0.929x |
| query99 | 1,908 | 2,002 | 94 | 0.953x |
| query59 | 2,379 | 2,482 | 103 | 0.958x |
| query87 | 1,880 | 1,996 | 117 | 0.941x |
| query39_part2 | 1,712 | 1,832 | 121 | 0.934x |
| query29 | 3,292 | 3,425 | 132 | 0.961x |
| query21 | 735 | 877 | 142 | 0.838x |
| query67 | 13,548 | 13,707 | 159 | 0.988x |
| query8 | 926 | 1,087 | 161 | 0.852x |
| query48 | 1,154 | 1,352 | 198 | 0.854x |
| query71 | 3,716 | 3,964 | 247 | 0.938x |
| query35 | 1,670 | 1,935 | 265 | 0.863x |
| query50 | 9,404 | 9,717 | 313 | 0.968x |
| query11 | 3,226 | 3,658 | 431 | 0.882x |
| query38 | 2,002 | 2,454 | 452 | 0.816x |
| query96 | 7,469 | 7,932 | 463 | 0.942x |
| query14_part2 | 6,500 | 6,984 | 484 | 0.931x |
| query95 | 5,568 | 6,070 | 502 | 0.917x |
| query93 | 12,466 | 12,971 | 505 | 0.961x |
| query37 | 813 | 1,342 | 530 | 0.606x |
| query14_part1 | 7,513 | 8,219 | 706 | 0.914x |

Per Query Broadcast Hash Join

Broadcast hash join op times, aggregated per query

Sorted by BHJ Δ ascending (biggest broadcast hash join savings first).

| Query | #BHJs | BHJ base (ms) | BHJ test (ms) | BHJ Δ (ms) | BHJ speedup | Builds | Hits | Hit rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| query88 | 24 | 51,572.1 | 33,687.5 | -17,884.6 | 1.53x | 80 | 4,768 | 98.3% |
| query23_part1 | 8 | 64,163.8 | 60,428.2 | -3,735.6 | 1.06x | 32 | 570 | 94.7% |
| query14_part2 | 43 | 16,971.5 | 13,369.0 | -3,602.5 | 1.27x | 65 | 1,411 | 95.6% |
| query78 | 3 | 11,727.2 | 8,262.3 | -3,464.9 | 1.42x | 8 | 592 | 98.7% |
| query14_part1 | 63 | 16,209.5 | 13,179.0 | -3,030.4 | 1.23x | 56 | 1,689 | 96.8% |
| query94 | 3 | 9,339.1 | 6,458.2 | -2,881.0 | 1.45x | 24 | 576 | 96.0% |
| query80 | 12 | 10,738.1 | 8,660.9 | -2,077.2 | 1.24x | 48 | 1,952 | 97.6% |
| query24_part1 | 4 | 30,253.0 | 28,502.1 | -1,750.8 | 1.06x | 24 | 782 | 97.0% |
| query24_part2 | 4 | 29,731.9 | 28,084.7 | -1,647.2 | 1.06x | 24 | 782 | 97.0% |
| query72 | 8 | 11,829.0 | 10,182.8 | -1,646.3 | 1.16x | 64 | 1,210 | 95.0% |
| query99 | 4 | 39,909.8 | 38,264.6 | -1,645.2 | 1.04x | 32 | 584 | 94.8% |
| query96 | 3 | 7,305.3 | 5,739.9 | -1,565.4 | 1.27x | 24 | 582 | 96.0% |
| query75 | 13 | 10,424.8 | 8,973.0 | -1,451.8 | 1.16x | 25 | 1,862 | 98.7% |
| query65 | 5 | 8,748.9 | 7,411.5 | -1,337.3 | 1.18x | 32 | 832 | 96.3% |
| query26 | 4 | 5,019.9 | 3,754.4 | -1,265.6 | 1.34x | 32 | 604 | 95.0% |
| query66 | 8 | 19,178.1 | 18,083.0 | -1,095.1 | 1.06x | 32 | 1,240 | 97.5% |
| query67 | 2 | 9,907.0 | 8,844.3 | -1,062.7 | 1.12x | 16 | 292 | 94.8% |
| query4 | 6 | 6,674.1 | 5,614.1 | -1,060.0 | 1.19x | 16 | 929 | 98.3% |
| query97 | 2 | 4,420.2 | 3,373.8 | -1,046.5 | 1.31x | 8 | 305 | 97.4% |
| query46 | 3 | 3,643.6 | 2,641.4 | -1,002.2 | 1.38x | 24 | 507 | 95.5% |
| query47 | 11 | 23,827.8 | 22,923.7 | -904.1 | 1.04x | 26 | 522 | 95.3% |
| query7 | 4 | 7,800.3 | 6,902.7 | -897.6 | 1.13x | 32 | 584 | 94.8% |
| query59 | 6 | 7,257.6 | 6,368.1 | -889.5 | 1.14x | 20 | 374 | 94.9% |
| query2 | 5 | 4,822.3 | 3,975.7 | -846.6 | 1.21x | 21 | 552 | 96.3% |
| query76 | 6 | 9,514.3 | 8,679.7 | -834.6 | 1.10x | 16 | 942 | 98.3% |
| query34 | 4 | 2,122.8 | 1,375.8 | -747.0 | 1.54x | 32 | 483 | 93.8% |
| query74 | 4 | 3,017.3 | 2,300.4 | -716.9 | 1.31x | 16 | 611 | 97.4% |
| query11 | 4 | 3,117.1 | 2,425.1 | -692.0 | 1.29x | 16 | 611 | 97.4% |
| query57 | 9 | 12,307.0 | 11,656.8 | -650.1 | 1.06x | 24 | 507 | 95.5% |
| query27 | 4 | 7,163.6 | 6,518.8 | -644.8 | 1.10x | 32 | 588 | 94.8% |
| query36 | 3 | 6,048.5 | 5,439.8 | -608.7 | 1.11x | 24 | 438 | 94.8% |
| query29 | 5 | 2,878.9 | 2,318.7 | -560.2 | 1.24x | 40 | 960 | 96.0% |
| query70 | 5 | 12,060.4 | 11,507.8 | -552.6 | 1.05x | 25 | 596 | 96.0% |
| query90 | 6 | 2,430.7 | 1,920.6 | -510.1 | 1.27x | 32 | 754 | 95.9% |
| query48 | 3 | 2,796.2 | 2,321.5 | -474.7 | 1.20x | 18 | 295 | 94.2% |
| query35 | 4 | 2,005.5 | 1,538.4 | -467.0 | 1.30x | 10 | 503 | 98.1% |
| query50 | 3 | 2,875.2 | 2,449.3 | -425.9 | 1.17x | 24 | 576 | 96.0% |
| query86 | 2 | 3,145.2 | 2,723.3 | -421.9 | 1.15x | 16 | 304 | 95.0% |
| query53 | 3 | 2,752.6 | 2,357.3 | -395.3 | 1.17x | 24 | 438 | 94.8% |
| query71 | 5 | 3,197.2 | 2,838.0 | -359.2 | 1.13x | 24 | 1,131 | 97.9% |
| query10 | 5 | 1,132.5 | 804.2 | -328.2 | 1.41x | 24 | 431 | 94.7% |
| query87 | 3 | 1,164.0 | 840.3 | -323.7 | 1.39x | 8 | 465 | 98.3% |
| query89 | 3 | 14,312.8 | 13,990.8 | -322.0 | 1.02x | 24 | 441 | 94.8% |
| query63 | 3 | 2,481.4 | 2,161.2 | -320.3 | 1.15x | 24 | 438 | 94.8% |
| query17 | 5 | 1,313.9 | 1,030.0 | -284.0 | 1.28x | 32 | 468 | 93.6% |
| query38 | 3 | 1,311.3 | 1,046.1 | -265.2 | 1.25x | 8 | 467 | 98.3% |
| query18 | 4 | 1,394.7 | 1,134.1 | -260.6 | 1.23x | 26 | 233 | 90.0% |
| query85 | 6 | 1,220.0 | 964.5 | -255.5 | 1.26x | 27 | 295 | 91.6% |
| query31 | 11 | 1,207.9 | 965.5 | -242.4 | 1.25x | 29 | 826 | 96.6% |
| query68 | 6 | 1,072.4 | 834.0 | -238.4 | 1.29x | 39 | 392 | 91.0% |
| query51 | 2 | 1,214.1 | 987.9 | -226.3 | 1.23x | 8 | 306 | 97.5% |
| query25 | 5 | 1,293.3 | 1,072.9 | -220.5 | 1.21x | 32 | 468 | 93.6% |
| query1 | 5 | 787.5 | 576.0 | -211.6 | 1.37x | 18 | 288 | 94.1% |
| query79 | 3 | 2,726.6 | 2,552.0 | -174.7 | 1.07x | 24 | 396 | 94.3% |
| query5 | 6 | 1,679.6 | 1,507.3 | -172.3 | 1.11x | 30 | 1,351 | 97.8% |
| query69 | 4 | 581.9 | 423.3 | -158.5 | 1.37x | 9 | 386 | 97.6% |
| query81 | 3 | 738.9 | 586.3 | -152.6 | 1.26x | 14 | 295 | 95.3% |
| query73 | 4 | 651.5 | 500.3 | -151.1 | 1.30x | 32 | 417 | 92.9% |
| query13 | 4 | 941.1 | 803.7 | -137.4 | 1.17x | 14 | 146 | 91.2% |
| query43 | 2 | 7,424.3 | 7,308.6 | -115.7 | 1.02x | 16 | 294 | 94.8% |
| query8 | 4 | 1,062.8 | 949.1 | -113.6 | 1.12x | 18 | 258 | 93.5% |
| query3 | 2 | 1,226.2 | 1,118.4 | -107.7 | 1.10x | 16 | 284 | 94.7% |
| query30 | 4 | 1,367.8 | 1,269.1 | -98.7 | 1.08x | 18 | 349 | 95.1% |
| query40 | 3 | 1,109.8 | 1,012.4 | -97.4 | 1.10x | 24 | 276 | 92.0% |
| query93 | 1 | 280.2 | 189.2 | -91.0 | 1.48x | 8 | 192 | 96.0% |
| query33 | 9 | 484.7 | 397.4 | -87.3 | 1.22x | 17 | 744 | 97.8% |
| query61 | 7 | 964.7 | 878.0 | -86.7 | 1.10x | 26 | 630 | 95.9% |
| query19 | 3 | 707.5 | 625.4 | -82.1 | 1.13x | 18 | 256 | 93.4% |
| query60 | 9 | 737.3 | 657.6 | -79.7 | 1.12x | 17 | 764 | 97.8% |
| query52 | 2 | 474.2 | 403.1 | -71.1 | 1.18x | 16 | 242 | 93.8% |
| query42 | 2 | 478.9 | 409.1 | -69.8 | 1.17x | 16 | 248 | 93.9% |
| query32 | 4 | 318.6 | 250.2 | -68.4 | 1.27x | 17 | 255 | 93.8% |
| query49 | 3 | 334.7 | 276.6 | -58.1 | 1.21x | 8 | 266 | 97.1% |
| query77 | 12 | 664.3 | 614.9 | -49.4 | 1.08x | 26 | 1,085 | 97.7% |
| query84 | 5 | 686.8 | 640.2 | -46.6 | 1.07x | 39 | 539 | 93.3% |
| query83 | 18 | 214.5 | 170.6 | -43.9 | 1.26x | 18 | 368 | 95.3% |
| query92 | 4 | 211.4 | 168.4 | -43.0 | 1.26x | 17 | 255 | 93.8% |
| query55 | 2 | 349.3 | 316.7 | -32.6 | 1.10x | 16 | 242 | 93.8% |
| query22 | 1 | 158.4 | 129.1 | -29.3 | 1.23x | 8 | 114 | 93.4% |
| query15 | 1 | 93.7 | 64.5 | -29.2 | 1.45x | 4 | 28 | 87.1% |
| query58 | 12 | 186.2 | 157.3 | -28.8 | 1.18x | 17 | 644 | 97.4% |
| query64 | 33 | 960.8 | 939.1 | -21.6 | 1.02x | 125 | 482 | 79.4% |
| query39_part2 | 7 | 97.7 | 77.6 | -20.1 | 1.26x | 18 | 103 | 84.8% |
| query91 | 6 | 242.6 | 224.1 | -18.5 | 1.08x | 24 | 224 | 90.3% |
| query56 | 9 | 310.9 | 297.2 | -13.7 | 1.05x | 18 | 743 | 97.6% |
| query12 | 1 | 58.6 | 47.6 | -11.0 | 1.23x | 8 | 116 | 93.5% |
| query39_part1 | 7 | 241.6 | 233.5 | -8.1 | 1.03x | 26 | 95 | 78.1% |
| query98 | 1 | 60.7 | 52.7 | -8.0 | 1.15x | 8 | 120 | 93.8% |
| query45 | 3 | 444.4 | 437.9 | -6.6 | 1.01x | 16 | 67 | 80.4% |
| query54 | 6 | 211.8 | 205.3 | -6.5 | 1.03x | 42 | 495 | 92.1% |
| query21 | 3 | 109.9 | 103.8 | -6.1 | 1.06x | 21 | 114 | 84.4% |
| query20 | 1 | 64.0 | 58.9 | -5.1 | 1.09x | 8 | 116 | 93.5% |
| query37 | 2 | 28.6 | 24.9 | -3.7 | 1.15x | 12 | 67 | 84.4% |
| query41 | 1 | 2.8 | 2.5 | -0.3 | 1.11x | 1 | 7 | 86.1% |
| query82 | 2 | 32.3 | 35.5 | 3.1 | 0.91x | 14 | 65 | 81.9% |
| query6 | 3 | 164.5 | 187.1 | 22.6 | 0.88x | 11 | 47 | 80.9% |
| query95 | 3 | 12,102.3 | 12,560.6 | 458.3 | 0.96x | 24 | 576 | 96.0% |
| query62 | 4 | 23,932.9 | 24,765.7 | 832.8 | 0.97x | 32 | 492 | 93.9% |
| query23_part2 | 10 | 59,317.6 | 62,794.5 | 3,476.9 | 0.94x | 32 | 836 | 96.3% |
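
To make the Builds, Hits, and Hit rate columns concrete, here is a minimal sketch of a build-once cache that counts both events. It is an illustration only and not the code in this PR: `HashTableCache`, `build_table`, and the `"broadcast-0"` key are hypothetical names, and a plain Python dict stands in for the GPU hash table. The sketch assumes hit rate is hits / (builds + hits), which matches most rows above (for example query88: 4,768 / 4,848 ≈ 98.3%); a few rows differ slightly from that formula, possibly because the table reports means over the measured runs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Hashable


@dataclass
class HashTableCache:
    """Hypothetical cache: builds a table once per key, counts builds and hits."""

    builds: int = 0
    hits: int = 0
    _tables: Dict[Hashable, Any] = field(default_factory=dict)

    def get_or_build(self, key: Hashable, build: Callable[[], Any]) -> Any:
        if key in self._tables:
            self.hits += 1                  # reuse the table built earlier
        else:
            self._tables[key] = build()     # first lookup pays the build cost
            self.builds += 1
        return self._tables[key]

    @property
    def hit_rate(self) -> float:
        lookups = self.builds + self.hits
        return self.hits / lookups if lookups else 0.0


def build_table(rows):
    """Stand-in for a build-side hash table: join key -> list of payloads."""
    table = {}
    for key, payload in rows:
        table.setdefault(key, []).append(payload)
    return table


cache = HashTableCache()
broadcast_rows = [(1, "a"), (2, "b"), (1, "c")]

# Eight probe-side tasks join against the same broadcast relation.
for _task in range(8):
    table = cache.get_or_build("broadcast-0", lambda: build_table(broadcast_rows))

print(cache.builds, cache.hits, f"{cache.hit_rate:.1%}")  # 1 7 87.5%
```

In this toy run only the first task builds the table and the other seven reuse it, which is the pattern the high hit rates in the table suggest.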

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

Development

Successfully merging this pull request may close these issues.

  • [FEA] Reuse cuDF generated hash map for the build side keys when doing join
  • Cache broadcast hash tables