Re-enable accelerated columnar-to-row path after fix in spark-rapids-jni#14651
thirtiseven wants to merge 2 commits into NVIDIA:main
The workaround from NVIDIA#12699 forced isAcceleratedTransposeSupported to false on every architecture to avoid the data corruption in NVIDIA#10062. That has now been fixed upstream in spark-rapids-jni by correcting the off-by-one in detail::determine_tiles (NVIDIA/spark-rapids-jni#4493), so restore the original post-Pascal gating. Closes NVIDIA#10062
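To make the failure mode concrete, here is a toy Scala model of how an off-by-one in a greedy column-tiling loop can lose the last column. It is only an illustration under stated assumptions: it is not the code of detail::determine_tiles, and `TileSketch`, `Tile`, `buggyTiles`, `fixedTiles`, and `coveredColumns` are hypothetical names. The trigger condition follows the PR description: the trailing column alone tips the tile's row-size estimate over the limit.

```scala
// Toy model only: NOT the spark-rapids-jni kernel code. All names are hypothetical.
object TileSketch {
  // A tile covers the half-open column range [start, end).
  final case class Tile(start: Int, end: Int)

  private def split(colSizes: Seq[Int], limit: Int, offByOne: Boolean): Vector[Tile] = {
    val out = Vector.newBuilder[Tile]
    var start = 0   // first column of the tile being built
    var rowSize = 0 // estimated bytes per row for the tile being built
    for (c <- colSizes.indices) {
      // Close the current tile when adding column c would exceed the limit.
      if (c > start && rowSize + colSizes(c) > limit) {
        out += Tile(start, c)
        start = c
        rowSize = 0
      }
      rowSize += colSizes(c)
    }
    // Flush the final tile. The off-by-one variant uses a bound that is one
    // too small, so a final tile holding exactly one column is never emitted.
    val lastFlushBound = if (offByOne) colSizes.length - 1 else colSizes.length
    if (start < lastFlushBound) out += Tile(start, colSizes.length)
    out.result()
  }

  def buggyTiles(colSizes: Seq[Int], limit: Int): Vector[Tile] =
    split(colSizes, limit, offByOne = true)

  def fixedTiles(colSizes: Seq[Int], limit: Int): Vector[Tile] =
    split(colSizes, limit, offByOne = false)

  // Number of columns that end up in some tile.
  def coveredColumns(tiles: Seq[Tile]): Int = tiles.map(t => t.end - t.start).sum
}
```

With four 8-byte columns and a 24-byte limit, the fourth column alone pushes the estimate over the limit: `fixedTiles` covers all four columns, while `buggyTiles` covers only three. With three 8-byte columns, both variants agree, which is why a bug of this shape could stay hidden on most schemas.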
Greptile Summary
Removes the unconditional
Confidence Score: 5/5
Safe to merge once the upstream spark-rapids-jni#4493 fix is present in the consumed SNAPSHOT; no logic or resource management concerns in this change itself.
Single-line logic restoration with a correctly re-added import and updated copyright; only finding is a stale P2 comment. No resource lifecycle, OOM retry, or data correctness concerns introduced.
No files require special attention beyond verifying the JNI SNAPSHOT dependency.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[makeIteratorFunc called] --> B{CudfRowTransitions.areAllSupported\nAND 4 < columns <= 100M?}
B -- No --> C[ColumnarToRowIterator\n slow path]
B -- Yes --> D{isAcceleratedTransposeSupported\nCuda.getComputeCapabilityMajor > 6?}
D -- No\nPascal or older --> C
D -- Yes\nVolta+ --> E[AcceleratedColumnarToRowIterator\n fast GPU path re-enabled by this PR]
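
The decision in the flowchart above can be sketched as a pure function. This is a minimal model, not the plugin's code: `C2RPath`, `SlowPath`, `AcceleratedPath`, `C2RGating`, `columnCountInRange`, and `choosePath` are hypothetical names, and the compute capability is passed in as a plain integer instead of being read from Cuda.getComputeCapabilityMajor, so the sketch runs without a GPU.

```scala
// Hypothetical stand-ins for the two iterator implementations named in the PR.
sealed trait C2RPath
case object SlowPath extends C2RPath        // ColumnarToRowIterator
case object AcceleratedPath extends C2RPath // AcceleratedColumnarToRowIterator

object C2RGating {
  // Models the restored post-Pascal check. Pascal is compute capability 6.x,
  // so only major versions above 6 (Volta and newer) qualify.
  def isAcceleratedTransposeSupported(computeCapabilityMajor: Int): Boolean =
    computeCapabilityMajor > 6

  // Column-count window as shown in the flowchart: 4 < columns <= 100M.
  def columnCountInRange(numColumns: Int): Boolean =
    numColumns > 4 && numColumns <= 100000000

  // allTypesSupported stands in for CudfRowTransitions.areAllSupported.
  def choosePath(
      allTypesSupported: Boolean,
      numColumns: Int,
      computeCapabilityMajor: Int): C2RPath =
    if (allTypesSupported &&
        columnCountInRange(numColumns) &&
        isAcceleratedTransposeSupported(computeCapabilityMajor)) {
      AcceleratedPath
    } else {
      SlowPath
    }
}
```

For example, `C2RGating.choosePath(true, 192, 7)` selects the accelerated path, while the same schema on a Pascal device (`computeCapabilityMajor = 6`) stays on the slow path.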
Reviews (3): Last reviewed commit: "signoff"
Fixes #10062.
Description
PR #12699 worked around the data corruption reported in #10062 by forcing isAcceleratedTransposeSupported to false on every GPU architecture, disabling the fast AcceleratedColumnarToRowIterator path unconditionally. The root cause, an off-by-one in detail::determine_tiles that silently dropped the trailing column of the table whenever that column alone tipped the tile's row-size estimate over shmem_limit_per_tile, has since been fixed in NVIDIA/spark-rapids-jni#4493. With that JNI fix in place, the workaround is no longer needed, so this PR restores the original post-Pascal gating (Cuda.getComputeCapabilityMajor > 6).

Behaviorally this is a no-op for users: no new configuration, no plan changes. They will just see faster collect/take on wide fixed-width schemas.

Not safe to merge until NVIDIA/spark-rapids-jni#4493 has landed and the JNI SNAPSHOT that spark-rapids depends on has picked it up. Hence the [DO NOT MERGE] prefix.

Checklists
Documentation
Testing
(test_hash_multiple_grpby_pivot with DATAGEN_SEED=1702610203, the original reproducer from #10062 ([BUG] test_hash_multiple_grpby_pivot DATAGEN_SEED=1702610203 fails), exercises this path on a 192-column pivot schema; the matching regression test ColumnToRowTests.PivotLikeLayout lives in spark-rapids-jni alongside the kernel fix.)
Performance