
Run Fuzzer test for HashJoin with velox-cudf #9

Draft
karthikeyann wants to merge 6 commits into rapidsai:velox-cudf from karthikeyann:fuzz-join

Conversation


@karthikeyann karthikeyann commented May 1, 2025

Working on running the fuzzer test:

  • disabled LocalPartition (both round-robin and hash)
  • disabled a few types (varbinary, timestamp, date, intervaldaytime)
  • added debug prints (to be removed before merge)

The goal of this PR is to ensure HashJoin produces the expected outputs.

@karthikeyann karthikeyann added DO NOT MERGE Hold off on merging; see PR for details bug Something isn't working labels May 1, 2025
change null equality to UNEQUAL
handle empty input to build
disable right joins (because design should be different)
VinithKrishnan pushed a commit to VinithKrishnan/velox-rapidsai that referenced this pull request Jun 29, 2025
…ger-overflow (facebookincubator#13831)

Summary:
Pull Request resolved: facebookincubator#13831

This avoids the following errors:

```
fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56:41: runtime error: negation of -9223372036854775808 cannot be represented in type 'long'; cast to an unsigned type to negate this value to itself
    #0 0x000000346ce5 in std::abs(long) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56
    #1 0x000000345879 in std::shared_ptr<facebook::velox::BiasVector<facebook::velox::test::EvalTypeHelper<long>::Type>> facebook::velox::test::VectorMaker::biasVector<long>(std::vector<std::optional<long>, std::allocator<std::optional<long>>> const&) fbcode/velox/vector/tests/utils/VectorMaker-inl.h:58
    #2 0x000000344d34 in facebook::velox::test::BiasVectorErrorTest::errorTest(std::vector<std::optional<long>, std::allocator<std::optional<long>>>) fbcode/velox/vector/tests/BiasVectorTest.cpp:39
    #3 0x00000033ec99 in facebook::velox::test::BiasVectorErrorTest_checkRangeTooLargeError_Test::TestBody() fbcode/velox/vector/tests/BiasVectorTest.cpp:44
    #4 0x7fe0a2342c46 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) fbsource/src/gtest.cc:2727
    #5 0x7fe0a234275d in testing::Test::Run() fbsource/src/gtest.cc:2744
    #6 0x7fe0a2345fb3 in testing::TestInfo::Run() fbsource/src/gtest.cc:2890
    #7 0x7fe0a234c8eb in testing::TestSuite::Run() fbsource/src/gtest.cc:3068
    #8 0x7fe0a237b52b in testing::internal::UnitTestImpl::RunAllTests() fbsource/src/gtest.cc:6059
    #9 0x7fe0a237a0a2 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) fbsource/src/gtest.cc:2727
    #10 0x7fe0a23797f5 in testing::UnitTest::Run() fbsource/src/gtest.cc:5599
    #11 0x7fe0a2239800 in RUN_ALL_TESTS() fbsource/gtest/gtest.h:2334
    #12 0x7fe0a223952c in main fbcode/common/gtest/LightMain.cpp:20
    #13 0x7fe09ec2c656 in __libc_start_call_main /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #14 0x7fe09ec2c717 in __libc_start_main@GLIBC_2.2.5 /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
    #15 0x00000033d8b0 in _start /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

UndefinedBehaviorSanitizer: signed-integer-overflow fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56:41
```
Avoid the overflow by using the expression `static_cast<uint64_t>(1) + ~static_cast<uint64_t>(min)` to compute the absolute value of min without calling std::abs.
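The fix above can be sketched as a small standalone helper (the function name `absAsUnsigned` is hypothetical, not from the patch). The idea is that two's-complement negation, `1 + ~x`, is well defined in unsigned arithmetic, so `|INT64_MIN|` can be represented as `uint64_t` even though it overflows `int64_t`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper illustrating the commit's approach: compute |min| for a
// negative signed 64-bit value without std::abs, whose negation of INT64_MIN
// is undefined behavior. (1 + ~x) performs two's-complement negation in
// unsigned arithmetic, where wraparound is well defined.
uint64_t absAsUnsigned(int64_t min) {
  return static_cast<uint64_t>(1) + ~static_cast<uint64_t>(min);
}
```

For `min = INT64_MIN` this yields 9223372036854775808 (2^63), the value UBSan reports as unrepresentable in `long`.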

Reviewed By: dmm-fb, peterenescu

Differential Revision: D76901449

fbshipit-source-id: 7eb3bd0f83e42f44cdf34ea1759f3aa9e1042dae
copy-pr-bot bot pushed a commit that referenced this pull request Sep 10, 2025
karthikeyann pushed a commit to mhaseeb123/velox that referenced this pull request Jan 26, 2026
Summary:
Fixes OSS ASan SEGV due to calling `as<>` on a nullptr.

```
=================================================================
==4058438==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000a563a4 bp 0x7ffd54ee5bc0 sp 0x7ffd54ee5aa0 T0)
==4058438==The signal is caused by a READ memory access.
==4058438==Hint: address points to the zero page.
    #0 0x000000a563a4 in facebook::velox::FlatVector<int>* facebook::velox::BaseVector::as<facebook::velox::FlatVector<int>>() /velox/./velox/vector/BaseVector.h:116:12
    #1 0x000000a563a4 in facebook::velox::test::(anonymous namespace)::FlatMapVectorTest_encodedKeys_Test::TestBody() /velox/velox/vector/tests/FlatMapVectorTest.cpp:156:5
    #2 0x70874f90ce0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70874f8ed825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70874f8ed9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70874f8edaf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70874f8fcfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70874f8fa7c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70877c073153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70874f48460f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70874f4846bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x00000044c1b4 in _start (/velox/_build/debug/velox/vector/tests/velox_vector_test+0x44c1b4) (BuildId: 6da0b0d1074134be8f4d4534e5dbac9eeb9d482b)
```

Reviewed By: peterenescu

Differential Revision: D91275269

fbshipit-source-id: 0806aa7562dc8cf4ad708fc6a8e4b29409507745
karthikeyann pushed a commit to mhaseeb123/velox that referenced this pull request Jan 26, 2026
Summary:
Pull Request resolved: facebookincubator#16102

Fixes ASan error in S3Util.cpp; see the stack trace below:

```
==4125762==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000006114ff at pc 0x70aa17bc0120 bp 0x7ffe905f3030 sp 0x7ffe905f3028
READ of size 1 at 0x0000006114ff thread T0
    #0 0x70aa17bc011f in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>) /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16
    #1 0x00000055790b in facebook::velox::filesystems::S3UtilTest_parseAWSRegion_Test::TestBody() /velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:147:3
    #2 0x70aa2e89be0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70aa2e87c825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70aa2e87c9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70aa2e87caf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70aa2e88bfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70aa2e8897c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70aa2e8ba153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70aa01ceb60f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70aa01ceb6bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x000000408684 in _start (/velox/_build/debug/velox/connectors/hive/storage_adapters/s3fs/tests/velox_s3file_test+0x408684) (BuildId: bbf3099c9a66a548c6da234b17ad1b631e9ed649)

0x0000006114ff is located 33 bytes before global variable '.str.135' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:126' (0x000000611520) of size 46
  '.str.135' is ascii string 'isHostExcludedFromProxy(hostname, pair.first)'
0x0000006114ff is located 1 bytes before global variable '.str.133' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:122' (0x000000611500) of size 1
  '.str.133' is ascii string ''
0x0000006114ff is located 42 bytes after global variable '.str.132' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:121' (0x0000006114c0) of size 21
  '.str.132' is ascii string 'localhost,foobar.com'
AddressSanitizer: global-buffer-overflow /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16 in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>)
Shadow bytes around the buggy address:
```

Reviewed By: pedroerp

Differential Revision: D91278230

fbshipit-source-id: 05283bc8408069fa3f5ab8a7840b2bd0835fa7d6
paul-aiyedun pushed a commit that referenced this pull request Feb 20, 2026
The 4KB metadata buffer limit (kMetaBufSize) was too small for tables
with many columns, where cudf metadata alone can exceed 4KB. Wide
tables would silently truncate metadata, causing data corruption.

Rename kMetaBufSize to kMaxMetaBufSize, increase to 1MB (enough for
~10,000+ columns). Add kMetaHeaderSize constant. Change serialize()
to allocate exact size needed instead of fixed buffer, reducing memory
waste for small metadata. Add const to getSerializedSize(). Update
CudfExchangeSource to use kMaxMetaBufSize for the receive buffer.

Review: @pentschev comment #9
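The sizing scheme described above can be sketched in plain C++ (the helper name `serializeMeta` and the fixed little-header layout are assumptions for illustration; only the `kMaxMetaBufSize`/`kMetaHeaderSize` constants come from the commit). The buffer is sized to exactly header plus payload, and a 1MB cap bounds what the receive side will accept:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr size_t kMetaHeaderSize = sizeof(uint64_t); // length prefix
constexpr size_t kMaxMetaBufSize = 1 << 20;          // 1MB receive-side cap

// Hypothetical serializer: allocate the exact size needed rather than a
// fixed 4KB buffer, so wide-table metadata is never silently truncated.
std::vector<uint8_t> serializeMeta(const std::string& metadata) {
  std::vector<uint8_t> buf(kMetaHeaderSize + metadata.size());
  uint64_t len = metadata.size();
  std::memcpy(buf.data(), &len, kMetaHeaderSize);
  std::memcpy(buf.data() + kMetaHeaderSize, metadata.data(), metadata.size());
  return buf;
}
```

A receiver that allocates `kMaxMetaBufSize` up front can then reject any payload whose length prefix exceeds the cap before copying.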
mattgara pushed a commit that referenced this pull request Feb 28, 2026
karthikeyann pushed a commit that referenced this pull request Mar 12, 2026
firestarman pushed a commit to firestarman/velox that referenced this pull request Mar 23, 2026
* feat(cudf): improve Arrow interop for decimals and varbinary

Cherry-picked from 00889a2 (PR facebookincubator#16612 Part 1).
Adds veloxToCudfDataType() preserving DECIMAL scale, extends Arrow bridge
options for varbinary and decimal type width. Merged with ferd-dev's
pinnedToArrowHost optimization.

Made-with: Cursor

* feat(cudf): add decimal expression support and tests

Cherry-picked from 67e400d (PR facebookincubator#16750 Part 2).
Adds CUDA kernels for decimal binary/unary/comparison ops, decimal type
utilities, and extends expression evaluator with decimal-aware binary
operators. Merged with ferd-dev's date_add and row_constructor functions.

Made-with: Cursor

* style: format decimal code (cherry-pick f5bc585)

Made-with: Cursor

* fix: fix after rebase for decimal tests (cherry-pick 564ae5c)

Made-with: Cursor

* feat(cudf): enable GPU ceil() (cherry-pick 2ac50f2)

Made-with: Cursor

* feat(cudf): add decimal sum/avg kernels and aggregation support

Cherry-picked from f834458 (PR facebookincubator#16751 Part 3).
Adds CUDA kernels for decimal SUM/AVG aggregation, decimal-aware
type handling in hash aggregation with veloxToCudfDataType.
Merged with ferd-dev's GpuGuard and pinned memory optimizations.

Made-with: Cursor

* feat(cudf): add TopNRowNumber GPU operator, xxhash64, and code quality fixes

- Add CudfTopNRowNumber operator with GPU-based sort, groupby scan ranking,
  and TopNRowNumber kernel implementation
- Add xxhash64_with_seed function for GPU hash computation
- Unify memory resource usage to cudf::get_current_device_resource_ref()
- Improve expression evaluator with priority-sorted candidate fallback and
  diagnostic logging when all evaluators fail
- Fix indentation in CudfHashJoinProbe::initialize()
- Remove redundant member variable writes in CudfTopNRowNumber constructor
- Correct decimal type handling (veloxToCudfDataType) in NestedLoopJoin
- Enhance AstExpressionUtils with VARCHAR-to-DATE constant folding and
  disambiguated field name resolution for join filters

Made-with: Cursor

* fix(cudf): fix xxhash64 tests, refactor InFunction, add varbinary-as-string export

- Fix xxhash64_with_seed test expected values and add single-string test case
  (verified passing on GPU with cudf::hashing::xxhash_64)
- Refactor IN expression from inline eval to proper InFunction CudfFunction
  class with type-cast support and null handling
- Add exportVarbinaryAsString ArrowOption for cudf interop (cudf doesn't
  support Arrow binary type, export as utf-8 string instead)

Made-with: Cursor

* fix(cudf): handle field name mismatch in n{nodeId}_{colIdx} resolution (query31)

When Gluten's Substrait plan uses different node IDs for the same column
(e.g. expression references n11_1 but runtime schema has n10_1), the cuDF
expression evaluator failed with "Field not found" and crashed the task.

Two-part fix:
1. OperatorAdapters: add allFieldsResolvable() check in
   FilterProjectAdapter::canRunOnGPU() to detect unresolvable field
   references before creating CudfFilterProject. Falls back to CPU
   gracefully instead of crashing.
2. AstExpressionUtils: add colIdx-based fallback in pushExprToTree()
   and addPrecomputeInstruction() that matches n{X}_{Y} fields by their
   _{Y} suffix when exact name lookup fails. Only used when exactly one
   schema column shares the same colIdx suffix.

Includes unit tests for both the successful fallback case and the
no-match case.

Made-with: Cursor
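The colIdx-suffix fallback described in part 2 can be sketched as follows (the function name `resolveField` and the flat string schema are hypothetical simplifications of the actual schema types): exact lookup first, then a `_{colIdx}` suffix match that only succeeds when exactly one column shares the suffix.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Resolve "n{nodeId}_{colIdx}" against a schema whose node IDs may differ.
// Returns the column index, or nullopt when no unique match exists.
std::optional<size_t> resolveField(
    const std::vector<std::string>& schema, const std::string& name) {
  for (size_t i = 0; i < schema.size(); ++i) {
    if (schema[i] == name) return i; // exact name lookup first
  }
  auto pos = name.rfind('_');
  if (pos == std::string::npos) return std::nullopt;
  std::string suffix = name.substr(pos); // "_{colIdx}"
  std::optional<size_t> match;
  for (size_t i = 0; i < schema.size(); ++i) {
    if (schema[i].size() >= suffix.size() &&
        schema[i].compare(
            schema[i].size() - suffix.size(), suffix.size(), suffix) == 0) {
      if (match) return std::nullopt; // ambiguous: more than one candidate
      match = i;
    }
  }
  return match;
}
```

So a reference to `n11_1` resolves against a schema holding `n10_1`, while a schema with both `n10_1` and `n12_1` yields no match and falls back to CPU.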

* fix(cudf): add zero-size guards to prevent cudaErrorInvalidValue (query4/6/11/15/19/28/30/74/80)

Add defensive checks for zero-row inputs that trigger invalid CUDA
memory allocations through RMM's device_buffer:

CudfHashJoinProbe:
- Early return in getOutput() when probe table has 0 rows
- Guard rightMatchedFlags_ init for empty build tables (n=0)
- Guard leftSemiProjectJoin for leftNumRows=0
- Guard cudf::sequence calls in rightJoin/fullJoin for n=0
- Skip precomputeSubexpressions for empty right tables

CudfFilterProject:
- Skip project() when filter removes all rows (0-row result)

Made-with: Cursor

* fix(cudf): prevent Jitify error for STRING ops in AST evaluator (query83)

cudf::compute_column / Jitify cannot handle variable-width types (STRING)
in AST operations. Added two layers of protection:

1. isOpAndInputsSupported rejects non-fixed-width inputs, preventing the
   AST/JIT evaluator from claiming support for STRING operations at the
   top level.

2. pushExprToTree routes binary ops, IN, and BETWEEN with variable-width
   inputs to precompute instructions via FunctionExpression, handling the
   nested case (e.g., STRING ops inside AND expressions).

Made-with: Cursor

* fix(cudf): implement DATE→VARCHAR cast and might_contain bloom filter bypass

- CastFunction: handle DATE to VARCHAR using cudf::strings::from_timestamps
  with "%Y-%m-%d" format instead of unsupported cudf::cast for this type pair
- MightContainFunction: return all-true boolean column as safe placeholder for
  bloom filter probabilistic check (false positives are acceptable by design)
- Update FunctionExpression::canEvaluate and create to route both expressions

Fixes query83 (DATE cast), query11/15/74/4/80 (might_contain).

Made-with: Cursor

* fix(cudf): implement isnull/isnotnull expression for GPU evaluation (query44/76)

IsNullFunction checks input column null mask using cudf::replace_nulls
and cudf::unary_operation(NOT). Returns all-false/true constant when
column has no nulls. Handles both isnull (negate=false) and isnotnull
(negate=true) variants.

Made-with: Cursor
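The isnull/isnotnull logic above can be modeled on the CPU with `std::optional` standing in for a nullable column (a sketch only; the real implementation operates on cudf null masks). A single `negate` flag selects between the two variants, mirroring the commit's one-function design:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// negate=false → isnull; negate=true → isnotnull.
std::vector<bool> isNullColumn(
    const std::vector<std::optional<int>>& col, bool negate) {
  std::vector<bool> out(col.size());
  for (size_t i = 0; i < col.size(); ++i) {
    bool isNull = !col[i].has_value();
    out[i] = negate ? !isNull : isNull;
  }
  return out;
}
```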

* fix(cudf): add field resolution check to CudfHashJoinBaseAdapter (query31 revisit)

The previous fix (fe2bf9659) only added allFieldsResolvable check to
FilterProjectAdapter. The actual crash was in CudfHashJoinProbe where
precomputeSubexpressions evaluated a FunctionExpression with a per-side
schema missing the n11_1 field.

Add allFieldsResolvable validation to CudfHashJoinBaseAdapter::canRunOnGPU
which checks the join filter's field references against the combined
left+right schema. If any field is unresolvable, fall back to CPU.

Made-with: Cursor

* fix(cudf): add equalto/notequalto aliases for comparison operators

Spark uses 'equalto' and 'notequalto' as expression names but cudf
only registered 'equal'/'eq' and 'notequal'/'neq'. Add the missing
aliases so these expressions are routed to GPU binary operators.

Fixes query1, query71, query72 and other queries using equalto in
filter pushdown expressions.

Made-with: Cursor
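The alias registration above amounts to mapping every name Spark may emit onto one canonical operator entry. A minimal sketch, with a hypothetical string-to-string registry in place of the real function registry:

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <unordered_map>

using Registry = std::unordered_map<std::string, std::string>;

// Register a canonical comparison op under all of its Spark-facing aliases,
// so lookups by any alias resolve to the same implementation.
void registerComparisonOp(Registry& reg, const std::string& canonical,
                          std::initializer_list<std::string> aliases) {
  for (const auto& a : aliases) {
    reg.emplace(a, canonical);
  }
}
```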

* fix(cudf): add kIntermediate step for mean aggregation in doReduce (query28)

MeanAggregator::doReduce was missing the kIntermediate case, causing
"Unsupported aggregation step for mean" when the query plan has a
partial → intermediate → final aggregation pipeline.

The intermediate step receives a struct(sum, count) from the previous
stage, re-reduces (sum-of-sums, sum-of-counts), and outputs a new
struct for the next stage.

Made-with: Cursor
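The kIntermediate step described above is just a re-reduction of (sum, count) pairs: sum-of-sums and sum-of-counts. A CPU sketch with a hypothetical `SumCount` accumulator (the real code works on cudf struct columns):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SumCount {
  double sum;
  int64_t count;
};

// Combine partial (sum, count) accumulators into one intermediate
// accumulator; the final step divides sum by count to produce the mean.
SumCount reduceIntermediate(const std::vector<SumCount>& partials) {
  SumCount out{0.0, 0};
  for (const auto& p : partials) {
    out.sum += p.sum;     // sum-of-sums
    out.count += p.count; // sum-of-counts
  }
  return out;
}
```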

* fix(cudf): skip STRING subfield filters from AST/Jitify in TableScan (query33/43/56/60/61)

STRING/Bytes filter types (kBytesValues, kNegatedBytesValues, kBytesRange)
cannot be evaluated by cudf AST or Jitify, causing "Jitify fatal error:
Deserialization failed" / "Uninitialized" crashes in TableScan.

Mark these filter kinds as VELOX_NYI in createAstFromSubfieldFilter so they
are gracefully skipped. createAstFromSubfieldFilters now catches per-filter
exceptions and continues with remaining (non-STRING) filters. If all filters
are unsupported, CudfHiveDataSource catches the exception and leaves
subfieldFilterExpr_ null, relying on the downstream FilterProject to apply
the predicates on CPU.

Made-with: Cursor

* fix(cudf): prevent SIGSEGV in hash join probe for skewed partitions (query68)

Convert GPU crashes (SIGSEGV/exit code 139) into clean VeloxRuntimeError
exceptions that enable CPU fallback:

- Wrap inner_join() call in try-catch to convert CUDA/RMM allocation
  failures into diagnostic Velox errors with probe/build row counts
- Add VELOX_CHECK_LE on join output size vs cudf::size_type max to
  detect oversized join results before gather operations
- Wrap gather/filter phase in try-catch for the same reason
- Guard rightTables[0] access with empty-check in isBlocked()
- Guard rightTables[0] access with empty-check in rightSemiFilterJoin()

Root cause: partition 2 of stage 25 has extreme data skew, producing a
massive join output that exhausts GPU memory or overflows size_type.
In v11 this failed cleanly (cudaErrorInvalidValue); in v12 the
zero-size guards let execution proceed deeper, converting the failure
into an unrecoverable SIGSEGV.

Made-with: Cursor (Agent D)

* fix(cudf): lazy table_view construction to prevent column size mismatch (query18)

ASTExpression::eval and JitExpression::eval eagerly constructed a
cudf::table_view from input columns + precomputed columns, even when
the root AST node was a column_reference that never used the table_view.
Precomputed columns from nested evaluators may have different row counts
(e.g., after filtering), triggering a spurious Column size mismatch error.

Fix: defer table_view construction into the else branch where
compute_column actually needs it. Also add output column size validation
in CudfFilterProject::getOutput as a safety net.

Made-with: Cursor

* fix(cudf): graceful AST filter fallback + scalar precompute expansion

CudfHashJoinProbe: wrap AST tree creation in try-catch so unsupported
filter expressions (e.g. unresolvable fields) disable the AST filter
instead of crashing the operator.

AstExpressionUtils: expand 0/1-row precomputed scalar sub-expressions
to match input row count, preventing "Column size mismatch" when
constant expressions produce fewer rows than the input table.

Made-with: Cursor
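The scalar expansion described above can be sketched as a broadcast of a 0/1-row precomputed result to the input row count (the function name `expandScalar` and the zero fill for the empty case are illustrative assumptions, not the patch's exact semantics):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Broadcast a constant sub-expression result so downstream column-size
// checks see a column matching the input table's row count.
std::vector<double> expandScalar(const std::vector<double>& col,
                                 size_t numRows) {
  if (col.size() == numRows) return col;              // already matches
  if (col.size() == 1) return std::vector<double>(numRows, col[0]);
  if (col.empty()) return std::vector<double>(numRows, 0.0); // placeholder fill
  throw std::runtime_error("Column size mismatch");
}
```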

* fix(cudf): validate all field references in precompute instructions (query4/11)

When pushExprToTree creates a precompute instruction for a sub-expression,
findExpressionSide may select a side whose schema does not contain all the
expression's field references (e.g. n8_1 missing from probe side n7_*).
This creates a FunctionExpression with an incomplete schema that crashes at
eval time in precomputeSubexpressions (Field not found).

Fix: In both precomputeVarWidthOp and the general precompute path, validate
that ALL fields in the expression exist in the chosen side's schema. If not,
return nullptr (varwidth) or throw VELOX_FAIL (general), causing createAstTree
to fail at init time where the Leader's try-catch sets useAstFilter_=false
and falls back to the combined-schema filterEvaluator_.

Made-with: Cursor

* fix(cudf): OOM-resilient probe splitting for hash join (query72/73/76)

When a hash join OOM occurs (e.g. query72 inner_join needing 6.5GB for
723K×1.9M row cardinality explosion), the probe table is split in half
and each half is retried independently. This iterates until the join
fits in GPU memory or hits kMinSplitRows (1024).

Changes:
- innerJoin: re-throw std::bad_alloc so it propagates to the retry loop
  instead of being converted to VeloxRuntimeError
- getOutput(): wrap the join dispatch in an iterative probe-splitting
  retry loop; on std::bad_alloc, split via cudf::split and re-enqueue
- Only split for join types where it is semantics-preserving: Inner,
  Left, LeftSemiFilter, LeftSemiProject, Anti
- Also applies Leader's precompute try-catch hardening to leftJoin
  (mirrors the existing innerJoin pattern)

Made-with: Cursor
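The iterative probe-splitting loop described above can be sketched without any GPU code by treating the join as a callable over a row range that may throw `std::bad_alloc` (the function name and range representation are hypothetical; the real code splits cudf tables). On OOM the range is halved and both halves are re-enqueued, down to `kMinSplitRows`:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <new>
#include <utility>
#include <vector>

constexpr int64_t kMinSplitRows = 1024;

// Run tryJoin over [0, numRows), splitting any sub-range that OOMs.
// Returns the row ranges that were successfully joined, in order.
std::vector<std::pair<int64_t, int64_t>> joinWithSplitting(
    int64_t numRows,
    const std::function<void(int64_t, int64_t)>& tryJoin) {
  std::deque<std::pair<int64_t, int64_t>> pending{{0, numRows}};
  std::vector<std::pair<int64_t, int64_t>> done;
  while (!pending.empty()) {
    auto [begin, end] = pending.front();
    pending.pop_front();
    try {
      tryJoin(begin, end);
      done.emplace_back(begin, end);
    } catch (const std::bad_alloc&) {
      if (end - begin <= kMinSplitRows) throw; // cannot split further
      int64_t mid = begin + (end - begin) / 2;
      pending.emplace_front(mid, end); // retry both halves, left first
      pending.emplace_front(begin, mid);
    }
  }
  return done;
}
```

Note the loop only preserves semantics for join types where per-probe-chunk results concatenate to the full result (Inner, Left, LeftSemi*, Anti), which is why the commit restricts splitting to those types.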

* fix(cudf): add runtime Jitify error catch in CudfHiveDataSource compute_column

Defense-in-depth for Issue 3 (Jitify Deserialization in TableScan).
When cudf::compute_column throws a Jitify error (or any exception) during
subfield filter evaluation in the experimental reader path, catch it and
return the unfiltered data instead of crashing the task. The downstream
FilterProject will handle the filtering on CPU.

This complements b7ed2a6dc which prevents STRING filters from reaching
Jitify at init time.

Made-with: Cursor

* fix(cudf): prevent cudf::cast on STRING columns in InFunction (query83)

InFunction::shouldSkipCast incorrectly skipped DATE→STRING casts because
cudf::is_supported_cast returns false for that pair, even though CastFunction
handles it via cudf::strings::from_timestamps. This caused buildHaystackColumn
to attempt cudf::cast(STRING, TIMESTAMP_DAYS) which fails at cast_ops.cu:411
("Column type must be numeric or chrono or decimal32/64/128").

Fix: 1) Return false from shouldSkipCast for DATE→STRING since CastFunction
supports it. 2) Add safety net in buildHaystackColumn: when haystack is STRING
and target is a timestamp type, use cudf::strings::to_timestamps instead of
cudf::cast.

Made-with: Cursor

* fix(cudf): add SIGSEGV signal handler and CUDA error synchronization at operator boundaries

Agent D's try-catch fix (b7fcd02d4) didn't work because SIGSEGV is a
signal, not a C++ exception.  v14 still crashes on query18/29/45/68.

Root cause analysis of v14 SIGSEGV crashes:
- query68: single partition crash (~86s), data skew pattern unchanged
- query29: v12 PASS(8s) → v14 FAIL, ALL tasks crash simultaneously
- query45: v12 PASS(6s) → v14 FAIL, ALL executors crash within 3s
- query18: ALL 28 partitions of stage 33 crash on ALL executors
- All crashes show ~86s delay, consistent with asynchronous CUDA error
  propagation cascading into SIGSEGV

Two-pronged fix:

1. Fatal signal handler (Utilities.cpp, ToCudf.cpp):
   Install SIGSEGV/SIGABRT/SIGBUS handler in registerCudf() that
   captures native backtrace via backtrace_symbols_fd() before
   re-raising to the JVM's handler.  This gives us the exact crash
   location in executor stderr for ALL future SIGSEGV queries.

2. CUDA stream sync + error checking (operator boundaries):
   Add checkCudaOperationError() that synchronizes the CUDA stream
   and checks cudaPeekAtLastError() before each operator's main GPU
   work.  This converts asynchronous CUDA errors (illegal address,
   OOM) into clean VeloxRuntimeError before they cascade into SIGSEGV.
   Added to: CudfFilterProject, CudfHashJoinBuild, CudfHashJoinProbe,
   CudfHashAggregation, CudfToVelox.

Also fixes: pessimizing-move warning in CudfHiveDataSource.cpp.

Made-with: Cursor

* Revert "fix(cudf): add SIGSEGV signal handler and CUDA error synchronization at operator boundaries"

This reverts commit d92407f679707b38746b068daa3445bfd29ac26c.

* fix(cudf): skip subfield filter AST when any column is STRING/VARBINARY

The normal parquet reader path (splitReader_->read_chunk()) applies the
subfield filter AST internally via cudf::compute_column, which uses Jitify
for string comparisons. When Jitify fails, the error is thrown inside the
reader and cannot be caught externally.

Fix: before building the subfield filter AST, check column types of all
subfield filters. If any column is VARCHAR or VARBINARY, skip the entire
GPU subfield filtering (set subfieldFilterExpr_ = nullptr). The downstream
CPU FilterProject handles these predicates instead.

This prevents Jitify errors in query8/33/43/56/60/61/91 by ensuring
string-typed filters never reach the parquet reader's AST evaluator.

Made-with: Cursor

* fix(cudf): clear CUDA error state after OOM in hash join probe (Q72)

After a CUDA OOM in CudfHashJoinProbe, the cuda_async_view_memory_resource
pool enters an error state where ALL subsequent allocations fail with
cudaErrorIllegalAddress — even 1-byte allocations. This cascade kills every
concurrent task on the same GPU.

Fix: call stream.synchronize_no_throw() + cudaGetLastError() in the
std::bad_alloc catch block to clear the sticky CUDA error before the
split-and-retry logic. Without this, probe splitting was never effective
because retried allocations immediately failed on the corrupted pool.

Also includes Agent B's pending improvements: pendingJoinOutputs_ streaming
(avoids concatenation peak memory), empty table filtering.

Made-with: Cursor

* Revert "fix(cudf): clear CUDA error state after OOM in hash join probe (Q72)"

This reverts commit a95d6dd.

* fix(cudf): batched join output + CUDA error recovery for OOM (query72)

Two root causes identified for Q72 CUDA OOM surviving probe splitting:

1. After std::bad_alloc from RMM, the CUDA async pool leaves a sticky
   error on the device. The subsequent cudf::split (which launches a
   null-count kernel) fails with cudaErrorIllegalAddress, escaping the
   catch handler entirely. Fix: call stream.synchronize_no_throw() and
   cudaGetLastError() to clear the sticky error before splitting.

2. Even when splitting succeeds, concatenateTables at the end allocates
   a contiguous buffer for ALL split results (~667MB for 112M rows),
   doubling peak GPU memory. Fix: adopt Spark RAPIDS JoinGatherer
   pattern — store multiple results in pendingJoinOutputs_ and return
   one per getOutput() call, avoiding the concatenation allocation.

Also fix isFinished() to not report done while pendingJoinOutputs_
still has tables to emit.

Made-with: Cursor
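The JoinGatherer-style batching described in point 2 can be sketched as a queue drained one batch per `getOutput()` call (class and member names are hypothetical, with a string standing in for a cudf table). This avoids the peak-memory spike of concatenating all split results, and `isFinished()` only reports done once the queue is empty:

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <utility>

class BatchedOutput {
 public:
  void enqueue(std::string batch) { pending_.push_back(std::move(batch)); }

  // Emit exactly one pending batch per call, never concatenating them.
  std::optional<std::string> getOutput() {
    if (pending_.empty()) return std::nullopt;
    auto out = std::move(pending_.front());
    pending_.pop_front();
    return out;
  }

  // Not finished while batches remain, even if upstream input is done.
  bool isFinished(bool upstreamDone) const {
    return upstreamDone && pending_.empty();
  }

 private:
  std::deque<std::string> pending_;
};
```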

* fix(cudf): reject AST expressions with non-matching operand types

cuDF's AST expression parser requires all operands of a binary operation
to have identical cudf::data_type (including decimal scale). When operand
types differ (e.g., DECIMAL(15,2) vs DECIMAL(7,4)), cudf::compute_column
throws "non-matching operand types" at runtime, crashing queries Q61/Q83.

Three-layer fix:
1. isOpAndInputsSupported: reject binary ops with mismatched types early,
   preventing ASTExpression from being selected (falls to FunctionExpression)
2. pushExprToTree: for binary/between ops with type mismatch, try precompute
   via FunctionExpression; if that fails, throw to trigger full fallback
3. ASTExpression::eval: defensive catch around cudf::compute_column converts
   cuDF exceptions to VeloxException for clearer diagnostics

Made-with: Cursor

* fix(cudf): remove duplicate divide/greaterthan registration that drops decimal signatures (Q18)

The `divide` and `greaterthan` functions were registered twice:
first with only (double,double) signatures, then via registerBinaryOp/
registerComparisonOp with both double AND decimal signatures.  Since
overwrite=false, the second registration was silently dropped, meaning
decimal divide and decimal greater-than never matched in canEvaluate.

Also wrap canBeEvaluatedByCudf expression compilation in try-catch
so that VeloxException from decimal type resolution (e.g. "Variable
a_precision is not defined") gracefully falls back to CPU instead of
potentially crashing the task.

Made-with: Cursor

* fix(cudf): disable JIT by default, fix JIT table view and SubfieldFilter isNull param

- Disable jitExpressionEnabled by default (less mature than AST/Function)
- Fix JitExpression to construct astInputTableView from inputColumnViews +
  precomputedColumns before passing to compute_column_jit
- Add missing isNull parameter to makeScalarAndLiteral calls in
  SubfieldFiltersToAst.cpp

Made-with: Cursor