
Run Fuzzer test for HashJoin with velox-cudf #9

Draft
karthikeyann wants to merge 6 commits into rapidsai:velox-cudf from karthikeyann:fuzz-join

Conversation


@karthikeyann karthikeyann commented May 1, 2025

Working on running the fuzzer test:

  • disabled LocalPartition (both round-robin and hash)
  • disabled a few types (varbinary, timestamp, date, intervaldaytime)
  • added debug prints (to be removed before merge)

The goal of this PR is to ensure HashJoin produces the expected outputs.

@karthikeyann karthikeyann added DO NOT MERGE Hold off on merging; see PR for details bug Something isn't working labels May 1, 2025
change null equality to UNEQUAL
handle empty input to build
disable right joins (because design should be different)
VinithKrishnan pushed a commit to VinithKrishnan/velox-rapidsai that referenced this pull request Jun 29, 2025
…ger-overflow (facebookincubator#13831)

Summary:
Pull Request resolved: facebookincubator#13831

This avoids the following errors:

```
fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56:41: runtime error: negation of -9223372036854775808 cannot be represented in type 'long'; cast to an unsigned type to negate this value to itself
    #0 0x000000346ce5 in std::abs(long) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56
    #1 0x000000345879 in std::shared_ptr<facebook::velox::BiasVector<facebook::velox::test::EvalTypeHelper<long>::Type>> facebook::velox::test::VectorMaker::biasVector<long>(std::vector<std::optional<long>, std::allocator<std::optional<long>>> const&) fbcode/velox/vector/tests/utils/VectorMaker-inl.h:58
    #2 0x000000344d34 in facebook::velox::test::BiasVectorErrorTest::errorTest(std::vector<std::optional<long>, std::allocator<std::optional<long>>>) fbcode/velox/vector/tests/BiasVectorTest.cpp:39
    #3 0x00000033ec99 in facebook::velox::test::BiasVectorErrorTest_checkRangeTooLargeError_Test::TestBody() fbcode/velox/vector/tests/BiasVectorTest.cpp:44
    #4 0x7fe0a2342c46 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) fbsource/src/gtest.cc:2727
    #5 0x7fe0a234275d in testing::Test::Run() fbsource/src/gtest.cc:2744
    #6 0x7fe0a2345fb3 in testing::TestInfo::Run() fbsource/src/gtest.cc:2890
    #7 0x7fe0a234c8eb in testing::TestSuite::Run() fbsource/src/gtest.cc:3068
    #8 0x7fe0a237b52b in testing::internal::UnitTestImpl::RunAllTests() fbsource/src/gtest.cc:6059
    #9 0x7fe0a237a0a2 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) fbsource/src/gtest.cc:2727
    #10 0x7fe0a23797f5 in testing::UnitTest::Run() fbsource/src/gtest.cc:5599
    #11 0x7fe0a2239800 in RUN_ALL_TESTS() fbsource/gtest/gtest.h:2334
    #12 0x7fe0a223952c in main fbcode/common/gtest/LightMain.cpp:20
    #13 0x7fe09ec2c656 in __libc_start_call_main /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #14 0x7fe09ec2c717 in __libc_start_main@GLIBC_2.2.5 /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
    #15 0x00000033d8b0 in _start /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

UndefinedBehaviorSanitizer: signed-integer-overflow fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/std_abs.h:56:41
```
Avoid the overflow by using the expression `static_cast<uint64_t>(1) + ~static_cast<uint64_t>(min)` to compute the absolute value of min without calling std::abs.
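The fix above can be sketched as a small standalone helper (the function name `absAsUnsigned` is hypothetical, not from the patch). The idea is that two's-complement negation, `1 + ~x`, is well defined in unsigned arithmetic, so `|INT64_MIN|` can be represented as `uint64_t` even though it overflows `int64_t`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper illustrating the commit's approach: compute |min| for a
// negative signed 64-bit value without std::abs, whose negation of INT64_MIN
// is undefined behavior. (1 + ~x) performs two's-complement negation in
// unsigned arithmetic, where wraparound is well defined.
uint64_t absAsUnsigned(int64_t min) {
  return static_cast<uint64_t>(1) + ~static_cast<uint64_t>(min);
}
```

For `min = INT64_MIN` this yields 9223372036854775808 (2^63), the value UBSan reports as unrepresentable in `long`.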

Reviewed By: dmm-fb, peterenescu

Differential Revision: D76901449

fbshipit-source-id: 7eb3bd0f83e42f44cdf34ea1759f3aa9e1042dae
copy-pr-bot bot pushed a commit that referenced this pull request Sep 10, 2025
karthikeyann pushed a commit to mhaseeb123/velox that referenced this pull request Jan 26, 2026
Summary:
Fixes OSS ASan SEGV due to calling `as<>` on a nullptr.

```
=================================================================
==4058438==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000a563a4 bp 0x7ffd54ee5bc0 sp 0x7ffd54ee5aa0 T0)
==4058438==The signal is caused by a READ memory access.
==4058438==Hint: address points to the zero page.
    #0 0x000000a563a4 in facebook::velox::FlatVector<int>* facebook::velox::BaseVector::as<facebook::velox::FlatVector<int>>() /velox/./velox/vector/BaseVector.h:116:12
    #1 0x000000a563a4 in facebook::velox::test::(anonymous namespace)::FlatMapVectorTest_encodedKeys_Test::TestBody() /velox/velox/vector/tests/FlatMapVectorTest.cpp:156:5
    #2 0x70874f90ce0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70874f8ed825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70874f8ed9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70874f8edaf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70874f8fcfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70874f8fa7c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70877c073153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70874f48460f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70874f4846bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x00000044c1b4 in _start (/velox/_build/debug/velox/vector/tests/velox_vector_test+0x44c1b4) (BuildId: 6da0b0d1074134be8f4d4534e5dbac9eeb9d482b)
```

Reviewed By: peterenescu

Differential Revision: D91275269

fbshipit-source-id: 0806aa7562dc8cf4ad708fc6a8e4b29409507745
karthikeyann pushed a commit to mhaseeb123/velox that referenced this pull request Jan 26, 2026
Summary:
Pull Request resolved: facebookincubator#16102

Fixes ASan error in S3Util.cpp; see the stack trace below:

```
==4125762==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000006114ff at pc 0x70aa17bc0120 bp 0x7ffe905f3030 sp 0x7ffe905f3028
READ of size 1 at 0x0000006114ff thread T0
    #0 0x70aa17bc011f in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>) /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16
    #1 0x00000055790b in facebook::velox::filesystems::S3UtilTest_parseAWSRegion_Test::TestBody() /velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:147:3
    #2 0x70aa2e89be0b  (/lib64/libgtest.so.1.11.0+0x4fe0b) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #3 0x70aa2e87c825 in testing::Test::Run() (/lib64/libgtest.so.1.11.0+0x30825) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #4 0x70aa2e87c9ef in testing::TestInfo::Run() (/lib64/libgtest.so.1.11.0+0x309ef) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #5 0x70aa2e87caf8 in testing::TestSuite::Run() (/lib64/libgtest.so.1.11.0+0x30af8) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #6 0x70aa2e88bfc4 in testing::internal::UnitTestImpl::RunAllTests() (/lib64/libgtest.so.1.11.0+0x3ffc4) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #7 0x70aa2e8897c7 in testing::UnitTest::Run() (/lib64/libgtest.so.1.11.0+0x3d7c7) (BuildId: 506b2df0fc901091ff83631fd797a325cae6b679)
    #8 0x70aa2e8ba153 in main (/lib64/libgtest_main.so.1.11.0+0x1153) (BuildId: c3a576d37d6cfc6875afdc98684c143107a226a0)
    #9 0x70aa01ceb60f in __libc_start_call_main (/lib64/libc.so.6+0x2a60f) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #10 0x70aa01ceb6bf in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2a6bf) (BuildId: 4dbf824d0f6afd9b2faee4787d89a39921c0a65e)
    #11 0x000000408684 in _start (/velox/_build/debug/velox/connectors/hive/storage_adapters/s3fs/tests/velox_s3file_test+0x408684) (BuildId: bbf3099c9a66a548c6da234b17ad1b631e9ed649)

0x0000006114ff is located 33 bytes before global variable '.str.135' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:126' (0x000000611520) of size 46
  '.str.135' is ascii string 'isHostExcludedFromProxy(hostname, pair.first)'
0x0000006114ff is located 1 bytes before global variable '.str.133' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:122' (0x000000611500) of size 1
  '.str.133' is ascii string ''
0x0000006114ff is located 42 bytes after global variable '.str.132' defined in '/velox/velox/connectors/hive/storage_adapters/s3fs/tests/S3UtilTest.cpp:121' (0x0000006114c0) of size 21
  '.str.132' is ascii string 'localhost,foobar.com'
AddressSanitizer: global-buffer-overflow /velox/velox/connectors/hive/storage_adapters/s3fs/S3Util.cpp:160:16 in facebook::velox::filesystems::parseAWSStandardRegionName[abi:cxx11](std::basic_string_view<char, std::char_traits<char>>)
Shadow bytes around the buggy address:
```

Reviewed By: pedroerp

Differential Revision: D91278230

fbshipit-source-id: 05283bc8408069fa3f5ab8a7840b2bd0835fa7d6
paul-aiyedun pushed a commit that referenced this pull request Feb 20, 2026
The 4KB metadata buffer limit (kMetaBufSize) was too small for tables
with many columns, where cudf metadata alone can exceed 4KB. Wide
tables would silently truncate metadata, causing data corruption.

Rename kMetaBufSize to kMaxMetaBufSize, increase to 1MB (enough for
~10,000+ columns). Add kMetaHeaderSize constant. Change serialize()
to allocate exact size needed instead of fixed buffer, reducing memory
waste for small metadata. Add const to getSerializedSize(). Update
CudfExchangeSource to use kMaxMetaBufSize for the receive buffer.

Review: @pentschev comment #9
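The sizing scheme described above can be sketched in plain C++ (the helper name `serializeMeta` and the fixed little-header layout are assumptions for illustration; only the `kMaxMetaBufSize`/`kMetaHeaderSize` constants come from the commit). The buffer is sized to exactly header plus payload, and a 1MB cap bounds what the receive side will accept:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr size_t kMetaHeaderSize = sizeof(uint64_t); // length prefix
constexpr size_t kMaxMetaBufSize = 1 << 20;          // 1MB receive-side cap

// Hypothetical serializer: allocate the exact size needed rather than a
// fixed 4KB buffer, so wide-table metadata is never silently truncated.
std::vector<uint8_t> serializeMeta(const std::string& metadata) {
  std::vector<uint8_t> buf(kMetaHeaderSize + metadata.size());
  uint64_t len = metadata.size();
  std::memcpy(buf.data(), &len, kMetaHeaderSize);
  std::memcpy(buf.data() + kMetaHeaderSize, metadata.data(), metadata.size());
  return buf;
}
```

A receiver that allocates `kMaxMetaBufSize` up front can then reject any payload whose length prefix exceeds the cap before copying.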
mattgara pushed a commit that referenced this pull request Feb 28, 2026
karthikeyann pushed a commit that referenced this pull request Mar 12, 2026
firestarman pushed a commit to firestarman/velox that referenced this pull request Mar 23, 2026
* feat(cudf): improve Arrow interop for decimals and varbinary

Cherry-picked from 00889a2 (PR facebookincubator#16612 Part 1).
Adds veloxToCudfDataType() preserving DECIMAL scale, extends Arrow bridge
options for varbinary and decimal type width. Merged with ferd-dev's
pinnedToArrowHost optimization.

Made-with: Cursor

* feat(cudf): add decimal expression support and tests

Cherry-picked from 67e400d (PR facebookincubator#16750 Part 2).
Adds CUDA kernels for decimal binary/unary/comparison ops, decimal type
utilities, and extends expression evaluator with decimal-aware binary
operators. Merged with ferd-dev's date_add and row_constructor functions.

Made-with: Cursor

* style: format decimal code (cherry-pick f5bc585)

Made-with: Cursor

* fix: fix after rebase for decimal tests (cherry-pick 564ae5c)

Made-with: Cursor

* feat(cudf): enable GPU ceil() (cherry-pick 2ac50f2)

Made-with: Cursor

* feat(cudf): add decimal sum/avg kernels and aggregation support

Cherry-picked from f834458 (PR facebookincubator#16751 Part 3).
Adds CUDA kernels for decimal SUM/AVG aggregation, decimal-aware
type handling in hash aggregation with veloxToCudfDataType.
Merged with ferd-dev's GpuGuard and pinned memory optimizations.

Made-with: Cursor

* feat(cudf): add TopNRowNumber GPU operator, xxhash64, and code quality fixes

- Add CudfTopNRowNumber operator with GPU-based sort, groupby scan ranking,
  and TopNRowNumber kernel implementation
- Add xxhash64_with_seed function for GPU hash computation
- Unify memory resource usage to cudf::get_current_device_resource_ref()
- Improve expression evaluator with priority-sorted candidate fallback and
  diagnostic logging when all evaluators fail
- Fix indentation in CudfHashJoinProbe::initialize()
- Remove redundant member variable writes in CudfTopNRowNumber constructor
- Correct decimal type handling (veloxToCudfDataType) in NestedLoopJoin
- Enhance AstExpressionUtils with VARCHAR-to-DATE constant folding and
  disambiguated field name resolution for join filters

Made-with: Cursor

* fix(cudf): fix xxhash64 tests, refactor InFunction, add varbinary-as-string export

- Fix xxhash64_with_seed test expected values and add single-string test case
  (verified passing on GPU with cudf::hashing::xxhash_64)
- Refactor IN expression from inline eval to proper InFunction CudfFunction
  class with type-cast support and null handling
- Add exportVarbinaryAsString ArrowOption for cudf interop (cudf doesn't
  support Arrow binary type, export as utf-8 string instead)

Made-with: Cursor

* fix(cudf): handle field name mismatch in n{nodeId}_{colIdx} resolution (query31)

When Gluten's Substrait plan uses different node IDs for the same column
(e.g. expression references n11_1 but runtime schema has n10_1), the cuDF
expression evaluator failed with "Field not found" and crashed the task.

Two-part fix:
1. OperatorAdapters: add allFieldsResolvable() check in
   FilterProjectAdapter::canRunOnGPU() to detect unresolvable field
   references before creating CudfFilterProject. Falls back to CPU
   gracefully instead of crashing.
2. AstExpressionUtils: add colIdx-based fallback in pushExprToTree()
   and addPrecomputeInstruction() that matches n{X}_{Y} fields by their
   _{Y} suffix when exact name lookup fails. Only used when exactly one
   schema column shares the same colIdx suffix.

Includes unit tests for both the successful fallback case and the
no-match case.

Made-with: Cursor
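The colIdx-suffix fallback described in part 2 can be sketched as follows (the function name `resolveField` and the flat string schema are hypothetical simplifications of the actual schema types): exact lookup first, then a `_{colIdx}` suffix match that only succeeds when exactly one column shares the suffix.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Resolve "n{nodeId}_{colIdx}" against a schema whose node IDs may differ.
// Returns the column index, or nullopt when no unique match exists.
std::optional<size_t> resolveField(
    const std::vector<std::string>& schema, const std::string& name) {
  for (size_t i = 0; i < schema.size(); ++i) {
    if (schema[i] == name) return i; // exact name lookup first
  }
  auto pos = name.rfind('_');
  if (pos == std::string::npos) return std::nullopt;
  std::string suffix = name.substr(pos); // "_{colIdx}"
  std::optional<size_t> match;
  for (size_t i = 0; i < schema.size(); ++i) {
    if (schema[i].size() >= suffix.size() &&
        schema[i].compare(
            schema[i].size() - suffix.size(), suffix.size(), suffix) == 0) {
      if (match) return std::nullopt; // ambiguous: more than one candidate
      match = i;
    }
  }
  return match;
}
```

So a reference to `n11_1` resolves against a schema holding `n10_1`, while a schema with both `n10_1` and `n12_1` yields no match and falls back to CPU.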

* fix(cudf): add zero-size guards to prevent cudaErrorInvalidValue (query4/6/11/15/19/28/30/74/80)

Add defensive checks for zero-row inputs that trigger invalid CUDA
memory allocations through RMM's device_buffer:

CudfHashJoinProbe:
- Early return in getOutput() when probe table has 0 rows
- Guard rightMatchedFlags_ init for empty build tables (n=0)
- Guard leftSemiProjectJoin for leftNumRows=0
- Guard cudf::sequence calls in rightJoin/fullJoin for n=0
- Skip precomputeSubexpressions for empty right tables

CudfFilterProject:
- Skip project() when filter removes all rows (0-row result)

Made-with: Cursor

* fix(cudf): prevent Jitify error for STRING ops in AST evaluator (query83)

cudf::compute_column / Jitify cannot handle variable-width types (STRING)
in AST operations. Added two layers of protection:

1. isOpAndInputsSupported rejects non-fixed-width inputs, preventing the
   AST/JIT evaluator from claiming support for STRING operations at the
   top level.

2. pushExprToTree routes binary ops, IN, and BETWEEN with variable-width
   inputs to precompute instructions via FunctionExpression, handling the
   nested case (e.g., STRING ops inside AND expressions).

Made-with: Cursor

* fix(cudf): implement DATE→VARCHAR cast and might_contain bloom filter bypass

- CastFunction: handle DATE to VARCHAR using cudf::strings::from_timestamps
  with "%Y-%m-%d" format instead of unsupported cudf::cast for this type pair
- MightContainFunction: return all-true boolean column as safe placeholder for
  bloom filter probabilistic check (false positives are acceptable by design)
- Update FunctionExpression::canEvaluate and create to route both expressions

Fixes query83 (DATE cast), query11/15/74/4/80 (might_contain).

Made-with: Cursor

* fix(cudf): implement isnull/isnotnull expression for GPU evaluation (query44/76)

IsNullFunction checks input column null mask using cudf::replace_nulls
and cudf::unary_operation(NOT). Returns all-false/true constant when
column has no nulls. Handles both isnull (negate=false) and isnotnull
(negate=true) variants.

Made-with: Cursor
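The isnull/isnotnull logic above can be modeled on the CPU with `std::optional` standing in for a nullable column (a sketch only; the real implementation operates on cudf null masks). A single `negate` flag selects between the two variants, mirroring the commit's one-function design:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// negate=false → isnull; negate=true → isnotnull.
std::vector<bool> isNullColumn(
    const std::vector<std::optional<int>>& col, bool negate) {
  std::vector<bool> out(col.size());
  for (size_t i = 0; i < col.size(); ++i) {
    bool isNull = !col[i].has_value();
    out[i] = negate ? !isNull : isNull;
  }
  return out;
}
```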

* fix(cudf): add field resolution check to CudfHashJoinBaseAdapter (query31 revisit)

The previous fix (fe2bf9659) only added allFieldsResolvable check to
FilterProjectAdapter. The actual crash was in CudfHashJoinProbe where
precomputeSubexpressions evaluated a FunctionExpression with a per-side
schema missing the n11_1 field.

Add allFieldsResolvable validation to CudfHashJoinBaseAdapter::canRunOnGPU
which checks the join filter's field references against the combined
left+right schema. If any field is unresolvable, fall back to CPU.

Made-with: Cursor

* fix(cudf): add equalto/notequalto aliases for comparison operators

Spark uses 'equalto' and 'notequalto' as expression names but cudf
only registered 'equal'/'eq' and 'notequal'/'neq'. Add the missing
aliases so these expressions are routed to GPU binary operators.

Fixes query1, query71, query72 and other queries using equalto in
filter pushdown expressions.

Made-with: Cursor
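The alias registration above amounts to mapping every name Spark may emit onto one canonical operator entry. A minimal sketch, with a hypothetical string-to-string registry in place of the real function registry:

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <unordered_map>

using Registry = std::unordered_map<std::string, std::string>;

// Register a canonical comparison op under all of its Spark-facing aliases,
// so lookups by any alias resolve to the same implementation.
void registerComparisonOp(Registry& reg, const std::string& canonical,
                          std::initializer_list<std::string> aliases) {
  for (const auto& a : aliases) {
    reg.emplace(a, canonical);
  }
}
```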

* fix(cudf): add kIntermediate step for mean aggregation in doReduce (query28)

MeanAggregator::doReduce was missing the kIntermediate case, causing
"Unsupported aggregation step for mean" when the query plan has a
partial → intermediate → final aggregation pipeline.

The intermediate step receives a struct(sum, count) from the previous
stage, re-reduces (sum-of-sums, sum-of-counts), and outputs a new
struct for the next stage.

Made-with: Cursor
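The kIntermediate step described above is just a re-reduction of (sum, count) pairs: sum-of-sums and sum-of-counts. A CPU sketch with a hypothetical `SumCount` accumulator (the real code works on cudf struct columns):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SumCount {
  double sum;
  int64_t count;
};

// Combine partial (sum, count) accumulators into one intermediate
// accumulator; the final step divides sum by count to produce the mean.
SumCount reduceIntermediate(const std::vector<SumCount>& partials) {
  SumCount out{0.0, 0};
  for (const auto& p : partials) {
    out.sum += p.sum;     // sum-of-sums
    out.count += p.count; // sum-of-counts
  }
  return out;
}
```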

* fix(cudf): skip STRING subfield filters from AST/Jitify in TableScan (query33/43/56/60/61)

STRING/Bytes filter types (kBytesValues, kNegatedBytesValues, kBytesRange)
cannot be evaluated by cudf AST or Jitify, causing "Jitify fatal error:
Deserialization failed" / "Uninitialized" crashes in TableScan.

Mark these filter kinds as VELOX_NYI in createAstFromSubfieldFilter so they
are gracefully skipped. createAstFromSubfieldFilters now catches per-filter
exceptions and continues with remaining (non-STRING) filters. If all filters
are unsupported, CudfHiveDataSource catches the exception and leaves
subfieldFilterExpr_ null, relying on the downstream FilterProject to apply
the predicates on CPU.

Made-with: Cursor

* fix(cudf): prevent SIGSEGV in hash join probe for skewed partitions (query68)

Convert GPU crashes (SIGSEGV/exit code 139) into clean VeloxRuntimeError
exceptions that enable CPU fallback:

- Wrap inner_join() call in try-catch to convert CUDA/RMM allocation
  failures into diagnostic Velox errors with probe/build row counts
- Add VELOX_CHECK_LE on join output size vs cudf::size_type max to
  detect oversized join results before gather operations
- Wrap gather/filter phase in try-catch for the same reason
- Guard rightTables[0] access with empty-check in isBlocked()
- Guard rightTables[0] access with empty-check in rightSemiFilterJoin()

Root cause: partition 2 of stage 25 has extreme data skew, producing a
massive join output that exhausts GPU memory or overflows size_type.
In v11 this failed cleanly (cudaErrorInvalidValue); in v12 the
zero-size guards let execution proceed deeper, converting the failure
into an unrecoverable SIGSEGV.

Made-with: Cursor (Agent D)

* fix(cudf): lazy table_view construction to prevent column size mismatch (query18)

ASTExpression::eval and JitExpression::eval eagerly constructed a
cudf::table_view from input columns + precomputed columns, even when
the root AST node was a column_reference that never used the table_view.
Precomputed columns from nested evaluators may have different row counts
(e.g., after filtering), triggering a spurious Column size mismatch error.

Fix: defer table_view construction into the else branch where
compute_column actually needs it. Also add output column size validation
in CudfFilterProject::getOutput as a safety net.

Made-with: Cursor

* fix(cudf): graceful AST filter fallback + scalar precompute expansion

CudfHashJoinProbe: wrap AST tree creation in try-catch so unsupported
filter expressions (e.g. unresolvable fields) disable the AST filter
instead of crashing the operator.

AstExpressionUtils: expand 0/1-row precomputed scalar sub-expressions
to match input row count, preventing "Column size mismatch" when
constant expressions produce fewer rows than the input table.

Made-with: Cursor
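The scalar expansion described above can be sketched as a broadcast of a 0/1-row precomputed result to the input row count (the function name `expandScalar` and the zero fill for the empty case are illustrative assumptions, not the patch's exact semantics):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Broadcast a constant sub-expression result so downstream column-size
// checks see a column matching the input table's row count.
std::vector<double> expandScalar(const std::vector<double>& col,
                                 size_t numRows) {
  if (col.size() == numRows) return col;              // already matches
  if (col.size() == 1) return std::vector<double>(numRows, col[0]);
  if (col.empty()) return std::vector<double>(numRows, 0.0); // placeholder fill
  throw std::runtime_error("Column size mismatch");
}
```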

* fix(cudf): validate all field references in precompute instructions (query4/11)

When pushExprToTree creates a precompute instruction for a sub-expression,
findExpressionSide may select a side whose schema does not contain all the
expression's field references (e.g. n8_1 missing from probe side n7_*).
This creates a FunctionExpression with an incomplete schema that crashes at
eval time in precomputeSubexpressions (Field not found).

Fix: In both precomputeVarWidthOp and the general precompute path, validate
that ALL fields in the expression exist in the chosen side's schema. If not,
return nullptr (varwidth) or throw VELOX_FAIL (general), causing createAstTree
to fail at init time where the Leader's try-catch sets useAstFilter_=false
and falls back to the combined-schema filterEvaluator_.

Made-with: Cursor

* fix(cudf): OOM-resilient probe splitting for hash join (query72/73/76)

When a hash join OOM occurs (e.g. query72 inner_join needing 6.5GB for
723K×1.9M row cardinality explosion), the probe table is split in half
and each half is retried independently. This iterates until the join
fits in GPU memory or hits kMinSplitRows (1024).

Changes:
- innerJoin: re-throw std::bad_alloc so it propagates to the retry loop
  instead of being converted to VeloxRuntimeError
- getOutput(): wrap the join dispatch in an iterative probe-splitting
  retry loop; on std::bad_alloc, split via cudf::split and re-enqueue
- Only split for join types where it is semantics-preserving: Inner,
  Left, LeftSemiFilter, LeftSemiProject, Anti
- Also applies Leader's precompute try-catch hardening to leftJoin
  (mirrors the existing innerJoin pattern)

Made-with: Cursor
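The iterative probe-splitting loop described above can be sketched without any GPU code by treating the join as a callable over a row range that may throw `std::bad_alloc` (the function name and range representation are hypothetical; the real code splits cudf tables). On OOM the range is halved and both halves are re-enqueued, down to `kMinSplitRows`:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <new>
#include <utility>
#include <vector>

constexpr int64_t kMinSplitRows = 1024;

// Run tryJoin over [0, numRows), splitting any sub-range that OOMs.
// Returns the row ranges that were successfully joined, in order.
std::vector<std::pair<int64_t, int64_t>> joinWithSplitting(
    int64_t numRows,
    const std::function<void(int64_t, int64_t)>& tryJoin) {
  std::deque<std::pair<int64_t, int64_t>> pending{{0, numRows}};
  std::vector<std::pair<int64_t, int64_t>> done;
  while (!pending.empty()) {
    auto [begin, end] = pending.front();
    pending.pop_front();
    try {
      tryJoin(begin, end);
      done.emplace_back(begin, end);
    } catch (const std::bad_alloc&) {
      if (end - begin <= kMinSplitRows) throw; // cannot split further
      int64_t mid = begin + (end - begin) / 2;
      pending.emplace_front(mid, end); // retry both halves, left first
      pending.emplace_front(begin, mid);
    }
  }
  return done;
}
```

Note the loop only preserves semantics for join types where per-probe-chunk results concatenate to the full result (Inner, Left, LeftSemi*, Anti), which is why the commit restricts splitting to those types.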

* fix(cudf): add runtime Jitify error catch in CudfHiveDataSource compute_column

Defense-in-depth for Issue 3 (Jitify Deserialization in TableScan).
When cudf::compute_column throws a Jitify error (or any exception) during
subfield filter evaluation in the experimental reader path, catch it and
return the unfiltered data instead of crashing the task. The downstream
FilterProject will handle the filtering on CPU.

This complements b7ed2a6dc which prevents STRING filters from reaching
Jitify at init time.

Made-with: Cursor

* fix(cudf): prevent cudf::cast on STRING columns in InFunction (query83)

InFunction::shouldSkipCast incorrectly skipped DATE→STRING casts because
cudf::is_supported_cast returns false for that pair, even though CastFunction
handles it via cudf::strings::from_timestamps. This caused buildHaystackColumn
to attempt cudf::cast(STRING, TIMESTAMP_DAYS) which fails at cast_ops.cu:411
("Column type must be numeric or chrono or decimal32/64/128").

Fix: 1) Return false from shouldSkipCast for DATE→STRING since CastFunction
supports it. 2) Add safety net in buildHaystackColumn: when haystack is STRING
and target is a timestamp type, use cudf::strings::to_timestamps instead of
cudf::cast.

Made-with: Cursor

* fix(cudf): add SIGSEGV signal handler and CUDA error synchronization at operator boundaries

Agent D's try-catch fix (b7fcd02d4) didn't work because SIGSEGV is a
signal, not a C++ exception.  v14 still crashes on query18/29/45/68.

Root cause analysis of v14 SIGSEGV crashes:
- query68: single partition crash (~86s), data skew pattern unchanged
- query29: v12 PASS(8s) → v14 FAIL, ALL tasks crash simultaneously
- query45: v12 PASS(6s) → v14 FAIL, ALL executors crash within 3s
- query18: ALL 28 partitions of stage 33 crash on ALL executors
- All crashes show ~86s delay, consistent with asynchronous CUDA error
  propagation cascading into SIGSEGV

Two-pronged fix:

1. Fatal signal handler (Utilities.cpp, ToCudf.cpp):
   Install SIGSEGV/SIGABRT/SIGBUS handler in registerCudf() that
   captures native backtrace via backtrace_symbols_fd() before
   re-raising to the JVM's handler.  This gives us the exact crash
   location in executor stderr for ALL future SIGSEGV queries.

2. CUDA stream sync + error checking (operator boundaries):
   Add checkCudaOperationError() that synchronizes the CUDA stream
   and checks cudaPeekAtLastError() before each operator's main GPU
   work.  This converts asynchronous CUDA errors (illegal address,
   OOM) into clean VeloxRuntimeError before they cascade into SIGSEGV.
   Added to: CudfFilterProject, CudfHashJoinBuild, CudfHashJoinProbe,
   CudfHashAggregation, CudfToVelox.

Also fixes: pessimizing-move warning in CudfHiveDataSource.cpp.

Made-with: Cursor

* Revert "fix(cudf): add SIGSEGV signal handler and CUDA error synchronization at operator boundaries"

This reverts commit d92407f679707b38746b068daa3445bfd29ac26c.

* fix(cudf): skip subfield filter AST when any column is STRING/VARBINARY

The normal parquet reader path (splitReader_->read_chunk()) applies the
subfield filter AST internally via cudf::compute_column, which uses Jitify
for string comparisons. When Jitify fails, the error is thrown inside the
reader and cannot be caught externally.

Fix: before building the subfield filter AST, check column types of all
subfield filters. If any column is VARCHAR or VARBINARY, skip the entire
GPU subfield filtering (set subfieldFilterExpr_ = nullptr). The downstream
CPU FilterProject handles these predicates instead.

This prevents Jitify errors in query8/33/43/56/60/61/91 by ensuring
string-typed filters never reach the parquet reader's AST evaluator.

Made-with: Cursor

* fix(cudf): clear CUDA error state after OOM in hash join probe (Q72)

After a CUDA OOM in CudfHashJoinProbe, the cuda_async_view_memory_resource
pool enters an error state where ALL subsequent allocations fail with
cudaErrorIllegalAddress — even 1-byte allocations. This cascade kills every
concurrent task on the same GPU.

Fix: call stream.synchronize_no_throw() + cudaGetLastError() in the
std::bad_alloc catch block to clear the sticky CUDA error before the
split-and-retry logic. Without this, probe splitting was never effective
because retried allocations immediately failed on the corrupted pool.

Also includes Agent B's pending improvements: pendingJoinOutputs_ streaming
(avoids concatenation peak memory), empty table filtering.

Made-with: Cursor

* Revert "fix(cudf): clear CUDA error state after OOM in hash join probe (Q72)"

This reverts commit a95d6dd.

* fix(cudf): batched join output + CUDA error recovery for OOM (query72)

Two root causes identified for Q72 CUDA OOM surviving probe splitting:

1. After std::bad_alloc from RMM, the CUDA async pool leaves a sticky
   error on the device. The subsequent cudf::split (which launches a
   null-count kernel) fails with cudaErrorIllegalAddress, escaping the
   catch handler entirely. Fix: call stream.synchronize_no_throw() and
   cudaGetLastError() to clear the sticky error before splitting.

2. Even when splitting succeeds, concatenateTables at the end allocates
   a contiguous buffer for ALL split results (~667MB for 112M rows),
   doubling peak GPU memory. Fix: adopt Spark RAPIDS JoinGatherer
   pattern — store multiple results in pendingJoinOutputs_ and return
   one per getOutput() call, avoiding the concatenation allocation.

Also fix isFinished() to not report done while pendingJoinOutputs_
still has tables to emit.

Made-with: Cursor
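The JoinGatherer-style batching described in point 2 can be sketched as a queue drained one batch per `getOutput()` call (class and member names are hypothetical, with a string standing in for a cudf table). This avoids the peak-memory spike of concatenating all split results, and `isFinished()` only reports done once the queue is empty:

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <utility>

class BatchedOutput {
 public:
  void enqueue(std::string batch) { pending_.push_back(std::move(batch)); }

  // Emit exactly one pending batch per call, never concatenating them.
  std::optional<std::string> getOutput() {
    if (pending_.empty()) return std::nullopt;
    auto out = std::move(pending_.front());
    pending_.pop_front();
    return out;
  }

  // Not finished while batches remain, even if upstream input is done.
  bool isFinished(bool upstreamDone) const {
    return upstreamDone && pending_.empty();
  }

 private:
  std::deque<std::string> pending_;
};
```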

* fix(cudf): reject AST expressions with non-matching operand types

cuDF's AST expression parser requires all operands of a binary operation
to have identical cudf::data_type (including decimal scale). When operand
types differ (e.g., DECIMAL(15,2) vs DECIMAL(7,4)), cudf::compute_column
throws "non-matching operand types" at runtime, crashing queries Q61/Q83.

Three-layer fix:
1. isOpAndInputsSupported: reject binary ops with mismatched types early,
   preventing ASTExpression from being selected (falls to FunctionExpression)
2. pushExprToTree: for binary/between ops with type mismatch, try precompute
   via FunctionExpression; if that fails, throw to trigger full fallback
3. ASTExpression::eval: defensive catch around cudf::compute_column converts
   cuDF exceptions to VeloxException for clearer diagnostics

Made-with: Cursor

* fix(cudf): remove duplicate divide/greaterthan registration that drops decimal signatures (Q18)

The `divide` and `greaterthan` functions were registered twice:
first with only (double,double) signatures, then via registerBinaryOp/
registerComparisonOp with both double AND decimal signatures.  Since
overwrite=false, the second registration was silently dropped, meaning
decimal divide and decimal greater-than never matched in canEvaluate.

Also wrap canBeEvaluatedByCudf expression compilation in try-catch
so that VeloxException from decimal type resolution (e.g. "Variable
a_precision is not defined") gracefully falls back to CPU instead of
potentially crashing the task.

Made-with: Cursor

* fix(cudf): disable JIT by default, fix JIT table view and SubfieldFilter isNull param

- Disable jitExpressionEnabled by default (less mature than AST/Function)
- Fix JitExpression to construct astInputTableView from inputColumnViews +
  precomputedColumns before passing to compute_column_jit
- Add missing isNull parameter to makeScalarAndLiteral calls in
  SubfieldFiltersToAst.cpp

Made-with: Cursor