add specialized InList implementations for common scalar types #18832

adriangb · 2025-11-20T00:07:54Z

Copilot

Pull Request Overview

This PR adds specialized StaticFilter implementations for common scalar types to optimize IN LIST operations in DataFusion. Previously, only Int32 had a specialized filter; now Int8, Int16, Int64, UInt8, UInt16, UInt32, UInt64, Boolean, Utf8, LargeUtf8, Utf8View, Binary, LargeBinary, and BinaryView all have optimized implementations.

Key changes:

Introduced two macros (primitive_static_filter! and define_static_filter!) to generate specialized filter implementations, eliminating code duplication
Extended instantiate_static_filter to route 15 additional data types to their specialized implementations
Refactored the in_list function to use instantiate_static_filter instead of defaulting to the generic ArrayStaticFilter
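
To make the macro approach in the overview concrete, here is a minimal std-only sketch. It is an illustration under assumptions, not the PR's code: the `StaticFilter` trait below is a simplified stand-in that works on slices of `Option<T>` rather than Arrow arrays, and the filter type names are hypothetical.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the PR's `StaticFilter` trait (assumption: the real
/// trait operates on Arrow arrays, not on slices of `Option<T>`).
trait StaticFilter<T> {
    /// Returns `Some(true/false)` per input row, or `None` for a NULL result.
    fn contains(&self, values: &[Option<T>], negated: bool) -> Vec<Option<bool>>;
}

/// Generates one hash-set backed filter per primitive type (hypothetical names).
macro_rules! primitive_static_filter {
    ($name:ident, $ty:ty) => {
        struct $name {
            values: HashSet<$ty>,
            has_null: bool,
        }

        impl $name {
            fn new(list: &[Option<$ty>]) -> Self {
                Self {
                    // Keep only the non-NULL list entries in the set.
                    values: list.iter().flatten().copied().collect(),
                    has_null: list.iter().any(|v| v.is_none()),
                }
            }
        }

        impl StaticFilter<$ty> for $name {
            fn contains(&self, values: &[Option<$ty>], negated: bool) -> Vec<Option<bool>> {
                values
                    .iter()
                    .map(|v| match v {
                        None => None,
                        Some(x) if self.values.contains(x) => Some(!negated),
                        // A miss against a list containing NULL is NULL in SQL.
                        Some(_) if self.has_null => None,
                        Some(_) => Some(negated),
                    })
                    .collect()
            }
        }
    };
}

primitive_static_filter!(Int8Filter, i8);
primitive_static_filter!(Int64Filter, i64);
primitive_static_filter!(UInt16Filter, u16);
```

Each macro invocation produces a concrete type, so the set lookup is monomorphized per element type instead of going through a generic array comparison.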


adriangb · 2025-11-20T00:18:18Z

@alamb maybe let's run benchmarks?

adriangb · 2025-11-20T07:58:42Z

Here's what I'm seeing so far:

Function & State	main (us)	specialized (us)	Change
in_list_utf8(5) (1024, 0) IN (1, 0)	3.6205	4.8594	Regressed (+34.239%)
in_list_utf8(10) (1024, 0) IN (1, 0)	3.6363	5.1001	Regressed (+40.312%)
in_list_utf8(20) (1024, 0) IN (1, 0)	3.6707	4.9045	Regressed (+33.600%)
in_list_f32 (1024, 0) IN (1, 0)	3.5255	3.2288	Improved (−8.2971%)
in_list_i32 (1024, 0) IN (1, 0)	3.4923	3.6403	Regressed (+3.8772%)
in_list_utf8(5) (1024, 0.2) IN (1, 0)	4.9284	6.3583	Regressed (+29.002%)
in_list_utf8(10) (1024, 0.2) IN (1, 0)	4.9788	6.3273	Regressed (+27.064%)
in_list_utf8(20) (1024, 0.2) IN (1, 0)	5.1145	6.4905	Regressed (+26.907%)
in_list_f32 (1024, 0.2) IN (1, 0)	4.4894	4.2540	Improved (−5.2512%)
in_list_i32 (1024, 0.2) IN (1, 0)	4.4055	4.2869	Improved (−2.7526%)
in_list_utf8(5) (1024, 0) IN (3, 0)	3.5576	4.8733	Regressed (+37.000%)
in_list_utf8(10) (1024, 0) IN (3, 0)	3.4999	4.8737	Regressed (+37.108%)
in_list_utf8(20) (1024, 0) IN (3, 0)	3.4897	4.9039	Regressed (+40.449%)
in_list_f32 (1024, 0) IN (3, 0)	3.5114	3.2550	Improved (−7.3270%)
in_list_i32 (1024, 0) IN (3, 0)	3.5199	3.4879	Noise Threshold (−0.9158%)
in_list_utf8(5) (1024, 0.2) IN (3, 0)	5.0196	6.2232	Regressed (+23.865%)
in_list_utf8(10) (1024, 0.2) IN (3, 0)	5.0453	6.5280	Regressed (+28.004%)
in_list_utf8(20) (1024, 0.2) IN (3, 0)	5.1005	11.124	Regressed (+118.13%)
in_list_f32 (1024, 0.2) IN (3, 0)	4.5049	4.3126	Improved (−4.6367%)
in_list_i32 (1024, 0.2) IN (3, 0)	4.4026	4.4424	Noise Threshold (+1.0449%)
in_list_utf8(5) (1024, 0) IN (10, 0)	3.5275	4.9709	Regressed (+40.969%)
in_list_utf8(10) (1024, 0) IN (10, 0)	3.5613	4.9125	Regressed (+37.386%)
in_list_utf8(20) (1024, 0) IN (10, 0)	3.5589	4.8611	Regressed (+36.578%)
in_list_f32 (1024, 0) IN (10, 0)	3.5596	3.2017	Improved (−10.003%)
in_list_i32 (1024, 0) IN (10, 0)	3.4917	3.5190	Noise Threshold (+1.0216%)
in_list_utf8(5) (1024, 0.2) IN (10, 0)	5.0368	6.6162	Regressed (+31.725%)
in_list_utf8(10) (1024, 0.2) IN (10, 0)	5.0980	6.6660	Regressed (+30.039%)
in_list_utf8(20) (1024, 0.2) IN (10, 0)	5.2543	6.5350	Regressed (+24.335%)
in_list_f32 (1024, 0.2) IN (10, 0)	4.5609	4.3248	Improved (−5.3746%)
in_list_i32 (1024, 0.2) IN (10, 0)	4.4460	4.3543	Improved (−2.7754%)
in_list_utf8(5) (1024, 0) IN (100, 0)	3.6542	4.9992	Regressed (+36.952%)
in_list_utf8(10) (1024, 0) IN (100, 0)	3.6529	4.8772	Regressed (+33.560%)
in_list_utf8(20) (1024, 0) IN (100, 0)	3.6155	5.0250	Regressed (+39.127%)
in_list_f32 (1024, 0) IN (100, 0)	3.6029	3.2448	Improved (−9.7603%)
in_list_i32 (1024, 0) IN (100, 0)	3.6048	3.4770	Improved (−3.5307%)
in_list_utf8(5) (1024, 0.2) IN (100, 0)	5.3988	6.6108	Regressed (+22.866%)
in_list_utf8(10) (1024, 0.2) IN (100, 0)	5.4776	6.6567	Regressed (+21.591%)
in_list_utf8(20) (1024, 0.2) IN (100, 0)	5.3721	6.6470	Regressed (+24.058%)
in_list_f32 (1024, 0.2) IN (100, 0)	4.7693	4.3281	Improved (−9.9814%)
in_list_i32 (1024, 0.2) IN (100, 0)	4.5402	4.4076	Improved (−3.5247%)

I think we'd need to add benchmarks for other primitive types.
And it's interesting that Utf8 regresses a lot across the board. I guess the benefit of vectorization (computing all of the hashes at once) outweighs the cost of the dynamic dispatch?

adriangb · 2025-11-20T08:05:19Z

Thinking about it, the trick is probably to avoid the extra to_string()/to_vec() allocation and store just the hashes. Will look into that.
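
One possible reading of "store just the hashes" is sketched below, std-only and under assumptions: the list values live in a single contiguous buffer (loosely mirroring an Arrow byte array) and are indexed by hash, so no per-value `String` or `Vec<u8>` is allocated. `HashedBytesFilter` and `hash_bytes` are hypothetical names, not code from the PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hashes a byte slice with the standard library's default hasher.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical filter: one shared data buffer plus a hash -> indices table.
struct HashedBytesFilter {
    data: Vec<u8>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of list value `i`.
    offsets: Vec<usize>,
    by_hash: HashMap<u64, Vec<usize>>,
}

impl HashedBytesFilter {
    fn new<'a>(list: impl IntoIterator<Item = &'a [u8]>) -> Self {
        let mut filter = Self {
            data: Vec::new(),
            offsets: vec![0],
            by_hash: HashMap::new(),
        };
        for (i, value) in list.into_iter().enumerate() {
            filter.data.extend_from_slice(value);
            filter.offsets.push(filter.data.len());
            filter.by_hash.entry(hash_bytes(value)).or_default().push(i);
        }
        filter
    }

    fn value(&self, i: usize) -> &[u8] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Looks up by hash first, then compares bytes to rule out collisions.
    fn contains(&self, needle: &[u8]) -> bool {
        self.by_hash
            .get(&hash_bytes(needle))
            .map_or(false, |indices| indices.iter().any(|&i| self.value(i) == needle))
    }
}
```

The byte comparison after the hash lookup is still needed, since two different values can share a hash.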

martin-g

LGTM!
Just some nits.

datafusion/physical-expr/src/expressions/in_list.rs

Dandandan · 2025-11-21T11:48:21Z

datafusion/physical-expr/src/expressions/in_list.rs

+                    }
+                    (false, false, false) => {
+                        // no nulls anywhere, not negated
+                        BooleanArray::from_iter(


BooleanBuffer::collect_bool is faster

Thank you. I know you or another reviewer pointed this out to me before. I am making a mental note not to forget again and to keep an eye out for it. Thanks for your patience.

I do wonder why we don't have faster high-level APIs if this is really important. E.g. BooleanArray::new_false, BooleanArray::new_nulls, BooleanArray::new_true and BooleanArray::collect_bool(size, iterator) or something like that.

We have been discussing various improvements:

Improvements to BooleanBufferBuilder / BooleanBuilder arrow-rs#8561

Consolidate bitwise operation implementations arrow-rs#8806
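
For readers unfamiliar with why a `collect_bool`-style API can beat appending one bit at a time, here is a std-only sketch of the idea as I understand it: evaluate the predicate by index and pack 64 results into a word before writing it out. This is not Arrow's implementation; `collect_bool` and `get_bit` below are local helper functions written for illustration.

```rust
/// Packs `len` booleans produced by `f` into 64-bit words, least significant
/// bit first. Sketch of the technique only, not the arrow-rs code.
fn collect_bool(len: usize, mut f: impl FnMut(usize) -> bool) -> Vec<u64> {
    let mut words = Vec::with_capacity((len + 63) / 64);
    for chunk_start in (0..len).step_by(64) {
        let chunk_len = (len - chunk_start).min(64);
        let mut packed = 0u64;
        for bit in 0..chunk_len {
            // Branch-free: the bool becomes 0 or 1 and is shifted into place.
            packed |= (f(chunk_start + bit) as u64) << bit;
        }
        // One write per 64 results instead of one per result.
        words.push(packed);
    }
    words
}

/// Reads bit `i` back out of the packed words.
fn get_bit(words: &[u64], i: usize) -> bool {
    (words[i / 64] >> (i % 64)) & 1 == 1
}
```

A call such as `collect_bool(values.len(), |i| set.contains(&values[i]))` would replace an iterator that builds the result bit by bit.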

alamb · 2025-11-22T12:34:28Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing specialize (1e4782f) to 7d8b860 diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=specialize
Results will be posted here when complete

alamb · 2025-11-22T12:53:20Z

🤖: Benchmark completed

Details

group                                       main                                   specialize
-----                                       ----                                   ----------
in_list_f32 (1024, 0) IN (1, 0)             1.16      4.2±0.01µs        ? ?/sec    1.00      3.6±0.02µs        ? ?/sec
in_list_f32 (1024, 0) IN (10, 0)            1.16      4.2±0.01µs        ? ?/sec    1.00      3.6±0.02µs        ? ?/sec
in_list_f32 (1024, 0) IN (100, 0)           1.16      4.3±0.01µs        ? ?/sec    1.00      3.7±0.01µs        ? ?/sec
in_list_f32 (1024, 0) IN (3, 0)             1.17      4.3±0.10µs        ? ?/sec    1.00      3.6±0.01µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (1, 0)           1.00      5.1±0.02µs        ? ?/sec    1.13      5.8±0.07µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (10, 0)          1.00      5.3±0.03µs        ? ?/sec    1.09      5.7±0.02µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (100, 0)         1.00      5.3±0.04µs        ? ?/sec    1.13      5.9±0.10µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (3, 0)           1.00      5.2±0.04µs        ? ?/sec    1.12      5.8±0.05µs        ? ?/sec
in_list_i32 (1024, 0) IN (1, 0)             1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.02µs        ? ?/sec
in_list_i32 (1024, 0) IN (10, 0)            1.00      4.2±0.01µs        ? ?/sec    1.11      4.7±0.04µs        ? ?/sec
in_list_i32 (1024, 0) IN (100, 0)           1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.01µs        ? ?/sec
in_list_i32 (1024, 0) IN (3, 0)             1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.07µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (1, 0)           1.00      5.2±0.02µs        ? ?/sec    1.32      6.8±0.39µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (10, 0)          1.00      5.2±0.02µs        ? ?/sec    1.25      6.5±0.03µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (100, 0)         1.00      5.2±0.03µs        ? ?/sec    1.16      6.1±0.06µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (3, 0)           1.00      5.2±0.02µs        ? ?/sec    1.30      6.8±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (1, 0)        1.16      4.5±0.02µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (10, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (100, 0)      1.16      4.5±0.03µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (3, 0)        1.16      4.5±0.04µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (1, 0)      1.06      5.9±0.03µs        ? ?/sec    1.00      5.6±0.04µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (10, 0)     1.09      6.0±0.06µs        ? ?/sec    1.00      5.5±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (100, 0)    1.07      6.0±0.03µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (3, 0)      1.10      6.1±0.04µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (1, 0)        1.16      4.5±0.02µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (10, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (100, 0)      1.15      4.5±0.04µs        ? ?/sec    1.00      3.9±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (3, 0)        1.14      4.6±0.01µs        ? ?/sec    1.00      4.0±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (1, 0)      1.06      6.0±0.02µs        ? ?/sec    1.00      5.7±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (10, 0)     1.07      6.0±0.03µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (100, 0)    1.09      6.1±0.03µs        ? ?/sec    1.00      5.6±0.02µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (3, 0)      1.04      6.1±0.03µs        ? ?/sec    1.00      5.8±0.05µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (1, 0)         1.15      4.5±0.01µs        ? ?/sec    1.00      4.0±0.01µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (10, 0)        1.16      4.5±0.03µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (100, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.00µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (3, 0)         1.14      4.5±0.03µs        ? ?/sec    1.00      4.0±0.02µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (1, 0)       1.06      5.9±0.04µs        ? ?/sec    1.00      5.5±0.04µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (10, 0)      1.10      6.1±0.04µs        ? ?/sec    1.00      5.5±0.03µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (100, 0)     1.11      6.0±0.03µs        ? ?/sec    1.00      5.4±0.03µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (3, 0)       1.11      6.1±0.06µs        ? ?/sec    1.00      5.5±0.04µs        ? ?/sec

alamb

Thanks @adriangb -- this seems like an improvement to me

It would be nice if we could reduce some of the duplication in the tests, but I don't think that is a deal breaker 👍

I do think we should cover the no null cases with tests

Do you also plan to make special InList implementation for Utf8/Utf8View/LargeUtf8?

alamb · 2025-11-22T10:58:43Z

datafusion/physical-expr/src/expressions/in_list.rs

    }

+    #[test]
+    fn in_list_int8() -> Result<()> {


Can we please reduce the duplication in tests here? It seems like there are about 16 copies of the same test

Reducing the duplication will make it easier to understand what is being covered
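
One way the duplication could be reduced is a single shared scenario instantiated per type by a macro. The sketch below is an assumption about what that might look like, not the PR's tests: `in_list` is a hypothetical stand-in for the kernel under test, and in a real test module each generated function would carry `#[test]`.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical stand-in for the kernel under test.
fn in_list<T: Eq + Hash + Copy>(values: &[Option<T>], list: &[T]) -> Vec<Option<bool>> {
    let set: HashSet<T> = list.iter().copied().collect();
    values.iter().map(|v| v.map(|x| set.contains(&x))).collect()
}

/// One shared scenario, generated once per integer type.
/// (In a real test module, add `#[test]` above the generated function.)
macro_rules! in_list_case {
    ($fn_name:ident, $ty:ty) => {
        fn $fn_name() {
            let values: Vec<Option<$ty>> = vec![Some(0 as $ty), Some(2 as $ty), None];
            let list: Vec<$ty> = vec![0 as $ty, 1 as $ty];
            assert_eq!(in_list(&values, &list), vec![Some(true), Some(false), None]);
        }
    };
}

in_list_case!(in_list_int8, i8);
in_list_case!(in_list_int64, i64);
in_list_case!(in_list_uint64, u64);
```

With this shape, adding coverage for another type is one line, and the scenario itself is stated once.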

alamb · 2025-11-22T11:04:06Z

datafusion/physical-expr/src/expressions/in_list.rs

+                        BooleanArray::new(builder.finish(), None)
+                    }
+                    (false, false, true) => {
+                        let values = v.values();


This code appears to be uncovered by tests. I tested using

cargo llvm-cov test --html -p datafusion-physical-expr --lib -- in_lis

Here is the whole report in case that is useful llvm-cov.zip

alamb · 2025-11-22T11:05:28Z

datafusion/physical-expr/src/expressions/in_list.rs

-    }
+            fn contains(&self, v: &dyn Array, negated: bool) -> Result<BooleanArray> {
+                // Handle dictionary arrays by recursing on the values
+                downcast_dictionary_array! {


I didn't see any tests for dictionaries 🤔

alamb · 2025-11-22T12:36:51Z

datafusion/physical-expr/src/expressions/in_list.rs

+                    }
+                    (false, false, false) => {
+                        // no nulls anywhere, not negated
+                        BooleanArray::from_iter(


We have been discussing various improvements:

Improvements to BooleanBufferBuilder / BooleanBuilder arrow-rs#8561

Consolidate bitwise operation implementations arrow-rs#8806

alamb · 2025-11-22T13:11:53Z

datafusion/physical-expr/src/expressions/in_list.rs

+    }
+
+    #[test]
+    fn in_list_utf8_view() -> Result<()> {


This PR has tests for Utf8 but no changes for those types. Is that your intention?

Dandandan · 2025-11-23T06:03:27Z

datafusion/physical-expr/src/expressions/in_list.rs

+                    }
+                    (true, _, true) | (false, true, true) => {
+                        // Either needle or haystack has nulls, negated
+                        BooleanArray::from_iter(v.iter().map(|value| {


It probably would be faster to handle the nulls separately or using set_indices rather than using BooleanArray::from_iter and v.iter etc.
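
A rough std-only sketch of the "handle the nulls separately" idea follows. It is an illustration under assumptions, not the approach the PR ended up with and not the `set_indices` variant: membership is computed for every slot without consulting validity, then nulls are folded in with word-wide bit operations. `NullableBits` and `contains_with_nulls` are hypothetical names.

```rust
/// Bit-packed nullable boolean result, least significant bit first.
/// A row's answer is only meaningful where its `validity` bit is set.
struct NullableBits {
    values: Vec<u64>,
    validity: Vec<u64>,
}

/// `data` holds raw values (slots under a NULL hold arbitrary data) and
/// `validity` is the input null bitmap, with unused tail bits left as zero.
fn contains_with_nulls(
    data: &[i32],
    validity: &[u64],
    list: &[i32],
    list_has_null: bool,
    negated: bool,
) -> NullableBits {
    let mut values = Vec::with_capacity(validity.len());
    let mut out_validity = Vec::with_capacity(validity.len());
    for (chunk, &valid) in data.chunks(64).zip(validity) {
        // Pass 1: membership for every slot, ignoring validity.
        let mut found = 0u64;
        for (bit, v) in chunk.iter().enumerate() {
            found |= (list.contains(v) as u64) << bit;
        }
        // Pass 2: apply negation and nulls one word at a time.
        values.push(if negated { !found } else { found });
        // With a NULL in the list, only rows that matched stay non-NULL.
        out_validity.push(if list_has_null { valid & found } else { valid });
    }
    NullableBits { values, validity: out_validity }
}
```

The inner loop has no per-row null branch, which is the property the review comment is after.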

Dandandan · 2025-11-23T06:04:59Z

datafusion/physical-expr/src/expressions/in_list.rs

+                        let values = v.values();
+                        let mut builder = BooleanBufferBuilder::new(values.len());
+                        for value in values.iter() {
+                            builder.append(self.values.contains(value));


This unfortunately is slower than collect_bool. I see there is some good discussion on better APIs on apache/arrow-rs#8561

Dandandan

Nice to get some performance back.

Dandandan · 2025-11-23T06:15:10Z

Results seem a bit mixed?

adriangb · 2025-11-23T06:43:54Z

Yes. Slowdowns for i32 are concerning. I won’t merge this until it’s all speedups or neutral. I may also make a support PR to add more benchmarks for other types so we can make better comparisons.

alamb · 2025-12-02T12:01:40Z

I am hoping to look at this PR today or tomorrow and help push it along

adriangb · 2025-12-02T18:14:02Z

I realized this new implementation differs from the existing one in ways which fix some bugs. I opened #19050 to demonstrate those bugs. So maybe let's first fix the bugs and merge the new tests, then we can continue here?

geoffreyclaude · 2025-12-08T08:08:51Z

datafusion/physical-expr/src/expressions/in_list.rs

+                let haystack_has_nulls = self.null_count > 0;
+
+                let result = match (v.null_count() > 0, haystack_has_nulls, negated) {
+                    (true, _, false) | (false, true, false) => {


You can simplify this slightly complex 6 boolean match with:

let has_nulls = v.null_count() > 0 || haystack_has_nulls;
match (has_nulls, negated) {
    (true, false) => { /* Either needle or haystack has nulls, not negated */ }
    (true, true) => { /* Either needle or haystack has nulls, negated */ }
    (false, false) => { /* no nulls anywhere, not negated */ }
    (false, true) => { /* no nulls anywhere, negated */ }
}

Thanks, applied in #19050.
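
To spell out what the four arms have to compute, here is a std-only sketch of `[NOT] IN` with SQL three-valued logic, using `Option<i32>` in place of Arrow arrays. It follows the `(has_nulls, negated)` shape suggested above; the function name and the slice-based signature are assumptions made for illustration.

```rust
/// Evaluates `needle [NOT] IN (list)` for every needle.
/// `None` stands for SQL NULL in both inputs and in the result.
fn in_list(needles: &[Option<i32>], list: &[Option<i32>], negated: bool) -> Vec<Option<bool>> {
    let list_has_nulls = list.iter().any(Option::is_none);
    let has_nulls = list_has_nulls || needles.iter().any(Option::is_none);
    let found = |n: i32| list.contains(&Some(n));

    // Null-aware path: a NULL needle, or a miss against a list that
    // contains NULL, yields NULL regardless of negation.
    let null_aware = |negate: bool| -> Vec<Option<bool>> {
        needles
            .iter()
            .map(|v| match v {
                None => None,
                Some(n) if found(*n) => Some(!negate),
                Some(_) if list_has_nulls => None,
                Some(_) => Some(negate),
            })
            .collect()
    };

    match (has_nulls, negated) {
        (true, false) => null_aware(false),
        (true, true) => null_aware(true),
        // Fast paths: every needle is non-NULL, so the result has no NULLs.
        (false, false) => needles.iter().map(|v| v.map(|n| found(n))).collect(),
        (false, true) => needles.iter().map(|v| v.map(|n| !found(n))).collect(),
    }
}
```

The two `(false, _)` arms are where a specialized kernel can skip null handling entirely.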

geoffreyclaude · 2025-12-08T09:12:02Z

General comment on the benchmarks: am I reading them wrong, or is the null_percent input logic inverted?

fn do_benches(
    c: &mut Criterion,
    array_length: usize,
    in_list_length: usize,
    null_percent: f64,
) {
...
    let values: Int32Array = (0..array_length)
        .map(|_| rng.random_bool(null_percent).then(|| rng.random()))
        .collect();

    let in_list: Vec<_> = (0..in_list_length)
        .map(|_| ScalarValue::Int32(Some(rng.random())))
        .collect();

    do_bench(
        c,
        &format!("in_list_i32 ({array_length}, {null_percent}) IN ({in_list_length}, 0)"),
        Arc::new(values),
        &in_list,
    )

When null_percent is 0, won't this just create a values array of array_length NULLs?

I think it should rather be something like:

fn do_benches(
    c: &mut Criterion,
    array_length: usize,
    in_list_length: usize,
    null_percent: f64,
) {
    let non_null_percent = 1.0 - null_percent;
...
    let values: Int32Array = (0..array_length)
        .map(|_| rng.random_bool(non_null_percent).then(|| rng.random()))
        .collect();

    let in_list: Vec<_> = (0..in_list_length)
        .map(|_| ScalarValue::Int32(Some(rng.random())))
        .collect();

    do_bench(
        c,
        &format!("in_list_i32 ({array_length}, {null_percent}) IN ({in_list_length}, 0)"),
        Arc::new(values),
        &in_list,
    )

EDIT: I opened #19204 to fix this.
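
The inversion can be reproduced without the `rand` crate. The sketch below is an assumption-laden stand-in: `XorShift` is a tiny local PRNG whose `random_bool(p)` mimics "true with probability `p`", and the two generator functions mirror the benchmark's original and proposed expressions. Since `bool::then` returns `Some` only when the bool is true, the original expression yields a non-NULL value with probability `null_percent`.

```rust
/// Tiny deterministic PRNG so the sketch needs no external crates
/// (the real benchmark uses the `rand` crate instead).
struct XorShift(u64);

impl XorShift {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Map the top 53 bits into [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Stand-in for `rng.random_bool(p)`: true with probability `p`.
    fn random_bool(&mut self, p: f64) -> bool {
        self.next_f64() < p
    }
}

/// Mirrors the benchmark as written: `Some` with probability `null_percent`.
fn values_as_written(len: usize, null_percent: f64) -> Vec<Option<i32>> {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    (0..len)
        .map(|i| rng.random_bool(null_percent).then(|| i as i32))
        .collect()
}

/// Mirrors the proposed fix: `Some` with probability `1 - null_percent`.
fn values_fixed(len: usize, null_percent: f64) -> Vec<Option<i32>> {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    (0..len)
        .map(|i| rng.random_bool(1.0 - null_percent).then(|| i as i32))
        .collect()
}
```

With `null_percent = 0.0`, `values_as_written` produces only `None` while `values_fixed` produces only `Some`, which matches the concern raised above.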

alamb-ghbot · 2025-12-10T18:04:58Z

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing specialize (a430ef8) to e8a0829 diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --all-features --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=specialize
Results will be posted here when complete

alamb-ghbot · 2025-12-10T18:27:24Z

🤖: Benchmark completed

Details

group                                          main                                   specialize
-----                                          ----                                   ----------
in_list/Float32/list=100/nulls=0%              1.01    305.6±1.35µs        ? ?/sec    1.00    303.3±1.81µs        ? ?/sec
in_list/Float32/list=100/nulls=20%             1.09   392.5±12.14µs        ? ?/sec    1.00    358.9±2.98µs        ? ?/sec
in_list/Float32/list=3/nulls=0%                1.03     14.7±0.65µs        ? ?/sec    1.00     14.3±0.52µs        ? ?/sec
in_list/Float32/list=3/nulls=20%               1.05     19.7±0.62µs        ? ?/sec    1.00     18.8±0.24µs        ? ?/sec
in_list/Float32/list=8/nulls=0%                1.00     29.0±0.11µs        ? ?/sec    1.02     29.6±0.96µs        ? ?/sec
in_list/Float32/list=8/nulls=20%               1.11     39.2±0.35µs        ? ?/sec    1.00     35.4±0.24µs        ? ?/sec
in_list/Int32/list=100/nulls=0%                1.17      6.0±0.16µs        ? ?/sec    1.00      5.2±0.19µs        ? ?/sec
in_list/Int32/list=100/nulls=20%               1.79      8.4±0.14µs        ? ?/sec    1.00      4.7±0.15µs        ? ?/sec
in_list/Int32/list=3/nulls=0%                  1.44      5.7±0.09µs        ? ?/sec    1.00      3.9±0.12µs        ? ?/sec
in_list/Int32/list=3/nulls=20%                 2.17      8.6±0.19µs        ? ?/sec    1.00      4.0±0.05µs        ? ?/sec
in_list/Int32/list=8/nulls=0%                  1.43      6.0±0.16µs        ? ?/sec    1.00      4.2±0.02µs        ? ?/sec
in_list/Int32/list=8/nulls=20%                 1.94      8.8±0.31µs        ? ?/sec    1.00      4.5±0.16µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=100         1.01   655.6±13.76µs        ? ?/sec    1.00    647.4±7.72µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=12          1.03   682.9±10.42µs        ? ?/sec    1.00    664.6±8.23µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=3           1.00    667.5±2.10µs        ? ?/sec    1.00    669.9±9.19µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=100        1.00    594.5±2.21µs        ? ?/sec    1.02    606.1±4.71µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=12         1.00    617.0±3.86µs        ? ?/sec    1.00    615.1±2.64µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=3          1.03    633.6±7.70µs        ? ?/sec    1.00    617.9±5.61µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=100           1.00     25.5±0.18µs        ? ?/sec    1.04     26.5±0.60µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=12            1.03     26.7±0.79µs        ? ?/sec    1.00     25.9±0.30µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=3             1.00     27.3±0.14µs        ? ?/sec    1.00     27.4±0.61µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=100          1.01     26.0±0.35µs        ? ?/sec    1.00     25.8±0.79µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=12           1.00     27.0±0.59µs        ? ?/sec    1.01     27.2±0.12µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=3            1.03     26.7±0.54µs        ? ?/sec    1.00     26.0±0.28µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=100           1.02     56.1±1.55µs        ? ?/sec    1.00     54.9±0.34µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=12            1.00     55.9±0.30µs        ? ?/sec    1.00     55.6±0.40µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=3             1.00     55.1±0.24µs        ? ?/sec    1.08     59.4±2.05µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=100          1.01     52.5±1.17µs        ? ?/sec    1.00     52.2±0.50µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=12           1.00     54.1±1.20µs        ? ?/sec    1.00     53.8±0.28µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=3            1.02     54.3±1.53µs        ? ?/sec    1.00     53.2±0.51µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=100     1.00    595.6±6.39µs        ? ?/sec    1.05    622.7±4.73µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=12      1.00    578.3±6.18µs        ? ?/sec    1.00    576.7±5.17µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=3       1.00    577.9±4.87µs        ? ?/sec    1.00    575.4±3.80µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=100    1.00   505.1±10.14µs        ? ?/sec    1.04    527.3±1.63µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=12     1.00    509.6±2.37µs        ? ?/sec    1.00    511.4±6.18µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=3      1.00   516.3±16.20µs        ? ?/sec    1.00    516.9±7.63µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=100       1.05     22.2±0.41µs        ? ?/sec    1.00     21.2±0.33µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=12        1.07     23.6±0.47µs        ? ?/sec    1.00     22.2±0.16µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=3         1.04     23.2±0.14µs        ? ?/sec    1.00     22.2±0.21µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=100      1.00     22.8±0.15µs        ? ?/sec    1.00     22.8±0.49µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=12       1.03     23.8±1.22µs        ? ?/sec    1.00     23.0±0.29µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=3        1.01     23.4±0.12µs        ? ?/sec    1.00     23.1±0.40µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=100       1.00     48.8±0.38µs        ? ?/sec    1.01     49.5±1.38µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=12        1.23     61.1±0.51µs        ? ?/sec    1.00     49.7±0.61µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=3         1.23     61.2±0.36µs        ? ?/sec    1.00     49.8±0.62µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=100      1.00     44.9±0.93µs        ? ?/sec    1.02     45.7±0.34µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=12       1.03     47.5±0.91µs        ? ?/sec    1.00     46.3±2.34µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=3        1.03     49.4±0.27µs        ? ?/sec    1.00     48.0±1.63µs        ? ?/sec

adriangb · 2025-12-10T18:52:20Z

Seems reproducible 😄

adriangb · 2025-12-10T19:29:10Z

run benchmark in_list

alamb-ghbot · 2025-12-10T19:29:13Z

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing specialize (2f2c60d) to e8a0829 diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --all-features --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=specialize
Results will be posted here when complete

alamb-ghbot · 2025-12-10T19:51:17Z

🤖: Benchmark completed

Details

group                                          main                                   specialize
-----                                          ----                                   ----------
in_list/Float32/list=100/nulls=0%              51.58   306.7±2.41µs        ? ?/sec    1.00      5.9±0.08µs        ? ?/sec
in_list/Float32/list=100/nulls=20%             55.98   390.8±2.57µs        ? ?/sec    1.00      7.0±0.06µs        ? ?/sec
in_list/Float32/list=3/nulls=0%                2.68     14.6±0.20µs        ? ?/sec    1.00      5.4±0.19µs        ? ?/sec
in_list/Float32/list=3/nulls=20%               3.44     19.8±0.76µs        ? ?/sec    1.00      5.8±0.02µs        ? ?/sec
in_list/Float32/list=8/nulls=0%                5.06     29.0±0.46µs        ? ?/sec    1.00      5.7±0.06µs        ? ?/sec
in_list/Float32/list=8/nulls=20%               6.95     39.2±0.35µs        ? ?/sec    1.00      5.6±0.21µs        ? ?/sec
in_list/Int32/list=100/nulls=0%                1.34      6.0±0.13µs        ? ?/sec    1.00      4.5±0.04µs        ? ?/sec
in_list/Int32/list=100/nulls=20%               1.90      8.2±0.21µs        ? ?/sec    1.00      4.3±0.13µs        ? ?/sec
in_list/Int32/list=3/nulls=0%                  1.43      6.0±0.30µs        ? ?/sec    1.00      4.2±0.03µs        ? ?/sec
in_list/Int32/list=3/nulls=20%                 2.15      8.6±0.04µs        ? ?/sec    1.00      4.0±0.01µs        ? ?/sec
in_list/Int32/list=8/nulls=0%                  1.45      6.0±0.13µs        ? ?/sec    1.00      4.1±0.03µs        ? ?/sec
in_list/Int32/list=8/nulls=20%                 1.91      8.4±0.26µs        ? ?/sec    1.00      4.4±0.05µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=100         1.00   642.9±18.69µs        ? ?/sec    1.00   643.3±15.52µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=12          1.06   711.3±40.52µs        ? ?/sec    1.00    669.0±5.52µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=3           1.04   699.0±45.42µs        ? ?/sec    1.00   672.0±13.27µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=100        1.00    596.0±5.59µs        ? ?/sec    1.02   606.4±27.65µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=12         1.01    624.1±5.11µs        ? ?/sec    1.00   618.7±19.97µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=3          1.00    621.0±4.63µs        ? ?/sec    1.00   619.1±14.66µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=100           1.08     28.3±1.39µs        ? ?/sec    1.00     26.2±0.23µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=12            1.06     27.4±0.06µs        ? ?/sec    1.00     25.8±0.30µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=3             1.09     29.6±2.53µs        ? ?/sec    1.00     27.2±1.82µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=100          1.01     26.6±0.32µs        ? ?/sec    1.00     26.4±0.19µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=12           1.06     28.6±1.18µs        ? ?/sec    1.00     27.1±0.34µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=3            1.02     27.6±0.61µs        ? ?/sec    1.00     27.2±0.36µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=100           1.05     58.5±3.52µs        ? ?/sec    1.00     55.8±0.98µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=12            1.11     63.7±0.38µs        ? ?/sec    1.00     57.1±0.21µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=3             1.20     69.2±4.84µs        ? ?/sec    1.00     57.5±0.19µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=100          1.00     53.3±0.93µs        ? ?/sec    1.02     54.3±1.38µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=12           1.00     53.4±0.53µs        ? ?/sec    1.01     54.1±1.05µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=3            1.01     55.0±0.67µs        ? ?/sec    1.00     54.6±2.05µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=100     1.00    595.2±7.31µs        ? ?/sec    1.05    623.5±9.42µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=12      1.00   578.4±17.29µs        ? ?/sec    1.00   577.7±11.62µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=3       1.00    578.5±5.77µs        ? ?/sec    1.04  600.3±110.42µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=100    1.00    502.9±4.92µs        ? ?/sec    1.05    527.1±5.38µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=12     1.00    511.5±8.06µs        ? ?/sec    1.00    509.0±2.38µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=3      1.00    516.9±7.43µs        ? ?/sec    1.00    515.2±5.38µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=100       1.04     22.2±0.28µs        ? ?/sec    1.00     21.3±0.80µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=12        1.04     23.3±0.89µs        ? ?/sec    1.00     22.4±0.41µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=3         1.04     23.2±0.20µs        ? ?/sec    1.00     22.2±0.08µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=100      1.01     23.0±0.14µs        ? ?/sec    1.00     22.8±0.83µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=12       1.01     23.4±0.18µs        ? ?/sec    1.00     23.1±1.01µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=3        1.02     23.7±0.17µs        ? ?/sec    1.00     23.1±0.57µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=100       1.00     48.9±1.12µs        ? ?/sec    1.01     49.5±0.84µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=12        1.02     50.6±0.33µs        ? ?/sec    1.00     49.4±0.55µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=3         1.03     50.6±0.68µs        ? ?/sec    1.00     49.4±0.15µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=100      1.00     44.9±0.61µs        ? ?/sec    1.02     45.7±0.52µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=12       1.00     46.1±0.27µs        ? ?/sec    1.00     45.9±0.54µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=3        1.01     48.3±0.30µs        ? ?/sec    1.00     47.8±0.78µs        ? ?/sec

geoffreyclaude · 2025-12-10T20:59:48Z

datafusion/physical-expr/src/expressions/in_list.rs

-            .ok_or_else(|| exec_datafusion_err!("Failed to downcast array"))?;
+impl PartialEq for OrderedFloat32 {
+    fn eq(&self, other: &Self) -> bool {
+        self.0.total_cmp(&other.0).is_eq()


Doesn't equality on floats use to_bits() elsewhere? This could lead to inconsistencies for edge cases (NaN, +0, -0, for instance).

I added tests for all of those and cross-referenced them against Postgres and DuckDB, but I will try to_bits().
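
For readers following the edge cases in this exchange, here is a minimal sketch of what keying floats by their bit pattern implies. `BitsF32` and `build_set` are hypothetical names used only for illustration; they are not the types in this PR. The sketch assumes that equality and hashing both go through `to_bits()`, which keeps the two consistent with each other.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper (not the PR's type): equality and hashing
/// both use the raw IEEE 754 bit pattern, so they always agree.
#[derive(Clone, Copy, Debug)]
struct BitsF32(f32);

impl PartialEq for BitsF32 {
    fn eq(&self, other: &Self) -> bool {
        self.0.to_bits() == other.0.to_bits()
    }
}

impl Eq for BitsF32 {}

impl Hash for BitsF32 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.to_bits().hash(state);
    }
}

/// Build a lookup set from a list of floats, keyed by bit pattern.
fn build_set(values: &[f32]) -> HashSet<BitsF32> {
    values.iter().map(|v| BitsF32(*v)).collect()
}
```

Under this scheme a NaN matches another NaN with the same bit pattern, while `+0.0` and `-0.0` are distinct keys. Plain IEEE `==` behaves the opposite way on both counts, which is why the choice needs tests for exactly these values.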

geoffreyclaude · 2025-12-11T14:21:40Z

datafusion/physical-expr/src/expressions/in_list.rs

+                        needle_nulls.cloned()
+                    }
+                    (false, true) => {
+                        // Only haystack has nulls - null where not-in-set


This comment block is a bit verbose and not super clear. Maybe having it in table form would help?

Thanks, this is way more readable!
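
As a rough illustration of the null cases the truth table has to cover, the sketch below spells out standard SQL three-valued logic for `needle IN (haystack...)` on a single value. The function name `in_list_3vl` is hypothetical, and `None` stands in for SQL NULL; the PR itself works on Arrow arrays and null buffers rather than `Option` values.

```rust
/// SQL three-valued `needle IN (haystack...)`; `None` models NULL.
/// Hypothetical helper for illustration, not code from the PR.
fn in_list_3vl(needle: Option<i32>, haystack: &[Option<i32>]) -> Option<bool> {
    // A NULL needle always yields NULL.
    let needle = needle?;

    // A definite match wins even if the list also contains NULLs.
    if haystack.iter().any(|h| *h == Some(needle)) {
        return Some(true);
    }

    // No match: the answer is NULL if the list has a NULL
    // (the "null where not-in-set" case), otherwise false.
    if haystack.iter().any(|h| h.is_none()) {
        None
    } else {
        Some(false)
    }
}
```

The `(false, true)` branch quoted in the diff corresponds to the last `if`: the needle is non-null, the haystack contains a null, and only the non-matching rows become null.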

adriangb · 2025-12-11T19:11:23Z

run benchmark in_list

alamb-ghbot · 2025-12-11T19:11:32Z

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing specialize (b6404e5) to e8a0829 diff
BENCH_NAME=in_list
BENCH_COMMAND=cargo bench --all-features --bench in_list
BENCH_FILTER=
BENCH_BRANCH_NAME=specialize
Results will be posted here when complete

alamb-ghbot · 2025-12-11T19:33:23Z

🤖: Benchmark completed

Details

group                                          main                                   specialize
-----                                          ----                                   ----------
in_list/Float32/list=100/nulls=0%              49.40   304.7±1.58µs        ? ?/sec    1.00      6.2±0.12µs        ? ?/sec
in_list/Float32/list=100/nulls=20%             66.79   390.9±6.08µs        ? ?/sec    1.00      5.9±0.02µs        ? ?/sec
in_list/Float32/list=3/nulls=0%                2.67     14.6±0.14µs        ? ?/sec    1.00      5.5±0.06µs        ? ?/sec
in_list/Float32/list=3/nulls=20%               3.52     20.0±1.00µs        ? ?/sec    1.00      5.7±0.02µs        ? ?/sec
in_list/Float32/list=8/nulls=0%                5.08     29.0±0.16µs        ? ?/sec    1.00      5.7±0.03µs        ? ?/sec
in_list/Float32/list=8/nulls=20%               6.45     39.3±0.85µs        ? ?/sec    1.00      6.1±0.37µs        ? ?/sec
in_list/Int32/list=100/nulls=0%                1.37      5.8±0.03µs        ? ?/sec    1.00      4.3±0.06µs        ? ?/sec
in_list/Int32/list=100/nulls=20%               1.74      8.5±0.11µs        ? ?/sec    1.00      4.9±0.07µs        ? ?/sec
in_list/Int32/list=3/nulls=0%                  1.40      5.6±0.04µs        ? ?/sec    1.00      4.0±0.07µs        ? ?/sec
in_list/Int32/list=3/nulls=20%                 2.22      8.7±0.05µs        ? ?/sec    1.00      3.9±0.02µs        ? ?/sec
in_list/Int32/list=8/nulls=0%                  1.47      6.2±0.10µs        ? ?/sec    1.00      4.2±0.03µs        ? ?/sec
in_list/Int32/list=8/nulls=20%                 1.77      8.4±0.07µs        ? ?/sec    1.00      4.7±0.03µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=100         1.00    641.8±5.57µs        ? ?/sec    1.01   645.3±11.16µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=12          1.00    667.3±3.33µs        ? ?/sec    1.01    673.3±3.46µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=3           1.00   674.1±30.62µs        ? ?/sec    1.01    678.1±6.01µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=100        1.00   596.6±21.34µs        ? ?/sec    1.00    598.5±5.10µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=12         1.00    614.4±6.19µs        ? ?/sec    1.01   622.6±27.76µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=3          1.00    621.9±6.25µs        ? ?/sec    1.00    623.4±3.35µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=100           1.07     28.5±2.11µs        ? ?/sec    1.00     26.7±1.32µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=12            1.00     26.6±0.61µs        ? ?/sec    1.05     27.8±1.36µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=3             1.01     27.3±0.23µs        ? ?/sec    1.00     27.0±0.48µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=100          1.00     25.9±0.40µs        ? ?/sec    1.01     26.2±0.45µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=12           1.00     26.6±0.15µs        ? ?/sec    1.02     27.1±0.35µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=3            1.00     26.7±0.23µs        ? ?/sec    1.01     27.0±0.31µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=100           1.01     55.2±1.71µs        ? ?/sec    1.00     54.5±0.70µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=12            1.00     55.9±0.49µs        ? ?/sec    1.02     57.3±1.16µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=3             1.00     56.4±1.03µs        ? ?/sec    1.03     58.3±1.18µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=100          1.00     52.3±0.42µs        ? ?/sec    1.03     53.9±1.92µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=12           1.00     53.7±0.33µs        ? ?/sec    1.01     54.2±1.13µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=3            1.00     54.3±0.56µs        ? ?/sec    1.01     54.6±0.86µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=100     1.01   597.2±11.74µs        ? ?/sec    1.00    592.2±2.03µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=12      1.00    576.3±6.37µs        ? ?/sec    1.00    576.5±2.52µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=3       1.00   578.2±13.93µs        ? ?/sec    1.00    576.3±6.54µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=100    1.00    502.6±3.13µs        ? ?/sec    1.01    505.2±4.79µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=12     1.00    508.8±2.49µs        ? ?/sec    1.00    508.8±1.82µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=3      1.00    517.0±8.52µs        ? ?/sec    1.00    515.8±6.32µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=100       1.10     22.3±0.48µs        ? ?/sec    1.00     20.2±0.07µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=12        1.05     23.2±0.34µs        ? ?/sec    1.00     22.2±0.41µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=3         1.04     23.2±0.13µs        ? ?/sec    1.00     22.4±0.43µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=100      1.02     22.7±0.12µs        ? ?/sec    1.00     22.3±0.15µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=12       1.01     23.3±0.70µs        ? ?/sec    1.00     23.2±0.25µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=3        1.00     23.5±0.33µs        ? ?/sec    1.00     23.6±0.53µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=100       1.05     48.9±1.88µs        ? ?/sec    1.00     46.8±1.57µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=12        1.03     61.1±0.24µs        ? ?/sec    1.00     59.2±1.56µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=3         1.04     61.2±0.30µs        ? ?/sec    1.00     58.8±0.34µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=100      1.02     45.1±2.00µs        ? ?/sec    1.00     44.2±0.53µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=12       1.00     46.2±1.33µs        ? ?/sec    1.03     47.5±1.98µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=3        1.00     47.8±0.74µs        ? ?/sec    1.04     49.6±1.20µs        ? ?/sec

adriangb · 2025-12-11T19:48:19Z

The benchmarks are now looking very good, and this has gone through several rounds of review, all of it addressed. But there are significant changes since the last approvals. @geoffreyclaude could you give another review and approve if you think it's ready, so we can merge this piece and continue work in #19241?

geoffreyclaude · 2025-12-11T20:02:19Z

> The benchmarks are now looking very good, and this has gone through several rounds of review, all of it addressed. But there are significant changes since the last approvals. @geoffreyclaude could you give another review and approve if you think it's ready so we can merge this piece and continue work in #19241

The tests you added as a dedicated PR give a lot of confidence this isn't introducing any functional regression. And perf-wise, the benchmarks speak for themselves!

Do we have more generic benches that exercise this path? Maybe the Clickbench ones? Would be nice to see the big picture and prove this isn't "benchmaxing" with unexpected adverse effects in real life.

Otherwise, ✅ from me of course. You addressed my remaining two nits already.

adriangb · 2025-12-11T20:04:43Z

I'll kick off some general benchmarks and if those look good and there's no more feedback I'll merge later today or tomorrow morning.

adriangb · 2025-12-11T20:05:32Z

run benchmark clickbench_partitioned tpch

alamb-ghbot · 2025-12-11T20:48:06Z

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing specialize (b6404e5) to e8a0829 diff using: clickbench_partitioned
Results will be posted here when complete

alamb-ghbot · 2025-12-11T21:18:54Z

🤖: Benchmark completed

Details

Comparing HEAD and specialize
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃  specialize ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.30 ms │     2.63 ms │  1.15x slower │
│ QQuery 1     │    50.35 ms │    48.86 ms │     no change │
│ QQuery 2     │   133.49 ms │   136.16 ms │     no change │
│ QQuery 3     │   161.31 ms │   157.45 ms │     no change │
│ QQuery 4     │  1050.15 ms │  1063.97 ms │     no change │
│ QQuery 5     │  1491.70 ms │  1533.79 ms │     no change │
│ QQuery 6     │     2.32 ms │     2.21 ms │     no change │
│ QQuery 7     │    53.70 ms │    55.25 ms │     no change │
│ QQuery 8     │  1408.68 ms │  1416.73 ms │     no change │
│ QQuery 9     │  1853.09 ms │  1880.40 ms │     no change │
│ QQuery 10    │   375.13 ms │   375.41 ms │     no change │
│ QQuery 11    │   424.63 ms │   427.53 ms │     no change │
│ QQuery 12    │  1346.21 ms │  1362.68 ms │     no change │
│ QQuery 13    │  2015.24 ms │  2015.19 ms │     no change │
│ QQuery 14    │  1266.73 ms │  1283.79 ms │     no change │
│ QQuery 15    │  1245.36 ms │  1231.93 ms │     no change │
│ QQuery 16    │  2639.03 ms │  2674.74 ms │     no change │
│ QQuery 17    │  2620.13 ms │  2634.50 ms │     no change │
│ QQuery 18    │  5216.94 ms │  4964.91 ms │     no change │
│ QQuery 19    │   124.64 ms │   123.02 ms │     no change │
│ QQuery 20    │  1918.23 ms │  1925.17 ms │     no change │
│ QQuery 21    │  2227.98 ms │  2213.04 ms │     no change │
│ QQuery 22    │  3820.04 ms │  3811.79 ms │     no change │
│ QQuery 23    │ 12877.19 ms │ 12654.96 ms │     no change │
│ QQuery 24    │   220.67 ms │   216.55 ms │     no change │
│ QQuery 25    │   485.06 ms │   484.94 ms │     no change │
│ QQuery 26    │   228.37 ms │   227.93 ms │     no change │
│ QQuery 27    │  2788.67 ms │  2762.40 ms │     no change │
│ QQuery 28    │ 23625.38 ms │ 23351.35 ms │     no change │
│ QQuery 29    │   969.25 ms │   940.18 ms │     no change │
│ QQuery 30    │  1319.82 ms │  1324.33 ms │     no change │
│ QQuery 31    │  1392.71 ms │  1336.34 ms │     no change │
│ QQuery 32    │  5596.43 ms │  4659.76 ms │ +1.20x faster │
│ QQuery 33    │  6275.42 ms │  5799.97 ms │ +1.08x faster │
│ QQuery 34    │  6090.08 ms │  5911.28 ms │     no change │
│ QQuery 35    │  1911.77 ms │  1891.01 ms │     no change │
│ QQuery 36    │   116.63 ms │   122.98 ms │  1.05x slower │
│ QQuery 37    │    56.94 ms │    58.76 ms │     no change │
│ QQuery 38    │   120.21 ms │   117.10 ms │     no change │
│ QQuery 39    │   193.14 ms │   195.20 ms │     no change │
│ QQuery 40    │    46.81 ms │    47.14 ms │     no change │
│ QQuery 41    │    42.14 ms │    42.96 ms │     no change │
│ QQuery 42    │    34.63 ms │    36.02 ms │     no change │
└──────────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 95838.70ms │
│ Total Time (specialize)   │ 93522.32ms │
│ Average Time (HEAD)       │  2228.81ms │
│ Average Time (specialize) │  2174.94ms │
│ Queries Faster            │          2 │
│ Queries Slower            │          2 │
│ Queries with No Change    │         39 │
│ Queries with Failure      │          0 │
└───────────────────────────┴────────────┘

alamb-ghbot · 2025-12-11T21:18:58Z

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing specialize (b6404e5) to e8a0829 diff using: tpch
Results will be posted here when complete

alamb-ghbot · 2025-12-11T21:19:38Z

🤖: Benchmark completed

Details

Comparing HEAD and specialize
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ specialize ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 224.07 ms │  221.82 ms │ no change │
│ QQuery 2     │  93.52 ms │   97.12 ms │ no change │
│ QQuery 3     │ 122.81 ms │  125.01 ms │ no change │
│ QQuery 4     │  76.70 ms │   76.89 ms │ no change │
│ QQuery 5     │ 172.98 ms │  174.68 ms │ no change │
│ QQuery 6     │  63.80 ms │   66.75 ms │ no change │
│ QQuery 7     │ 213.32 ms │  219.23 ms │ no change │
│ QQuery 8     │ 159.80 ms │  163.94 ms │ no change │
│ QQuery 9     │ 223.96 ms │  234.12 ms │ no change │
│ QQuery 10    │ 191.61 ms │  187.97 ms │ no change │
│ QQuery 11    │  72.75 ms │   73.41 ms │ no change │
│ QQuery 12    │ 120.40 ms │  119.68 ms │ no change │
│ QQuery 13    │ 219.24 ms │  211.24 ms │ no change │
│ QQuery 14    │  93.52 ms │   92.66 ms │ no change │
│ QQuery 15    │ 120.30 ms │  117.36 ms │ no change │
│ QQuery 16    │  56.34 ms │   59.14 ms │ no change │
│ QQuery 17    │ 297.79 ms │  306.88 ms │ no change │
│ QQuery 18    │ 319.40 ms │  318.16 ms │ no change │
│ QQuery 19    │ 136.82 ms │  135.55 ms │ no change │
│ QQuery 20    │ 124.00 ms │  126.51 ms │ no change │
│ QQuery 21    │ 259.82 ms │  261.47 ms │ no change │
│ QQuery 22    │  42.01 ms │   42.84 ms │ no change │
└──────────────┴───────────┴────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 3404.96ms │
│ Total Time (specialize)   │ 3432.44ms │
│ Average Time (HEAD)       │  154.77ms │
│ Average Time (specialize) │  156.02ms │
│ Queries Faster            │         0 │
│ Queries Slower            │         0 │
│ Queries with No Change    │        22 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

adriangb · 2025-12-11T21:42:49Z

Benchmarks look good to me!

alamb · 2025-12-11T22:00:17Z

alamb · 2025-12-11T22:01:27Z

datafusion/physical-expr/src/expressions/in_list.rs

+        DataType::UInt64 => Ok(Arc::new(UInt64StaticFilter::try_new(&in_array)?)),
+        // Float primitive types (use ordered wrappers for Hash/Eq)
+        DataType::Float32 => Ok(Arc::new(Float32StaticFilter::try_new(&in_array)?)),
+        DataType::Float64 => Ok(Arc::new(Float64StaticFilter::try_new(&in_array)?)),


Do we know if adding an optimized version of the binary / string type comparisons is tracked with a ticket?

No, but I don't think that was optimized before, and there's less opportunity to optimize because of the cost of copying. I'm going to leave that for #19241, where there are already a lot of great ideas, if that's okay.
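
The dispatch in the diff above routes each `DataType` to a specialized filter. The general shape of that pattern can be sketched without Arrow types, as below. `Column`, `SetFilter`, `build`, and `contains` are hypothetical names invented for this sketch, and the string fallback is an assumption that only mirrors the "not specialized yet" state described in the reply; it is not DataFusion's implementation.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a typed column; not an Arrow array.
enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

/// Hypothetical filter built once from the IN list and reused per batch.
enum SetFilter {
    /// Specialized path: hash set over the primitive values.
    Int64(HashSet<i64>),
    /// Generic fallback: keep the list and scan it linearly.
    Generic(Vec<String>),
}

impl SetFilter {
    /// Pick an implementation based on the type of the IN list.
    fn build(list: &Column) -> SetFilter {
        match list {
            Column::Int64(v) => SetFilter::Int64(v.iter().copied().collect()),
            Column::Utf8(v) => SetFilter::Generic(v.clone()),
        }
    }

    /// Probe every value; returns `None` when the types do not match.
    fn contains(&self, values: &Column) -> Option<Vec<bool>> {
        match (self, values) {
            (SetFilter::Int64(set), Column::Int64(v)) => {
                Some(v.iter().map(|x| set.contains(x)).collect())
            }
            (SetFilter::Generic(list), Column::Utf8(v)) => {
                Some(v.iter().map(|x| list.contains(x)).collect())
            }
            _ => None,
        }
    }
}
```

The point of the split is that the set is built once per expression and only probed per batch, which is where a specialized primitive path can pay off.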

adriangb requested a review from Copilot November 20, 2025 00:07

github-actions bot added the physical-expr Changes to the physical-expr crates label Nov 20, 2025

Copilot started reviewing on behalf of adriangb November 20, 2025 00:08

Copilot finished reviewing on behalf of adriangb November 20, 2025 00:11

Copilot AI reviewed Nov 20, 2025

adriangb force-pushed the specialize branch from 5a3534a to 2feab11 Compare November 20, 2025 07:19

martin-g approved these changes Nov 20, 2025

alamb mentioned this pull request Nov 20, 2025

Andrew Lamb Weekly-ish Open Source plan - 2025-11-17 #18711

Closed

46 tasks

adriangb force-pushed the specialize branch from 0fe02e8 to ff302de Compare November 20, 2025 23:24

Dandandan reviewed Nov 21, 2025

alamb approved these changes Nov 22, 2025

Dandandan reviewed Nov 23, 2025

Dandandan approved these changes Nov 23, 2025

alamb mentioned this pull request Nov 23, 2025

Andrew Lamb Weekly-ish Open Source plan - 2025-11-24 #18888

Closed

37 tasks

alamb mentioned this pull request Dec 1, 2025

Andrew Lamb Weekly-ish Open Source plan - 2025-12-01 #19016

Closed

42 tasks

adriangb mentioned this pull request Dec 2, 2025

Add additional tests for InListExpr #19050

Merged

geoffreyclaude reviewed Dec 8, 2025

add float implementations

2f2c60d

geoffreyclaude reviewed Dec 10, 2025

geoffreyclaude reviewed Dec 11, 2025

adriangb added 2 commits December 11, 2025 13:07

Add truth table

0b5a9c9

use to_bits()

b6404e5

geoffreyclaude approved these changes Dec 11, 2025

adriangb added this pull request to the merge queue Dec 11, 2025

Merged via the queue into apache:main with commit 85d8a88 Dec 11, 2025
31 checks passed

alamb reviewed Dec 11, 2025

alamb added the performance Make DataFusion faster label Dec 11, 2025
