
Conversation

liamzwbao
Contributor

@liamzwbao liamzwbao commented Aug 28, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Implement ListView/LargeListView for cast_to_variant

Are these changes tested?

Yes

Are there any user-facing changes?

New cast type supported

@scovich
Contributor

scovich commented Sep 10, 2025

Looking at both this PR and the next one #8282, I think we can actually cover all five list types with just one list builder if we create a new extension trait:

trait ListLikeArray: Array {
    /// Get the values array
    fn values(&self) -> &arrow::array::ArrayRef;

    /// Get the start and end indices for a list element
    fn element_range(&self, index: usize) -> Range<usize>;
}

And then the builder takes a generic type parameter L: ListLikeArray or similar?

(Probably something similar could be done for the binary and string array variations as well, but I didn't look as closely there)
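To make the suggestion concrete, here is a minimal sketch of how one generic routine could serve several list layouts through a single trait. Everything in it is a stand-in: `ListLike`, `OffsetList`, `ViewList`, `FixedList`, and `collect_lists` are hypothetical names invented for illustration and are not arrow-rs types or the code that was merged. It only assumes the general shape of the layouts (consecutive offsets, offset plus size, fixed element width).

```rust
use std::ops::Range;

/// Hypothetical stand-in for the proposed extension trait (not the arrow-rs API).
trait ListLike {
    /// Number of list slots.
    fn len(&self) -> usize;
    /// Start and end indices of slot `index` in the child values.
    fn element_range(&self, index: usize) -> Range<usize>;
}

/// Offsets-based layout: slot i spans offsets[i]..offsets[i + 1].
struct OffsetList {
    offsets: Vec<usize>,
}

/// View layout: each slot carries its own offset and size.
struct ViewList {
    offsets: Vec<usize>,
    sizes: Vec<usize>,
}

/// Fixed-size layout: every slot has the same width.
struct FixedList {
    size: usize,
    len: usize,
}

impl ListLike for OffsetList {
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }
    fn element_range(&self, index: usize) -> Range<usize> {
        self.offsets[index]..self.offsets[index + 1]
    }
}

impl ListLike for ViewList {
    fn len(&self) -> usize {
        self.offsets.len()
    }
    fn element_range(&self, index: usize) -> Range<usize> {
        let start = self.offsets[index];
        start..start + self.sizes[index]
    }
}

impl ListLike for FixedList {
    fn len(&self) -> usize {
        self.len
    }
    fn element_range(&self, index: usize) -> Range<usize> {
        index * self.size..(index + 1) * self.size
    }
}

/// One generic routine covers every layout that implements the trait.
fn collect_lists<L: ListLike>(list: &L, values: &[i32]) -> Vec<Vec<i32>> {
    (0..list.len())
        .map(|i| values[list.element_range(i)].to_vec())
        .collect()
}
```

The point of the design is that the consumer only ever asks for a range into the child values, so layout differences stay inside the trait implementations.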

@liamzwbao
Contributor Author

Thanks for the advice, @scovich! Let me refactor this PR.

@liamzwbao liamzwbao force-pushed the issue-8236-variant-list-view branch from 782ad12 to 0ef8abb Compare September 23, 2025 01:37
@liamzwbao liamzwbao marked this pull request as ready for review September 23, 2025 01:38
@liamzwbao liamzwbao force-pushed the issue-8236-variant-list-view branch from 0ef8abb to 65a2b30 Compare September 23, 2025 03:25
Contributor

@scovich scovich left a comment


LGTM!

Any reason not to cover FixedSizeListArray while we're at it? It should be just a few extra lines of code and a new unit test.

@scovich
Contributor

scovich commented Sep 23, 2025

Also, you may want to update the PR description now that the PR is no longer stacked?

@liamzwbao
Contributor Author

> Any reason not to cover FixedSizeListArray while we're at it? It should be just a few extra lines of code and a new unit test.

Yes, it’s pretty straightforward, but I’d prefer to keep it separate since I originally opened two distinct issues. That way, each change can close its corresponding ticket cleanly.

Contributor

@alamb alamb left a comment


Thanks @liamzwbao -- I think there is just one more test needed and this one will be good to go

Thank you @scovich for the review

// Create a ListViewArray with some data
let mut builder = ListViewBuilder::new(Int32Array::builder(0));
builder.append_value(&Int32Array::from(vec![Some(0), Some(1), Some(2)]));
builder.append_value(&Int32Array::from(vec![Some(3), Some(4)]));
Contributor


Can you please also add a test case that has a null as one of the list elements (in addition to an entire slot that is null)?

Contributor


Technically, variant array elements are non-nullable. So any (SQL) NULL must be converted to Variant::Null before adding it to the array -- this is true regardless of shredding.

Note: It almost certainly makes sense to convert NULL values from an arrow array with nullable elements into Variant::Null values when converting to variant. So the resulting code might be the same either way. But it might still be helpful to understand the distinction (code comments, unstated assumptions, etc.).

Contributor


I agree that converting an arrow ListArray with null elements should result in a Variant::List(...) where one of the elements is Variant::Null.

I am not sure if you are asking a question or just making a statement.

Contributor


Mostly a statement -- to make sure we're doing the right thing for the right reason rather than by accident.

Contributor


I asked some variant spec folks a while back, and in their (parquet storage) world, arrays with SQL NULL elements are simply not a thing, and so variant code should not even need to think about it. For example:

> it doesn't really make sense to talk about an array with (SQL) nullable elements. The array must already be a Variant array, and the Variant binary spec has no way to represent SQL NULL as an array element, only Variant null.

Meanwhile, IMO it's perfectly fine for arrow to take a position that NULL values in an arrow list array will produce Variant::Null values if converted to variant.
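The distinction discussed in this thread can be sketched with a toy model. `ToyVariant` and `lists_to_variants` are hypothetical names made up for this illustration; they are not the parquet-variant API, and the sketch assumes that a null list slot stays a top-level null while a null element inside a list becomes a variant null.

```rust
/// Toy stand-in for a variant value (hypothetical, not the parquet-variant type).
#[derive(Debug, PartialEq)]
enum ToyVariant {
    Null,
    Int32(i32),
    List(Vec<ToyVariant>),
}

/// Convert rows of nullable lists with nullable elements.
/// Assumption: a null row stays `None` at the top level, while a null
/// element inside a list is encoded as `ToyVariant::Null`, because list
/// elements in the variant encoding cannot be SQL NULL.
fn lists_to_variants(rows: &[Option<Vec<Option<i32>>>]) -> Vec<Option<ToyVariant>> {
    rows.iter()
        .map(|row| {
            row.as_ref().map(|elems| {
                ToyVariant::List(
                    elems
                        .iter()
                        .map(|e| match e {
                            Some(v) => ToyVariant::Int32(*v),
                            None => ToyVariant::Null,
                        })
                        .collect(),
                )
            })
        })
        .collect()
}
```

A test in this spirit would cover both cases at once: a row that is entirely null and a row containing a null element.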

Contributor Author


Added cases where an array element is NULL.

Contributor

@alamb alamb left a comment


Thanks again @scovich and @liamzwbao

@alamb alamb merged commit a8ad90d into apache:main Sep 24, 2025
17 checks passed
@liamzwbao liamzwbao deleted the issue-8236-variant-list-view branch September 24, 2025 20:36
Labels
parquet-variant parquet-variant* crates
Development

Successfully merging this pull request may close these issues.

[Variant]: Implement DataType::ListView/LargeListView support for cast_to_variant kernel
3 participants