-
Notifications
You must be signed in to change notification settings - Fork 976
Description
I just realized something -- when dealing with shredded variants, this value
method will do a lot of work to unshred and encode the whole thing (see e.g. #7915 (comment)). And that work is not memoized anywhere unless the caller is able to do so. For efficiency reasons we should strongly consider some kind of direct pathing support in VariantArray
itself. Otherwise, it would be far too easy for a caller to accidentally do quadratic work when repeatedly calling value+get_path pairs for different paths.
Originally posted by @scovich in #7919 (comment)
Just to understand this -- is this what you're suggesting?
- We add a new
get_path
method inVariantArray
that does this:- For each row, look up the type of the variant and perform the pathing without decoding. So, for example, if it's a
VariantObject
and there's aVariantPathElement::Field
path, it would get the offset for the given field (not sure how yet) and advance thevalue
slice by that much. - We would then create a new
VariantArray
with the metadata directly copied over and the value copied starting from the advanced slice. - For shredded variants, if the path ends up on a shredded value, what would be the expected behavior? I'm guessing that the shredded fields will be represented as an Array of the concrete type (an
Int32Array
for example) and not aVariantArray
. Will we wrap these in aVariantArray
and send it back? This is one case where having the path + cast in the same operation would help.
- For each row, look up the type of the variant and perform the pathing without decoding. So, for example, if it's a
- This
variant_get
would then simply re-useVariantArray::get_path
and perform the appropriate cast.
For efficiency reasons we should strongly consider some kind of direct pathing support in VariantArray itself. Otherwise, it would be far too easy for a caller to accidentally do quadratic work when repeatedly calling value+get_path pairs for different paths.
I think @scovich is saying that the variant_get kernel (on VariantArray
should have a special case that knows how to look for a shredded sub field -- and if for example it is asking for a
and the the typed_value.a
column exists, variant_get could simply return that a
column (already as an arrow array, no actual Variant manipulation required)
- I filed [Variant] Support
variant_get
kernel for shredded variants #7941 to track this