Skip to content

[Variant] Support variant_get kernel for shredded variants #7941

@alamb

Description

@alamb

I just realized something -- when dealing with shredded variants, this value method will do a lot of work to unshred and encode the whole thing (see e.g. #7915 (comment)). And that work is not memoized anywhere unless the caller is able to do so. For efficiency reasons we should strongly consider some kind of direct pathing support in VariantArray itself. Otherwise, it would be far too easy for a caller to accidentally do quadratic work when repeatedly calling value+get_path pairs for different paths.

Originally posted by @scovich in #7919 (comment)


@Samyak2 in #7919 (comment)

Just to understand this -- is this what you're suggesting?

  • We add a new get_path method in VariantArray that does this:
    • For each row, look up the type of the variant and perform the pathing without decoding. So, for example, if it's a VariantObject and there's a VariantPathElement::Field path, it would get the offset for the given field (not sure how yet) and advance the value slice by that much.
    • We would then create a new VariantArray with the metadata directly copied over and the value copied starting from the advanced slice.
    • For shredded variants, if the path ends up on a shredded value, what would be the expected behavior? I'm guessing that the shredded fields will be represented as an Array of the concrete type (an Int32Array for example) and not a VariantArray. Will we wrap these in a VariantArray and send it back? This is one case where having the path + cast in the same operation would help.
  • This variant_get would then simply re-use VariantArray::get_path and perform the appropriate cast.

@alamb in #7919 (comment)

For efficiency reasons we should strongly consider some kind of direct pathing support in VariantArray itself. Otherwise, it would be far too easy for a caller to accidentally do quadratic work when repeatedly calling value+get_path pairs for different paths.

I think @scovich is saying that the variant_get kernel (on VariantArray should have a special case that knows how to look for a shredded sub field -- and if for example it is asking for a and the the typed_value.a column exists, variant_get could simply return that a column (already as an arrow array, no actual Variant manipulation required)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions