Add Variant type support for semi-structured JSON columns #13
tonyalaribe wants to merge 1 commit into master
- Add parquet-variant dependencies (parquet-variant, parquet-variant-compute, parquet-variant-json) for proper Variant binary encoding per Parquet spec
- Convert JSON columns (body, context, events, links, attributes, resource, errors) from Utf8 to Variant type in schema
- Create variant_utils.rs with:
  - JSON to Variant conversion using parquet-variant-compute
  - Variant to JSON conversion for query results
  - Variant-aware wrapper UDFs (json_get, json_get_str, json_length, json_contains) that transparently handle both Variant and UTF8 inputs
- Update schema_loader.rs:
  - Add variant_arrow_type() using BinaryView fields
  - Add variant_delta_type() using delta-kernel's unshredded_variant()
  - Add has_variant_columns() helper method
- Update test_utils.rs to convert JSON columns to Variant on insert
- Prepare Protocol with variantType feature (ready for when delta-rs adds support)

Note: delta-rs ProtocolChecker doesn't yet support the variantType feature, so Variant data is stored as Struct<metadata: BinaryView, value: BinaryView> without the protocol marker. The binary representation is correct per the Parquet Variant spec.
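For readers unfamiliar with what the `metadata` and `value` buffers in that Struct hold, here is a minimal std-only sketch based on my reading of the Parquet Variant encoding spec. It covers only `null`, booleans, and short strings. The `VariantBytes` type and the `encode_*` helpers are hypothetical names for illustration and are not part of this PR, which delegates the real encoding to the parquet-variant crates.

```rust
/// The two byte buffers that make up one unshredded Variant value.
/// Hypothetical illustration type; the PR itself stores these as
/// the `metadata` and `value` BinaryView fields of a Struct.
#[derive(Debug, PartialEq)]
struct VariantBytes {
    metadata: Vec<u8>,
    value: Vec<u8>,
}

/// Metadata for a value that uses no object keys:
/// header 0x01 (version 1, 1-byte offsets), dictionary size 0,
/// followed by the single end offset 0.
fn empty_metadata() -> Vec<u8> {
    vec![0x01, 0x00, 0x00]
}

/// JSON `null`: basic type 0 (primitive) in the low 2 bits,
/// primitive type id 0 in the upper 6 bits.
fn encode_null() -> VariantBytes {
    VariantBytes { metadata: empty_metadata(), value: vec![0x00] }
}

/// Booleans are primitives with type id 1 (true) or 2 (false),
/// shifted past the 2-bit basic type field. No payload bytes follow.
fn encode_bool(b: bool) -> VariantBytes {
    let type_id: u8 = if b { 1 } else { 2 };
    VariantBytes { metadata: empty_metadata(), value: vec![type_id << 2] }
}

/// Strings under 64 bytes use the "short string" basic type (1):
/// the byte length sits in the upper 6 bits of the header, followed
/// by the UTF-8 bytes. Longer strings need the primitive string
/// form, which this sketch does not cover, so it returns `None`.
fn encode_short_string(s: &str) -> Option<VariantBytes> {
    let len = s.len();
    if len >= 64 {
        return None;
    }
    let mut value = Vec::with_capacity(1 + len);
    value.push(((len as u8) << 2) | 0x01);
    value.extend_from_slice(s.as_bytes());
    Some(VariantBytes { metadata: empty_metadata(), value })
}
```

Calling `encode_short_string("hi")` yields a three-byte `value` buffer (one header byte plus the two UTF-8 bytes) and the three-byte empty `metadata` buffer. Objects and arrays additionally populate the metadata dictionary with field names, which is where the two-buffer layout pays off.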
Pull Request Review: Add Variant type support for semi-structured JSON columns
Summary
This PR adds proper Parquet Variant type support for semi-structured JSON columns, which is a significant improvement for handling JSON data efficiently. The implementation is well-structured and follows best practices overall.
✅ Strengths
Code Quality
Architecture
🔍 Issues & Recommendations
1. Potential Performance Concern - Cloning in
Summary
- Add Variant support using the `parquet-variant` crates
- Convert JSON columns (`body`, `context`, `events`, `links`, `attributes`, `resource`, `errors`) from `Utf8` to `Variant` type

Changes

New Dependencies
- `parquet-variant` v0.2.0 - Core Variant type
- `parquet-variant-compute` v0.2.0 - JSON to Variant conversion
- `parquet-variant-json` v0.2.0 - Variant to JSON conversion

New Module: `variant_utils.rs`
- `json_to_variant_array()` - Convert JSON strings to Variant binary format
- `variant_to_json_array()` - Convert Variant back to JSON for queries
- Variant-aware wrapper UDFs (`json_get`, `json_get_str`, `json_length`, `json_contains`) that handle both Variant and UTF8 inputs

Schema Changes
- `schemas/otel_logs_and_spans.yaml` - Changed 7 columns from `Utf8` to `Variant`
- `schema_loader.rs` - Added `variant_arrow_type()` and `variant_delta_type()` functions

Protocol Support (Prepared)
- `create_variant_protocol()` function ready for when delta-rs adds `variantType` support
- delta-rs `ProtocolChecker` doesn't include `variantType` in supported features

Technical Notes
- Variant data is stored as `Struct<metadata: BinaryView, value: BinaryView>` per Parquet Variant spec

Test plan