Conversation
Collaborator: Don't love "Provider" being overloaded here, but otherwise lgtm.

Author: @ryannedolan Renamed AvroSchemaProvider -> AvroSchemaSource, let me know what you think.

ryannedolan approved these changes on May 1, 2026.
Summary
For the record, I don't love this solution... but I'll explain why it seems necessary below.
This change exposes the source-of-truth Avro schemas from Hoptimator tables, replacing lossy RelDataType round-tripping for the consumers that care about Avro-level fidelity.

Adds a new AvroSchemaProvider interface that Calcite Tables can implement to surface their key and value Avro schemas directly. Three consumers are wired up to use it, each falling back to the existing synthesis behavior when a provider isn't available:

- K8sConnector: new {{avroValueSchema}} template variable for connector payload options (e.g. a Flink connector's default.mode.payload). Renders the upstream's native value schema when present; synthesizes from the flat row type otherwise.
- HoptimatorConnection.resolve() / !resolve CLI: returns the merged view (value + KEY_-prefixed key fields) so SQL queries and the Flink catalog see keys as columns.
- HoptimatorJdbcTable: implements the interface by peeking through the upstream JDBC DataSource to the underlying Calcite table.
- VeniceStore: the first provider implementation. valueSchema() returns the raw value schema from StoreSchemaFetcher; keySchema() returns the raw key schema.

Why
The round trip AvroConverter.rel -> DataTypeUtils.flatten -> JDBC -> DataTypeUtils.unflatten -> AvroConverter.avro strips too much: namespaces get synthesized from field paths, nested record identities are invented per call site, reused record definitions get duplicated, and field-level props/defaults/aliases are dropped. For consumers that want to hand the Avro schema to another Avro-aware system, none of that survives. Internally we run into schema incompatibility issues when dealing with Flink: Flink connectors and the Proteus Flink catalog need to be aware of the real table Avro schema. This change gives those consumers a direct path to the source schema while leaving the RelDataType round trip in place for the SQL query layer that depends on the flattening.

Now, why I don't love this: the change has to cross the JDBC boundary, which now provides a backdoor into the underlying drivers. That begs the question: do we really need those drivers?
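The provider-or-fallback wiring described above can be sketched as follows. This is a dependency-free illustration only: plain Strings stand in for org.apache.avro.Schema, and valueSchemaOrSynthesized is a hypothetical helper, not the PR's actual API.

```java
// Hypothetical sketch of the provider pattern. The real interface returns
// org.apache.avro.Schema; Strings stand in here to keep the sketch
// dependency-free.
interface AvroSchemaProvider {
    String valueSchema();  // the payload schema
    String keySchema();    // the key schema, or null when there is no distinct key
}

class SchemaLookup {
    // Each consumer prefers the provider-supplied schema and falls back to
    // the existing RelDataType-based synthesis otherwise.
    static String valueSchemaOrSynthesized(Object table, String synthesized) {
        if (table instanceof AvroSchemaProvider) {
            String provided = ((AvroSchemaProvider) table).valueSchema();
            if (provided != null) {
                return provided;
            }
        }
        return synthesized;
    }
}
```

The point of the instanceof check is that non-provider tables keep the current behavior unchanged; only tables that opt in expose their source schemas.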
Changes
New: hoptimator-avro

- AvroSchemaProvider interface with valueSchema() (the payload) and keySchema() (the key, or null when the table has no distinct key concept).
- AvroSchemas utility class with:
  - KEY_PREFIX / PRIMITIVE_KEY_NAME constants (the Hoptimator-wide convention for merging keys into value schemas).
  - cloneField(name, Schema.Field): produces an unowned Schema.Field clone preserving name, type reference, doc, default, sort order, aliases, and custom properties (Avro rejects Fields already owned by another record via its internal position guard).
  - mergeKeyIntoValue(keySchema, valueSchema, keyPrefix, primitiveKeyName): produces a merged record inheriting the value schema's identity (namespace, name, doc, aliases, record-level props). Struct keys contribute prefixed fields; primitive keys contribute a single named field.
  - mergedAvroSchemaFor(AvroSchemaProvider): applies the Hoptimator convention (KEY_ prefix / KEY field). Used by resolve().

Implementations
VeniceStore implements AvroSchemaProvider.

Consumers
- K8sConnector: new {{avroValueSchema}} template variable. Prefers the provider-supplied value schema; falls back to AvroConverter.avro(rowType).
- HoptimatorConnection.resolve(): uses AvroSchemas.mergedAvroSchemaFor when a provider is available; existing synthesis otherwise.
- HoptimatorJdbcTable: implements AvroSchemaProvider by unwrapping the upstream JDBC DataSource to a CalciteConnection, walking to the real upstream table, and delegating.

Behavior changes
- {{avroValueSchema}} template variable: new; no existing templates referenced it.
- HoptimatorConnection.resolve() / !resolve CLI: when the resolved table implements AvroSchemaProvider, the returned schema is the merged view from the source (with source-level namespaces, nested record identities, and properties preserved) rather than the synthesized one. The shape is the same (value fields + KEY_-prefixed key fields); only the fidelity improves. Falls back to existing behavior for non-providers.

Test plan
- ./gradlew build green locally (all 195 tasks).
- AvroSchemasTest (10 tests): covers cloneField (position guard; doc/default/order/aliases/props preservation), mergeKeyIntoValue (struct key, primitive key, nested-record namespace preservation, reused-record single-definition serialization, record-level identity/props inheritance, parse round trip, non-record-value error), and mergedAvroSchemaFor (no-key short circuit, key-present merge).
- VeniceStoreTest: new tests for valueSchema() returning the raw payload, keySchema() returning the raw key (struct and primitive), the valueSchemaId path, and end-to-end mergedAvroSchemaFor integration.
- HoptimatorConnectionTest: 4 new tests for providerSchemaAt (unknown path, non-provider, no-key provider, struct-key and primitive-key merged output).
- HoptimatorJdbcTableTest: delegation tests for valueSchema / keySchema (null upstream, non-provider upstream, dual delegation).
- K8sConnectorTest: provider-path and fallback-path tests for {{avroValueSchema}} template rendering.
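The key-merging convention these tests exercise (KEY_-prefixed fields for struct keys, a single KEY field for primitive keys) can be sketched as below. The real mergeKeyIntoValue operates on org.apache.avro.Schema records; here, ordered maps of field name to type stand in so the sketch stays dependency-free. The constant names match the PR; the map-based signature is illustrative, not the actual API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Dependency-free sketch of the Hoptimator key-merging convention.
class AvroSchemasSketch {
    static final String KEY_PREFIX = "KEY_";
    static final String PRIMITIVE_KEY_NAME = "KEY";

    // Struct keys contribute one KEY_-prefixed field per key field;
    // a primitive key contributes a single field named KEY.
    static Map<String, String> mergeKeyIntoValue(Object keySchema,
                                                 Map<String, String> valueFields) {
        Map<String, String> merged = new LinkedHashMap<>(valueFields);
        if (keySchema instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) keySchema).entrySet()) {
                merged.put(KEY_PREFIX + e.getKey(), String.valueOf(e.getValue()));
            }
        } else if (keySchema != null) {
            merged.put(PRIMITIVE_KEY_NAME, String.valueOf(keySchema));
        }
        return merged;
    }
}
```

Because the merged record inherits the value schema's identity, downstream consumers see the same record plus the extra key columns, which is why the shape (but not the fidelity) is unchanged for non-providers.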