Add native Apache Iceberg table support with CoralCatalog abstraction #556
base: master
Conversation
sumedhsakdeo left a comment
Thanks @aastha25, code looks great. Added some questions/comments, PTAL.
```java
// Iceberg timestamp type - microsecond precision (6 digits)
convertedType = typeFactory.createSqlType(SqlTypeName.TIMESTAMP, 6);
```
Can we handle timestamp with time zone for completeness?
```suggestion
Types.TimestampType timestampType = (Types.TimestampType) icebergType;
if (timestampType.shouldAdjustToUTC()) {
  // TIMESTAMP WITH TIME ZONE - stores instant in time
  convertedType = typeFactory.createSqlType(SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE, 6);
} else {
  // TIMESTAMP - stores local datetime
  convertedType = typeFactory.createSqlType(SqlTypeName.TIMESTAMP, 6);
}
```
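For context, a minimal sketch (assuming the Iceberg types API is on the classpath) of how the two Iceberg timestamp variants report `shouldAdjustToUTC()`:

```java
import org.apache.iceberg.types.Types;

public class TimestampVariants {
  public static void main(String[] args) {
    // Iceberg models both variants with the same Types.TimestampType class;
    // shouldAdjustToUTC() distinguishes timestamptz from the zone-less timestamp.
    Types.TimestampType withZone = Types.TimestampType.withZone();       // timestamptz
    Types.TimestampType withoutZone = Types.TimestampType.withoutZone(); // timestamp

    System.out.println(withZone.shouldAdjustToUTC());    // true  -> TIMESTAMP_WITH_LOCAL_TIME_ZONE(6)
    System.out.println(withoutZone.shouldAdjustToUTC()); // false -> TIMESTAMP(6)
  }
}
```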
```java
      convertedType = typeFactory.createSqlType(SqlTypeName.DATE);
      break;
    case TIME:
      convertedType = typeFactory.createSqlType(SqlTypeName.TIME);
```
```suggestion
convertedType = typeFactory.createSqlType(SqlTypeName.TIME, 6);
```
```java
      convertedType = typeFactory.createSqlType(SqlTypeName.BINARY, fixedType.length());
      break;
    case BINARY:
      convertedType = typeFactory.createSqlType(SqlTypeName.VARBINARY, Integer.MAX_VALUE);
```
Any particular reason why we use VARBINARY over BINARY here? Unlike HiveTypeConverter:

```java
convertedType = dtFactory.createSqlType(SqlTypeName.BINARY);
```
You're right, I'm changing this to match Hive.
coral-common/src/main/java/com/linkedin/coral/common/IcebergTypeConverter.java (conversation resolved)
```java
@Override
public Schema.TableType getJdbcTableType() {
  return dataset.tableType() == TableType.VIEW ? Schema.TableType.VIEW : Schema.TableType.TABLE;
```
```suggestion
return Schema.TableType.TABLE;
```
🤔, with an assert that dataset.tableType() should be TableType.MANAGED_TABLE?
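A minimal sketch of that variant (assuming Guava's Preconditions, which may differ from what the PR actually uses):

```java
import com.google.common.base.Preconditions;

// Sketch of the suggested variant: fail fast if anything other than a
// managed table reaches this code path, then always report TABLE.
@Override
public Schema.TableType getJdbcTableType() {
  Preconditions.checkState(dataset.tableType() == TableType.MANAGED_TABLE,
      "Expected MANAGED_TABLE but got: %s", dataset.tableType());
  return Schema.TableType.TABLE;
}
```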
```java
/**
 * Returns the underlying Iceberg Table for advanced operations.
 *
 * @return Iceberg Table object
```
```suggestion
 * @return org.apache.iceberg.Table
```
```java
 * Utility class to convert Iceberg datasets to Hive Table objects for backward compatibility.
 *
 * <p>This converter creates complete Hive Table objects from Iceberg tables, including schema conversion
 * using {@code HiveSchemaUtil}. While the table object acts as "glue code" for backward compatibility,
 * it populates all standard Hive table metadata to ensure broad compatibility with downstream code paths.
```
Do we expect to exercise this glue code in practice? If so, under what scenarios?
This code path is used to read table properties when parsing the view SQL in ParseTreeBuilder.

Yes, this glue code gets exercised primarily to retrieve eligible table properties on the base tables during the parsing stage in ParseTreeBuilder/HiveFunctionResolver (no schema dependency). Without this glue, we would need larger-scale refactoring in those classes to interpret IcebergTable natively.
```java
// Convert Iceberg schema to Hive columns
try {
  storageDescriptor.setCols(HiveSchemaUtil.convert(icebergTable.schema()));
```
Any particular reason why we choose to set storageDescriptor columns from HiveSchemaUtil.convert(icebergTable.schema()) and not from AvroSchemaUtil.convert(hiveParameters.get("avro.schema.literal"))?
(a) The Iceberg schema is the source of truth, (b) the Avro literal may not always exist or could be stale, and (c) this logic is ported from existing production code paths so that the Iceberg table -> Hive table object conversion is consistent across the stack.

Practically, setting this one way or the other in this specific class has no bearing on view schema resolution.
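For reference, a sketch of the schema-as-source-of-truth conversion (assuming Iceberg's HiveSchemaUtil from the iceberg-hive-metastore module; the helper method name is hypothetical):

```java
import java.util.List;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.iceberg.Table;
import org.apache.iceberg.hive.HiveSchemaUtil;

// Derive the Hive columns directly from the Iceberg schema (the source of
// truth) rather than from a possibly missing or stale avro.schema.literal.
static void setColsFromIcebergSchema(Table icebergTable, StorageDescriptor storageDescriptor) {
  List<FieldSchema> cols = HiveSchemaUtil.convert(icebergTable.schema());
  storageDescriptor.setCols(cols);
}
```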
```java
0, // createTime
0, // lastModifiedTime
0, // retention
```
What are the side-effects of empty metadata here, from owner through retention?
None; we practically only need table properties for backward compatibility with the SQL parser logic in ParseTreeBuilder.
```java
 * @param dbName Database or namespace name
 * @return true if the namespace exists, false otherwise
 */
boolean namespaceExists(String dbName);
```
Let us fix the inconsistencies between namespace, db, and schema.

Let us not use …
```java
 * across different table formats (Hive, Iceberg, etc.).
 *
 * CoralCatalog abstracts away the differences between various table formats
 * and provides a consistent way to access dataset information through
```
Ditto on "Dataset" terminology.
Thanks for the feedback, I have refactored the PR to move away from Dataset.

We now have:
(1) CoralCatalog (the new catalog interface) and HiveMetastoreClient (the old catalog interface) are independent, and both work with Coral for translations. HiveMetastoreClient has been marked as deprecated in favor of CoralCatalog.
(2) getTable() is the API in CoralCatalog. It returns the CoralTable interface. Currently we have two implementations of CoralTable: HiveCoralTable and IcebergCoralTable.
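A minimal sketch of that shape (method names taken from this PR's description; exact signatures and the enum values are assumptions based on what appears elsewhere in this review):

```java
import java.util.Map;

// Sketch of the refactored shape described above; exact signatures may
// differ from the PR, but the method names come from its description.
interface CoralCatalog {
  CoralTable getTable(String namespaceName, String tableName);
}

interface CoralTable {
  String name();
  Map<String, String> properties();
  TableType tableType();
}

enum TableType { MANAGED_TABLE, VIEW } // values seen elsewhere in this review

// Two format-specific implementations wrap the native table objects:
// HiveCoralTable wraps org.apache.hadoop.hive.metastore.api.Table,
// IcebergCoralTable wraps org.apache.iceberg.Table.
```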
coral-common/src/main/java/com/linkedin/coral/common/catalog/DatasetConverter.java (outdated; conversation resolved)
```java
 */
@Override
public TableType tableType() {
  return TableType.fromHiveTableType(table.getTableType());
```
Can you write the spec of conversion between Hive, Iceberg, and Coral representations? How does this expand for more table formats? Ideally we should have a Coral representation that is universal enough and everything can be converted to it. So I would expect methods like toCoralType as opposed to fromHiveType. Underlying table formats should not be hard coded in the universal catalog as well.
TableType.fromHiveTableType has been deleted.

Also, as discussed, the spec of table format -> Coral IR is just schema conversion, which is captured in the TypeConverter class for Hive tables and IcebergTypeConverter for Iceberg tables.
```java
 * @param icebergCoralTable Iceberg coral table to convert
 * @return Hive Table object with complete metadata and schema
 */
public static Table toHiveTable(IcebergCoralTable icebergCoralTable) {
```
Based on offline discussion, I understood we got rid of those, but it seems we still leverage methods that hardcode the table formats. This is not good for extensibility.
Can we eliminate the

```java
if (coralCatalog != null) {
  ...
} else {
  ...
}
```

pattern that we are currently using everywhere, and use some adapter class instead?
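One possible shape for such an adapter (a sketch; the class name is hypothetical, the HiveCoralTable constructor is assumed, and only HiveMetastoreClient.getTable(db, table) is taken from the existing interface):

```java
// Hypothetical adapter: expose the legacy metastore client through the new
// CoralCatalog interface so call sites depend on a single abstraction
// instead of branching on which client is non-null.
public class HiveMscCoralCatalogAdapter implements CoralCatalog {
  private final HiveMetastoreClient msc;

  public HiveMscCoralCatalogAdapter(HiveMetastoreClient msc) {
    this.msc = msc;
  }

  @Override
  public CoralTable getTable(String namespaceName, String tableName) {
    org.apache.hadoop.hive.metastore.api.Table hiveTable = msc.getTable(namespaceName, tableName);
    return hiveTable == null ? null : new HiveCoralTable(hiveTable); // constructor assumed
  }
}
```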
There are quite a few overlapping wrappers; the layering is conceptually unclear. Can we simplify this and merge a few classes?
```java
 * @param namespaceName Namespace (database) name
 * @return true if the namespace exists, false otherwise
 */
boolean namespaceExists(String namespaceName);
```
Inconsistency between namespace and schema elsewhere.

I have considered a few design options and this seems to make the most sense:

```java
interface CoralTable extends ScannableTable
class HiveTable implements CoralTable
class IcebergTable implements CoralTable
```
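For context, a compilable sketch of what that hierarchy inherits from Calcite (the empty interface body is illustrative):

```java
import org.apache.calcite.schema.ScannableTable;

// Extending ScannableTable means every CoralTable picks up Calcite's Table
// contract (getRowType, getStatistic, getJdbcTableType, ...) plus
// ScannableTable.scan(DataContext) for row enumeration.
public interface CoralTable extends ScannableTable {
  // format-agnostic additions (name, properties, ...) would go here
}
```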
```java
assertEquals(timestampType.getPrecision(), 6, "Timestamp should have precision 6 (microseconds)");
}
```
I found here that the max precision of TIMESTAMP in Calcite is 3. I also wrote a similar test in my local and it failed for me. Curious how it worked here. Could you clarify which particular RelDataTypeFactory is being picked up in this test?
OK. I checked out the PR and found it was HiveTypeSystem, which has a max precision of 9.
Discussed offline. The motivation of the above was to avoid duplicating implementation layers. To properly decouple the two, we would need a standalone Coral type system that models schema and type metadata independently. That type system has now been introduced in #558, which can serve as the foundation for adopting that approach.
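For reference, a minimal sketch of why the precision assertion passes (assuming Coral's HiveTypeSystem, whose max TIMESTAMP precision is 9 per the comment above; its import is omitted here):

```java
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
import org.apache.calcite.sql.type.SqlTypeName;

public class TimestampPrecisionCheck {
  public static void main(String[] args) {
    // SqlTypeFactoryImpl clamps the requested precision to
    // typeSystem.getMaxPrecision(TIMESTAMP). With Calcite's default type
    // system the cap is 3, so TIMESTAMP(6) silently becomes TIMESTAMP(3);
    // with HiveTypeSystem (cap 9) the requested precision 6 is preserved.
    RelDataTypeFactory factory = new SqlTypeFactoryImpl(new HiveTypeSystem());
    RelDataType ts = factory.createSqlType(SqlTypeName.TIMESTAMP, 6);
    System.out.println(ts.getPrecision()); // 6
  }
}
```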
```java
 * @param tableName Table name
 * @return CoralTable object representing the table, or null if not found
 */
CoralTable getTable(String namespaceName, String tableName);
```
namespaceName -> namespace.
```java
 * This interface provides a common way to access table metadata regardless
 * of the underlying table format (Hive, Iceberg, etc.).
 *
 * This abstraction is used by Calcite integration layer to dispatch to
```
Let us not mention implementation details (e.g., Calcite) explicitly in the Coral common module. Can you check the rest of the PR?
```java
 *
 * @return Fully qualified table name
 */
String name();
```
Does the method name need to reflect it is a fully qualified table name? Also, is this method used anywhere? Let us add things only when necessary.
```java
 *
 * Used by Calcite integration to dispatch to HiveTable.
 */
public class HiveCoralTable implements CoralTable {
```
There are table implementations in common and table implementations in catalog. What is the basis for defining in each?
Does this need to be implemented as a plugin to avoid classpath issues? Also note that both Hive and Iceberg co-exist in the same module, which is not quite normal.
```java
final CoralDataType coralType = HiveToCoralTypeConverter.convert(typeInfo);
```
Example of package inconsistency. Utility method for this implementation is in the common package.
```java
public org.apache.iceberg.Table getIcebergTable() {
  return table;
}
```
Ideally you should not need this, and it might indicate a leak in the API. Can we avoid it?
```gradle
compile deps.'hadoop'.'hadoop-common'

// LinkedIn Iceberg dependencies
```
Another reason why this needs to be a plugin. This should integrate with OSS Iceberg too.
```java
// Convert Iceberg coral table to minimal Hive Table for backward compatibility
// This is needed because downstream code (ParseTreeBuilder, HiveFunctionResolver)
// expects a Hive Table object for Dali UDF resolution
```
Can you elaborate more? Why not use CoralTable there?
```java
 * <li>Storage descriptor with SerDe info (for compatibility)</li>
 * </ul>
 */
public class IcebergHiveTableConverter {
```
The point of this change is not to do this anymore.
```java
topSqlNode.accept(new RegisterDynamicFunctionsForTypeDerivation());

TypeDerivationUtil typeDerivationUtil = new TypeDerivationUtil(toRelConverter.getSqlValidator(), topSqlNode);
operatorTransformerList = SqlCallTransformers.of(new FromUtcTimestampOperatorTransformer(typeDerivationUtil),
    new GenericProjectTransformer(typeDerivationUtil), new NamedStructToCastTransformer(typeDerivationUtil),
    new ConcatOperatorTransformer(typeDerivationUtil), new SubstrOperatorTransformer(typeDerivationUtil),
    new CastOperatorTransformer(typeDerivationUtil), new UnionSqlCallTransformer(typeDerivationUtil));
```
Refactor the code to avoid repetition?
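One way to do that (a sketch; the helper name is hypothetical, and it assumes the duplicated call sites build exactly this transformer list):

```java
// Hypothetical helper so each call site builds the shared transformer list
// through one method instead of repeating the SqlCallTransformers.of(...) block.
private static SqlCallTransformers standardTransformers(TypeDerivationUtil typeDerivationUtil) {
  return SqlCallTransformers.of(new FromUtcTimestampOperatorTransformer(typeDerivationUtil),
      new GenericProjectTransformer(typeDerivationUtil), new NamedStructToCastTransformer(typeDerivationUtil),
      new ConcatOperatorTransformer(typeDerivationUtil), new SubstrOperatorTransformer(typeDerivationUtil),
      new CastOperatorTransformer(typeDerivationUtil), new UnionSqlCallTransformer(typeDerivationUtil));
}

// Call site:
// operatorTransformerList = standardTransformers(typeDerivationUtil);
```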
```java
 */
@Override
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
  // Stage 1: Iceberg → Coral
```
Nit: Step.
```java
 *
 * Copied structure from TypeConverter for consistency.
 */
public class IcebergTypeConverter {
```
I do not see why this is required. Could you explain?
```java
 * @param msc Hive metastore client for Hive-specific access (can be null if coralCatalog is provided)
 * @param dbName Database name (must not be null)
 */
HiveDbSchema(CoralCatalog coralCatalog, HiveMetastoreClient msc, @Nonnull String dbName) {
```
Can we introduce the coral classes here, depending only on CoralCatalog, and mark the Hive ones as deprecated?
What changes are proposed in this pull request, and why are they necessary?
Summary
This PR introduces native Apache Iceberg table support to Coral, enabling direct schema conversion from Iceberg to Calcite's RelDataType without lossy intermediate conversions through Hive's type system. The implementation preserves Iceberg-specific type semantics including timestamp precision and explicit nullability.
Key architectural decision: HiveMetastoreClient remains unchanged and does NOT extend CoralCatalog. Integration classes use composition (storing both instances) with runtime dispatch.
New Components (coral-common/src/main/java/com/linkedin/coral/common/catalog/)
- CoralCatalog: Format-agnostic catalog interface with getTable(), getAllTables(), namespaceExists()
- CoralTable: Unified table metadata interface (name(), properties(), tableType())
- HiveCoralTable / IcebergCoralTable: Implementations wrapping native Hive/Iceberg table objects
Iceberg Integration
- IcebergTable: Calcite ScannableTable implementation for Iceberg tables
- IcebergTypeConverter: Converts Iceberg Schema → Calcite RelDataType with precision preservation (see the sketch below)
- IcebergHiveTableConverter: Backward compatibility bridge for UDF resolution (converts Iceberg → Hive table object)
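A sketch of the conversion shape, assembled from the hunks quoted in this review (not the complete converter; struct/list/map/decimal cases are omitted and the method name is illustrative):

```java
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;

// Assembled from the review hunks above; not the full converter.
static RelDataType convertPrimitive(Type icebergType, RelDataTypeFactory typeFactory) {
  switch (icebergType.typeId()) {
    case DATE:
      return typeFactory.createSqlType(SqlTypeName.DATE);
    case TIME:
      return typeFactory.createSqlType(SqlTypeName.TIME, 6); // precision 6 suggested in review
    case TIMESTAMP:
      // Iceberg timestamps are microsecond precision; the review suggests
      // splitting on shouldAdjustToUTC() for the "with time zone" variant
      return ((Types.TimestampType) icebergType).shouldAdjustToUTC()
          ? typeFactory.createSqlType(SqlTypeName.TIMESTAMP_WITH_LOCAL_TIME_ZONE, 6)
          : typeFactory.createSqlType(SqlTypeName.TIMESTAMP, 6);
    case FIXED:
      return typeFactory.createSqlType(SqlTypeName.BINARY, ((Types.FixedType) icebergType).length());
    case BINARY:
      return typeFactory.createSqlType(SqlTypeName.BINARY); // changed from VARBINARY per review
    default:
      throw new UnsupportedOperationException("Not covered in this sketch: " + icebergType);
  }
}
```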
Integration Pattern
- HiveSchema, HiveDbSchema, ToRelConverter: Store both CoralCatalog and HiveMetastoreClient instances
- If coralCatalog != null, use the unified path; else if msc != null, use the Hive-only path (see the sketch below)
- HiveMetastoreClient and HiveMscAdapter marked @deprecated (still functional; prefer CoralCatalog)
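A minimal sketch of that dual-path dispatch (hedged: the class and method shape here are illustrative, not the exact PR code):

```java
// Illustrative composition + runtime dispatch; exact names differ in the PR.
class DualPathSchema {
  private final CoralCatalog coralCatalog;   // new, format-agnostic path
  private final HiveMetastoreClient msc;     // legacy, Hive-only path (deprecated)

  DualPathSchema(CoralCatalog coralCatalog, HiveMetastoreClient msc) {
    this.coralCatalog = coralCatalog;
    this.msc = msc;
  }

  Object resolveTable(String dbName, String tableName) {
    if (coralCatalog != null) {
      return coralCatalog.getTable(dbName, tableName);  // unified: Hive + Iceberg
    } else if (msc != null) {
      return msc.getTable(dbName, tableName);           // Hive-only fallback
    }
    throw new IllegalStateException("No catalog configured");
  }
}
```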
How Reviewers Should Read This

Start here:
- CoralCatalog.java - New abstraction layer interface
- CoralTable.java - Unified table metadata interface
- IcebergCoralTable.java - How Iceberg tables are wrapped
- IcebergTypeConverter.java - Core schema conversion logic
Then review integration:
- HiveDbSchema.java - Dispatch logic based on CoralTable type (Iceberg vs Hive)
- IcebergTable.java - Calcite integration
- ToRelConverter.java - Dual-path support (CoralCatalog vs HiveMetastoreClient)
- HiveMetastoreClient.java - Backward compatibility
Test:
- IcebergTableConverterTest.java - End-to-end Iceberg conversion test

How was this patch tested?
- New and existing tests pass
- Integration tests: WIP