Conversation

@YannByron

Purpose

To introduce a Spark engine for #155.

This is the first PR, which includes:

  1. Introducing the basic Spark architecture, with the fluss-spark-common, fluss-spark-ut, and fluss-spark-3.x modules included.
  2. Supporting the Spark catalog and table, based on Spark 3.5 and 3.4 for now (see the usage sketch below).
  3. Supporting Spark CI.

Linked issue: close #228
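
For context, a hedged sketch of how a Spark DataSourceV2 catalog like the one in this PR is typically registered; the catalog name "fluss", the bootstrap.servers option, and the exact package of FlussCatalog are illustrative assumptions, not confirmed by this PR.

import org.apache.spark.sql.SparkSession

// Hypothetical usage sketch. The catalog name "fluss" and the
// bootstrap.servers value are assumptions for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("fluss-spark-demo")
  // Register the catalog implementation introduced in this PR.
  .config("spark.sql.catalog.fluss", "org.apache.fluss.spark.catalog.FlussCatalog")
  // Spark passes options under spark.sql.catalog.<name>.* to the catalog.
  .config("spark.sql.catalog.fluss.bootstrap.servers", "localhost:9123")
  .getOrCreate()

spark.sql("SHOW NAMESPACES IN fluss").show()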

Brief change log

Tests

API and Format

Documentation

@YannByron (Author)

@wuchong please take a look, thanks.

@wuchong (Member) left a comment:

Thanks @YannByron for the great work! I’ve left a few comments for consideration.

Additionally, it would be great if we could add Javadoc or explanatory comments to the key classes and methods; this would greatly improve readability and maintainability for future contributors.

<reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
<junitxml>.</junitxml>
<argLine>-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=128m ${extraJavaTestArgs} -Dio.netty.tryReflectionSetAccessible=true</argLine>
<filereports>PaimonTestSuite.txt</filereports>
@wuchong (Member):

FlussTestSuite.txt?

<version>0.9-SNAPSHOT</version>
</parent>

<artifactId>fluss-spark-common</artifactId>
@wuchong (Member):

Should we add a Scala version suffix to the artifact ID (also for fluss-spark-3.4 and fluss-spark-3.5 modules)? This would ensure that the published JARs automatically include the Scala version in their artifact names during Maven deployment, following standard Scala cross-build conventions.
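
For illustration, a minimal pom sketch of that convention; the scala.binary.version property name is an assumption (any property resolving to, e.g., 2.12 works):

<!-- Hypothetical sketch: suffix the artifact ID with the Scala binary
     version so deployed JARs are named e.g. fluss-spark-common_2.12. -->
<artifactId>fluss-spark-common_${scala.binary.version}</artifactId>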

Comment on lines +467 to +479
<profile>
<id>spark3</id>
<modules>
<module>fluss-spark/fluss-spark-3.5</module>
<module>fluss-spark/fluss-spark-3.4</module>
</modules>
<activation>
<activeByDefault>true</activeByDefault>
<property>
<name>spark3</name>
</property>
</activation>
</profile>
@wuchong (Member):

I think we can enable these modules by default, so that the license checker pipeline can verify them as well.
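
One way to do that, sketched under the assumption that the root pom can list these modules directly instead of gating them behind the profile:

<!-- Sketch: declare the Spark modules in the root <modules> section so
     they build (and get license-checked) by default; keep the spark3
     profile only if version-specific switching is still needed. -->
<modules>
  <!-- ... existing modules ... -->
  <module>fluss-spark/fluss-spark-3.5</module>
  <module>fluss-spark/fluss-spark-3.4</module>
</modules>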


import scala.collection.JavaConverters._

class FlussCatalog extends TableCatalog with SupportsFlussNamespaces with WithFlussAdmin {
@wuchong (Member):

How about naming it SparkCatalog? Since these catalog implementations reside in the Fluss repository alongside those for other engines (such as Flink and Trino), including the engine name in the class name would make it easier to identify and distinguish between them.

import org.apache.fluss.metadata.TableInfo
import org.apache.fluss.spark.catalog.{FlussTableInfo, SupportsFlussPartitionManagement}

case class FlussTable(table: TableInfo)
@wuchong (Member):

ditto. SparkTable?

SparkDataTypes.createMapType(
mapType.getKeyType.accept(this),
mapType.getValueType.accept(this),
mapType.isNullable
@wuchong (Member):

mapType.getValueType.isNullable
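
The third argument of Spark's DataTypes.createMapType is valueContainsNull, so it should reflect the value type's nullability rather than the map field's own nullability. A sketch of the corrected call:

// Corrected sketch: pass the value type's nullability as
// valueContainsNull, not the nullability of the map type itself.
SparkDataTypes.createMapType(
  mapType.getKeyType.accept(this),
  mapType.getValueType.accept(this),
  mapType.getValueType.isNullable)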


import scala.collection.JavaConverters._

object FlussDataTypeToSparkDataType extends DataTypeVisitor[SparkDataType] {
@wuchong (Member):

FlussDataTypeToSparkDataType -> FlussToSparkTypeVisitor to align with SparkToFlussTypeVisitor.


val (tableProps, customProps) =
caseInsensitiveProps.filterNot(SPARK_TABLE_OPTIONS.contains).partition {
case (key, _) => FlussConfigUtils.TABLE_OPTIONS.containsKey(key)
@wuchong (Member):

FlussConfigUtils.TABLE_OPTIONS is a static set; however, a Fluss table created by a newer client version may carry additional table options. Therefore, it would be more robust to check whether the config key starts with the table. prefix.
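
A minimal sketch of that prefix check; the TableOptionPrefix constant is illustrative:

// Sketch: treat any option whose key starts with "table." as a Fluss
// table option, so options introduced by newer clients still round-trip.
val TableOptionPrefix = "table."
val (tableProps, customProps) =
  caseInsensitiveProps
    .filterNot { case (key, _) => SPARK_TABLE_OPTIONS.contains(key) }
    .partition { case (key, _) => key.startsWith(TableOptionPrefix) }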

Comment on lines +50 to +51
["org.apache.paimon\\..*"],
["org.apache.paimon.shade\\..*"],
@wuchong (Member):

Change these patterns to the fluss package.
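
Presumably the corrected entries would read roughly as follows; the exact shaded package name is an assumption and should match whatever Fluss actually shades under:

["org.apache.fluss\\..*"],
["org.apache.fluss.shade\\..*"],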

.field("pt", DataTypes.STRING())
.build())
assertThat(testPartitionedTable.getPartitionKeys.get(0)).isEqualTo("pt")
assertThat(testPartitionedTable.getCustomProperties.containsKey("key")).isEqualTo(true)
@wuchong (Member):

What is the "key" property here supposed to verify?
