Releases: MTSWebServices/spark-dialect-extension

0.0.4 (2026-04-07)

07 Apr 13:27
7485078

Features

Added support for df.write.format("jdbc").option("truncate", "true")
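A minimal sketch of how this option is typically used (the URL and table name here are placeholders, not from the release notes; running it requires Spark and a reachable Clickhouse server). With Spark's JDBC source, `truncate=true` combined with overwrite mode issues `TRUNCATE TABLE` instead of `DROP` + `CREATE`, preserving the existing table definition:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# truncate=true keeps the existing Clickhouse table (engine, ORDER BY, etc.)
# and only clears its rows before the overwrite.
(df.write
   .format("jdbc")
   .option("url", "jdbc:clickhouse://localhost:8123/default")  # assumed URL
   .option("dbtable", "target_table")                          # assumed name
   .option("truncate", "true")
   .mode("overwrite")
   .save())
```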

Improvements

Added tests for Clickhouse JDBC 0.9.5+.

This JDBC driver version allows using Array(T) for almost all T, including Float32, Date, DateTime and Decimal; see ClickHouse/clickhouse-java#2627.
The exception is UInt64, due to an issue on the Spark side.

Bug fixes

Convert UInt64 to DecimalType(38, 0) instead of DecimalType(20, 0) (Spark's default).
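As a plain-Python illustration (not the dialect's code) of why a decimal mapping is needed at all: Clickhouse's UInt64 range does not fit in Spark's signed 64-bit LongType, and its maximum value occupies 20 decimal digits, so DecimalType(38, 0) holds it with room to spare:

```python
# UInt64 values can exceed Spark's signed LongType range, so the dialect
# maps them to a decimal type instead.
UINT64_MAX = 2**64 - 1          # 18446744073709551615
SIGNED_LONG_MAX = 2**63 - 1     # upper bound of Spark's LongType

assert UINT64_MAX > SIGNED_LONG_MAX   # cannot be stored as LongType
print(len(str(UINT64_MAX)))           # → 20 decimal digits, within precision 38
```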

0.0.3

31 Oct 13:14
1948ca3

  • Added support for Clickhouse JDBC 0.9.x.

    This allows using Array(T) for numeric T, such as Int16, Int32, Int64 and Float64.

    However, Date, DateTime and Decimal are not supported, see issue.

  • Wrap Spark DataFrame columns that have nullable = true with Clickhouse's Nullable(T).

    Caveat: Spark DataFrames created from ORC and Parquet files have all columns marked as nullable = true.
    Using:

    df.write.format("jdbc").option("createTableOptions", "ENGINE = ReplacingMergeTree() ORDER BY (col1)")

    will fail if col1 is nullable. Workaround:

    import pyspark.sql.functions as F
    
    # make the column non-nullable with coalesce;
    # F.lit(...) should contain a value compatible with the `col1` type
    df = df.withColumn("col1", F.coalesce("col1", F.lit(0)))

0.0.2

02 Oct 13:30
4136bc1

  • Allow writing an ArrayType(TimestampType()) Spark column as Clickhouse's Array(DateTime64(6)).
  • Allow writing an ArrayType(ShortType()) Spark column as Clickhouse's Array(Int16).
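Both mappings can be sketched as follows (the URL and table name are assumptions; running this requires Spark and a Clickhouse server with the dialect on the classpath):

```python
import datetime

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    ArrayType, ShortType, StructField, StructType, TimestampType)

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("ts", ArrayType(TimestampType())),  # -> Array(DateTime64(6))
    StructField("nums", ArrayType(ShortType())),    # -> Array(Int16)
])
df = spark.createDataFrame(
    [([datetime.datetime(2024, 10, 2, 13, 30)], [1, 2, 3])], schema)

(df.write
   .format("jdbc")
   .option("url", "jdbc:clickhouse://localhost:8123/default")  # assumed URL
   .option("dbtable", "array_demo")                            # assumed name
   .mode("append")
   .save())
```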

0.0.1

01 Oct 09:46
d8a6f91

First release! 🎉

This version includes a custom Clickhouse dialect for Apache Spark 3.5.x, with the following enhancements:

  • support for writing Spark's ArrayType to Clickhouse. Currently only a few types are supported, such as ArrayType(StringType), ArrayType(ByteType), ArrayType(LongType) and ArrayType(FloatType). Unfortunately, reading arrays from Clickhouse back into Spark is not fully supported yet.
  • fixed an issue where writing Spark's TimestampType led to creating a Clickhouse table with a DateTime64(0) column instead of DateTime64(6), resulting in precision loss (fractions of seconds were dropped).
  • fixed an issue where writing Spark's BooleanType led to creating a Clickhouse table with a UInt64 column instead of Bool.