Skip to content

Conversation

@pan3793
Copy link
Member

@pan3793 pan3793 commented Nov 14, 2025

What changes were proposed in this pull request?

The SparkGetColumnsOperation is mainly used for the JDBC driver, while JDBC uses 1-based ordinal/column-index instead of 0-based.

This is also documented in Hive API.

.addPrimitiveColumn("ORDINAL_POSITION", Type.INT_TYPE,
"Index of column in table (starting at 1)")

Note, the GetColumnsOperation, which is originally copied from the Hive has a correct implementation, the issue only exists in SparkGetColumnsOperation.

For safety, a config spark.sql.legacy.hive.thriftServer.useZeroBasedColumnOrdinalPosition is added to allow the user to switch back to the previous behavior.

Why are the changes needed?

The SparkGetColumnsOperation is mainly used by JDBC java.sql.DatabaseMetaData#getColumns, this change makes it satisfy the JDBC API specification.

Does this PR introduce any user-facing change?

Yes, see the above section.

How was this patch tested?

UTs are modified.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 14, 2025
@pan3793 pan3793 changed the title [SPARK-54350][STS] SparkGetColumnsOperation ORDINAL_POSITION should be 1-based [SPARK-54350][SQL][STS] SparkGetColumnsOperation ORDINAL_POSITION should be 1-based Nov 14, 2025
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null, // SQL_DATETIME_SUB
null, // CHAR_OCTET_LENGTH
pos.asInstanceOf[AnyRef], // ORDINAL_POSITION
(pos + 1).asInstanceOf[AnyRef], // ORDINAL_POSITION, 1-based
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this looks like a legit fix to follow the standard, do you think we need to document this as a breaking change or provide a legacy conf, @pan3793 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for @dongjoon-hyun 's suggestion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late reply. I think it should be low risk, AFAIK, this is mostly used by tools (RDBMS management GUI tools or BI tools) to sort the column, and the implementation is relatively lenient, but for safety, I can add a legacy config and fix this only for 4.1.

null, // SQL_DATETIME_SUB
null, // CHAR_OCTET_LENGTH
pos.asInstanceOf[AnyRef], // ORDINAL_POSITION
(pos + 1).asInstanceOf[AnyRef], // ORDINAL_POSITION, 1-based
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this PR is breaking the existing BI tool integration.

@github-actions github-actions bot added the DOCS label Nov 18, 2025
@pan3793
Copy link
Member Author

pan3793 commented Nov 18, 2025

@dongjoon-hyun @LuciferYang I added a config spark.sql.legacy.hive.thriftServer.useZeroBasedColumnOrdinalPosition to allow the user to switch back to the previous behavior, please take another look, thank you in advance.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @pan3793 and @LuciferYang .

Merged to master/4.1 for Apache Spark 4.1.0.

dongjoon-hyun pushed a commit that referenced this pull request Nov 18, 2025
…uld be 1-based

### What changes were proposed in this pull request?

The SparkGetColumnsOperation is mainly used for the JDBC driver, while JDBC uses 1-based ordinal/column-index instead of 0-based.

This is also documented in Hive API.

https://github.com/apache/spark/blob/551b922a53acfdfeb2c065d5dedf35cb8cd30e1d/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java#L94-L95

Note, the GetColumnsOperation, which is originally copied from the Hive has a correct implementation, the issue only exists in SparkGetColumnsOperation.

For safety, a config `spark.sql.legacy.hive.thriftServer.useZeroBasedColumnOrdinalPosition` is added to allow the user to switch back to the previous behavior.

### Why are the changes needed?

The SparkGetColumnsOperation is mainly used by JDBC [java.sql.DatabaseMetaData#getColumns](https://docs.oracle.com/en/java/javase/17/docs/api/java.sql/java/sql/DatabaseMetaData.html#getColumns(java.lang.String,java.lang.String,java.lang.String,java.lang.String)), this change makes it satisfy the JDBC API specification.

### Does this PR introduce _any_ user-facing change?

Yes, see the above section.

### How was this patch tested?

UTs are modified.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53062 from pan3793/SPARK-54350.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 05bc5d4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants