
Conversation

@susheel-aroskar

What changes were proposed in this pull request?

Optionally transmitting client-side code location details (function name, file name, and line number) along with actions.

Why are the changes needed?

Currently, no information is sent to the Spark Connect server that helps pinpoint where a call (i.e., a Spark DataFrame action) originated in the client application code. With this change, client application call stack details are sent to the server as a list of (function name, file name, line number) tuples, where they can be logged in the server logs, attached to the corresponding OpenTelemetry spans as attributes, etc. This lets users looking at the server-side UI or console quickly pinpoint the call sites of erring or slow calls in their own client application code, without the server needing access to the actual code.

Does this PR introduce any user-facing change?

It adds a new environment variable, SPARK_CONNECT_DEBUG_CLIENT_CALL_STACK, which users can set to true / 1 to opt into transmitting client application code locations to the server. When opted in, the client application's call stack details are included in the user_context.extensions field of the Spark Connect protobufs.

How was this patch tested?

By adding a new unit test, test_client_call_stack_trace.py.

Was this patch authored or co-authored using generative AI tooling?

Yes.
Some of the unit tests were Generated-by: Cursor

@holdenk (Contributor) left a comment

This looks neat. If you can rebase it and run the linter that would be awesome. Thanks for working to improve the debugging experience for PySpark connect users :)

Comment on lines +630 to +631
if _is_pyspark_source(filename):
break

Maybe add a quick comment on why we're stopping as soon as we encounter a pyspark source file path, presumably this is because we don't need to bubble up to the user the specific client call we've got?

List[any_pb2.Any]: A list of Any objects, each representing a stack frame in the call stack trace in the user code.
"""
call_stack_trace = []
if os.getenv("SPARK_CONNECT_DEBUG_CLIENT_CALL_STACK", "false").lower() in ("true", "1"):

Why a system env variable instead of a Spark configuration flag? Also, if we're adding a new configuration option we should probably document it somewhere (if it's a Spark conf flag we sort of have the doc in-line already).

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-54314][PySpark] Improve Server-Side debuggability in Spark Connect by capturing client application's file name and line numbers [WIP][SPARK-54314][PYTHON[[CONNECT] Improve Server-Side debuggability in Spark Connect by capturing client application's file name and line numbers Nov 18, 2025
@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-54314][PYTHON[[CONNECT] Improve Server-Side debuggability in Spark Connect by capturing client application's file name and line numbers [WIP][SPARK-54314][PYTHON][CONNECT] Improve Server-Side debuggability in Spark Connect by capturing client application's file name and line numbers Nov 18, 2025
@HyukjinKwon (Member)

cc @zhengruifeng and @ueshin FYI


5 participants