
[BUG] [Databricks IT] SparkContext init fails with Py4J 'Answer from Java side is empty' on all xdist workers in Azure #14709

@pxLi

Description


Describe the bug
Build: rapids-it-azure-databricks-13.3/302, rapids-it-azure-databricks-14.3/277

The Databricks integration test run failed before any test executed. All 5 pytest-xdist workers failed to create a JavaSparkContext via py4j during pytest_sessionstart. The Databricks driver JVM returned an empty answer, causing Py4JError and forcing pytest to report INTERNALERROR and exit with code 3.

Error logs:

2026-04-30 07:06:30 INFO Error while receiving.
File "py4j/clientserver.py", line 541, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
...
INTERNALERROR>   File ".../spark_init_internal.py", line 151, in pytest_sessionstart
INTERNALERROR>     .appName('rapids spark plugin integration tests (python)').getOrCreate()
INTERNALERROR>   File "pyspark/context.py", line 442, in _initialize_context
INTERNALERROR>     return self._jvm.JavaSparkContext(jconf)
INTERNALERROR> py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext
[gw4] node down: Not properly terminated
replacing crashed worker gw4
INTERNALERROR>   File "xdist/dsession.py", line 267, in worker_errordown
INTERNALERROR>     self._active_nodes.remove(node)
INTERNALERROR> KeyError: <WorkerController gw4>
============================ no tests ran in 52.23s ============================
+ exit 3
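Since the failure happens inside `pytest_sessionstart`, a single empty answer from the driver JVM aborts the whole xdist session. A minimal sketch of a retry wrapper around context creation (the `Py4JNetworkError` stand-in class, `create_context_with_retry`, and the backoff values are all hypothetical, not part of the actual `spark_init_internal.py`):

```python
import time

class Py4JNetworkError(Exception):
    """Stand-in for py4j.protocol.Py4JNetworkError (py4j assumed unavailable here)."""

def create_context_with_retry(factory, attempts=3, backoff_s=5.0):
    # Retry transient 'Answer from Java side is empty' failures before
    # letting pytest_sessionstart abort the whole xdist session.
    last_exc = None
    for attempt in range(attempts):
        try:
            return factory()  # e.g. a closure calling SparkSession.builder...getOrCreate()
        except Py4JNetworkError as exc:
            last_exc = exc
            time.sleep(backoff_s * (attempt + 1))  # linear backoff between attempts
    raise last_exc
```

This would only help if the empty answer is transient; if the driver JVM is actually down, every attempt fails the same way.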

We compared today's 14.3 DB runtime builds: the Azure and AWS runs share the same Databricks commit, yet AWS works fine.

cat /databricks/BUILDINFO
  BUILD_SCM_BRANCH HEAD
  BUILD_SCM_REVISION c6338c5ab93bec28da36f4c6a25b9f1a4d381092
  BUILD_SCM_SHORT_HASH c6338c5
  BUILD_SCM_STATUS Clean
  BUILD_TIMESTAMP 1776489808
  DATEHASH 20260417222328-c6338c5ab93bec28da36f4c6a25b9f1a4d381092
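To confirm the two clouds really do run the same runtime build, the `/databricks/BUILDINFO` output above can be parsed into a dict and diffed between nodes. A small sketch, assuming the file format is one `KEY VALUE` pair per line as shown:

```python
def parse_buildinfo(text):
    # /databricks/BUILDINFO is assumed to be "KEY VALUE" per line;
    # split each line on the first space.
    info = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        info[key] = value.strip()
    return info

# Sample taken from the Azure node above.
azure = parse_buildinfo("""\
BUILD_SCM_BRANCH HEAD
BUILD_SCM_REVISION c6338c5ab93bec28da36f4c6a25b9f1a4d381092
BUILD_SCM_SHORT_HASH c6338c5
BUILD_SCM_STATUS Clean
BUILD_TIMESTAMP 1776489808
""")
# Comparing the same keys parsed from the AWS node would confirm the builds match.
```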

Currently we have no idea whether this is an intermittent issue or whether Azure made a breaking change (or started auto-cleaning temp paths).
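To rule out the temp-path-cleanup theory, one could check before session start that the filesystem paths the driver depends on still exist. A hypothetical sketch (the helper name and the idea of a path pre-check are assumptions, not existing test code):

```python
import os

def missing_paths(paths):
    # Return the subset of required paths that no longer exist; a non-empty
    # result before pytest_sessionstart would point at cleanup, not py4j.
    return [p for p in paths if not os.path.exists(p)]
```

Logging this on each xdist worker at startup would distinguish "driver JVM never came up" from "driver JVM lost its working directories".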

Environment details

  • Databricks runtime: 13.3, 14.3, 17.3 runtimes on Azure
  • Python 3.10, py4j 0.10.9.7, pytest-xdist 3.8.0
  • Plugin: rapids-4-spark_2.12-26.06.0-SNAPSHOT-cuda12
  • TEST_PARALLEL=5


Labels: bug (Something isn't working), wontfix (This will not be worked on)
