Describe the bug
Build: rapids-it-azure-databricks-13.3/302, rapids-it-azure-databricks-14.3/277
The Databricks integration test run failed before any test executed. All 5 pytest-xdist workers failed to create a JavaSparkContext via py4j during pytest_sessionstart. The Databricks driver JVM returned an empty answer, causing Py4JError and forcing pytest to report INTERNALERROR and exit with code 3.
Error logs:
2026-04-30 07:06:30 INFO Error while receiving.
File "py4j/clientserver.py", line 541, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
...
INTERNALERROR> File ".../spark_init_internal.py", line 151, in pytest_sessionstart
INTERNALERROR> .appName('rapids spark plugin integration tests (python)').getOrCreate()
INTERNALERROR> File "pyspark/context.py", line 442, in _initialize_context
INTERNALERROR> return self._jvm.JavaSparkContext(jconf)
INTERNALERROR> py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext
[gw4] node down: Not properly terminated
replacing crashed worker gw4
INTERNALERROR> File "xdist/dsession.py", line 267, in worker_errordown
INTERNALERROR> self._active_nodes.remove(node)
INTERNALERROR> KeyError: <WorkerController gw4>
============================ no tests ran in 52.23s ============================
+ exit 3
We compared today's runs on the 14.3 DB runtime: the Azure and AWS clusters report the same Databricks build commit, but the AWS run passes.
cat /databricks/BUILDINFO
BUILD_SCM_BRANCH HEAD
BUILD_SCM_REVISION c6338c5ab93bec28da36f4c6a25b9f1a4d381092
BUILD_SCM_SHORT_HASH c6338c5
BUILD_SCM_STATUS Clean
BUILD_TIMESTAMP 1776489808
DATEHASH 20260417222328-c6338c5ab93bec28da36f4c6a25b9f1a4d381092
Currently we have no idea whether this is an intermittent issue or whether Azure made a breaking change (or an automatic cleanup of temp paths).
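If this turns out to be intermittent, one low-risk way to confirm that (and to keep CI green in the meantime) would be to wrap context creation at session start in a bounded retry. A minimal sketch; the helper name and the idea of wiring it into pytest_sessionstart in spark_init_internal.py are hypothetical, not existing code:

```python
import time


def create_context_with_retry(builder_fn, attempts=3, delay_s=10):
    """Call builder_fn (e.g. a SparkSession.builder...getOrCreate closure),
    retrying on failure.

    A transient driver-side hiccup (py4j 'Answer from Java side is empty')
    would succeed on a later attempt; a persistent Azure-side breakage would
    still fail after all attempts, so the two cases become distinguishable
    in the logs."""
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return builder_fn()
        except Exception as exc:  # e.g. Py4JError / Py4JNetworkError
            last_exc = exc
            print(f"context creation attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay_s)
    raise last_exc
```

Each xdist worker would call this instead of invoking getOrCreate() directly, so a single empty py4j answer no longer takes the whole session down with INTERNALERROR.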
Environment details
- Databricks runtime: 13.3, 14.3, 17.3 runtimes on Azure
- Python 3.10, py4j 0.10.9.7, pytest-xdist 3.8.0
- Plugin: rapids-4-spark_2.12-26.06.0-SNAPSHOT-cuda12
- TEST_PARALLEL=5