
Conversation

@zhztheplayer (Member) commented on Nov 14, 2025

What changes were proposed in this pull request?

A fix to let Spark throw an OOM error rather than hang when there is not enough JVM heap memory for a broadcast hashed relation. The fix passes the current JVM's max heap size, rather than Long.MaxValue / 2, when creating the temporary UnifiedMemoryManager used for broadcasting.

This is an optimal setting: if the size passed is too large (i.e., the current Long.MaxValue / 2), Spark hangs; if the size is smaller than the current JVM heap size, the OOM may be thrown too early, even when there is still room in memory for the newly created hashed relation.

Before:

new UnifiedMemoryManager(
    new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    Long.MaxValue,
    Long.MaxValue / 2,
    1)

After:

new UnifiedMemoryManager(
    new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
    Runtime.getRuntime.maxMemory,
    Runtime.getRuntime.maxMemory / 2, 1)
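
As a side note, here is a tiny standalone sketch (HeapBound is a hypothetical name, not part of the PR) showing what the new bound evaluates to at runtime; it only illustrates that Runtime.getRuntime.maxMemory tracks the JVM's configured maximum heap (roughly -Xmx), which is why it is a sensible ceiling for the temporary memory manager:

object HeapBound {
  def main(args: Array[String]): Unit = {
    // Runtime.getRuntime.maxMemory is the most memory the JVM will ever use for its
    // heap, so it bounds any on-heap bookkeeping done by a memory manager.
    val maxHeap = Runtime.getRuntime.maxMemory
    println(s"JVM max heap:            ${maxHeap / (1024 * 1024)} MiB")
    // The PR passes half of this as the on-heap storage region size, mirroring the
    // maxMemory / 2 argument in the snippet above.
    println(s"Half for storage region: ${maxHeap / 2 / (1024 * 1024)} MiB")
  }
}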

Why are the changes needed?

Report the error fast instead of hanging.

Does this PR introduce any user-facing change?

In some scenarios where large unsafe hashed relations are allocated for broadcast hash join, users will see a meaningful OOM error instead of a hang.

Before (hangs):

15:07:38.456 WARN org.apache.spark.memory.TaskMemoryManager: Failed to allocate a page (8589934592 bytes), try again.
15:07:38.501 WARN org.apache.spark.memory.TaskMemoryManager: Failed to allocate a page (8589934592 bytes), try again.
15:07:38.539 WARN org.apache.spark.memory.TaskMemoryManager: Failed to allocate a page (8589934592 bytes), try again.
15:07:38.580 WARN org.apache.spark.memory.TaskMemoryManager: Failed to allocate a page (8589934592 bytes), try again.
15:07:38.613 WARN org.apache.spark.memory.TaskMemoryManager: Failed to allocate a page (8589934592 bytes), try again.
15:07:38.647 WARN org.apache.spark.memory.TaskMemoryManager: Failed to allocate a page (8589934592 bytes), try again.
...

After (OOM):

An exception or error caused a run to abort: [UNABLE_TO_ACQUIRE_MEMORY] Unable to acquire 8589934592 bytes of memory, got 7194909081. SQLSTATE: 53200 
org.apache.spark.memory.SparkOutOfMemoryError: [UNABLE_TO_ACQUIRE_MEMORY] Unable to acquire 8589934592 bytes of memory, got 7194909081. SQLSTATE: 53200
	at org.apache.spark.errors.SparkCoreErrors$.outOfMemoryError(SparkCoreErrors.scala:456)
	at org.apache.spark.errors.SparkCoreErrors.outOfMemoryError(SparkCoreErrors.scala)
	at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
	at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:868)
	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:202)
	at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:209)
	at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:464)
	at org.apache.spark.sql.execution.joins.HashedRelationSuite.$anonfun$new$90(HashedRelationSuite.scala:760)

How was this patch tested?

Added tests.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot removed the CORE label on Nov 14, 2025
zhztheplayer changed the title from "[SPARK-54354][SQL] Spark hangs when there's not enough JVM heap memory for broadcast hashed relation" to "[SPARK-54354][SQL] Fix Spark hanging when there's not enough JVM heap memory for broadcast hashed relation" on Nov 14, 2025
@zhztheplayer (Member, Author) commented:

@HyukjinKwon @yaooqinn @dongjoon-hyun Thanks.

@zhztheplayer (Member, Author) commented:
cc @cloud-fan

A review comment was left on the changed lines:

-    Long.MaxValue / 2,
-    1),
+    Runtime.getRuntime.maxMemory,
+    Runtime.getRuntime.maxMemory / 2, 1),
A Contributor suggested putting the last argument on its own line:
-    Runtime.getRuntime.maxMemory / 2, 1),
+    Runtime.getRuntime.maxMemory / 2,
+    1),

Another review comment was left on the added test:

test("UnsafeHashedRelation should throw OOM when there isn't enough memory") {

A Contributor asked:

why did it hang before?

@zhztheplayer (Member, Author) replied:

It's related to logic introduced in PR #11095. In that PR, the following "retry code" is based on the assumption that the JVM heap could be slightly smaller than the on-heap size specified in the UnifiedMemoryManager (UMM):

https://github.com/davies/spark/blob/7ec7660381f3cd2047658f67b1882fccd83e95e5/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L268-L276

Since the code assumes the specified on-heap size in the UMM is only slightly larger than the actual JVM heap size, the call returns as soon as the current size plus the acquiredButNotUsed size reaches the specified heap size limit.

However, for the broadcast hashed relation we set the on-heap size to an effectively infinite value:

val mm = Option(taskMemoryManager).getOrElse {
  new TaskMemoryManager(
    new UnifiedMemoryManager(
      new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
      Long.MaxValue,
      Long.MaxValue / 2,
      1),
    0)
}

So the "retry code" mentioned above never terminates until an OOM error or a stack overflow error is thrown.
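
To make the hang concrete, here is a small self-contained toy model (a hypothetical RetryModel, not Spark's actual classes) of that retry shape: managerLimit stands in for the UMM's on-heap size and jvmHeap for the real heap. With managerLimit equal to the real heap the manager refuses on the first attempt, while with Long.MaxValue it keeps granting reservations the JVM can never back; in real Spark each retry goes through recursion and the memory manager, so the job effectively hangs until an OOM or stack overflow.

object RetryModel {
  // Toy model of the allocatePage retry described above.
  // managerLimit plays the UMM's on-heap size; jvmHeap the real JVM heap.
  def allocate(request: Long, managerLimit: Long, jvmHeap: Long): (String, Long) = {
    var acquiredButNotUsed = 0L
    var retries = 0L
    while (true) {
      // What the memory manager believes it can still grant.
      val granted = math.min(request, managerLimit - acquiredButNotUsed)
      if (granted < request) {
        return ("fail fast (SparkOutOfMemoryError in real Spark)", retries)
      }
      if (granted <= jvmHeap - acquiredButNotUsed) {
        return ("allocated", retries)  // the JVM heap could actually back the page
      }
      // The JVM allocation failed; keep the reservation and retry, like the real code.
      acquiredButNotUsed += granted
      retries += 1
    }
    sys.error("unreachable")
  }

  def main(args: Array[String]): Unit = {
    val jvmHeap = 8L * 1024 * 1024 * 1024    // pretend -Xmx8g
    val request = 16L * 1024 * 1024 * 1024   // a 16 GiB page
    println(allocate(request, jvmHeap, jvmHeap))        // refused on the first attempt
    println(allocate(request, Long.MaxValue, jvmHeap))  // ~5e8 futile retries before giving up
  }
}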


test("UnsafeHashedRelation should throw OOM when there isn't enough memory") {
val relations = mutable.ArrayBuffer[HashedRelation]()
// We should finally see an OOM thrown since we are keeping allocating hashed relations.

A Contributor commented:

This is a bad test and will likely break the CI process. Can we put it in the PR description as a manual test?

@zhztheplayer (Member, Author) replied:

Hi @cloud-fan, thanks for reviewing.

This is a bad test and will likely break the CI process.

If you meant that the OOM error could break the CI, I think we already rely on similar logic in the production code:

try {
  page = memoryManager.tungstenMemoryAllocator().allocate(acquired);
} catch (OutOfMemoryError e) {
  logger.warn("Failed to allocate a page ({} bytes), try again.",
    MDC.of(LogKeys.PAGE_SIZE, acquired));
  // there is no enough memory actually, it means the actual free memory is smaller than
  // MemoryManager thought, we should keep the acquired memory.
  synchronized (this) {
    acquiredButNotUsed += acquired;
    allocatedPages.clear(pageNumber);
  }
  // this could trigger spilling to free some pages.
  return allocatePage(size, consumer);
}

So I thought it would be benign to catch them in testing?

Or is there anything else you are concerned about?
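
For reference, intercepting an Error subtype inside a test body is a standard ScalaTest pattern; a generic sketch (hypothetical suite name, not the PR's actual test), assuming ScalaTest's AnyFunSuite is on the classpath:

import org.scalatest.funsuite.AnyFunSuite

class InterceptErrorSuite extends AnyFunSuite {
  test("an expected OutOfMemoryError can be intercepted without aborting the suite") {
    val e = intercept[OutOfMemoryError] {
      // Stands in for an allocation the JVM cannot satisfy.
      throw new OutOfMemoryError("simulated")
    }
    assert(e.getMessage === "simulated")
  }
}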
