[SPARK-54354][SQL] Fix Spark hanging when there's not enough JVM heap memory for broadcast hashed relation #53065
Conversation
@HyukjinKwon @yaooqinn @dongjoon-hyun Thanks.

cc @cloud-fan
```diff
-          Long.MaxValue / 2,
-          1),
+          Runtime.getRuntime.maxMemory,
+          Runtime.getRuntime.maxMemory / 2, 1),
```
Suggested change:

```suggestion
          Runtime.getRuntime.maxMemory / 2,
          1),
```
```scala
    }
  }

  test("UnsafeHashedRelation should throw OOM when there isn't enough memory") {
```
why did it hang before?
It's related to logic introduced in PR #11095. That PR added retry code to TaskMemoryManager.allocatePage (quoted in full further down in this thread) based on the assumption that the actual JVM heap could be slightly smaller than the on-heap size configured in the UnifiedMemoryManager (UMM). Because the code assumes the configured on-heap size is only finitely larger than the actual JVM heap, the call returns as soon as the current size plus the acquiredButNotUsed size reaches the configured limit.

However, we set the on-heap size to an effectively infinite value for the broadcast hashed relation:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala, lines 142 to 150 in 722bcc0:

```scala
val mm = Option(taskMemoryManager).getOrElse {
  new TaskMemoryManager(
    new UnifiedMemoryManager(
      new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
      Long.MaxValue,
      Long.MaxValue / 2,
      1),
    0)
}
```
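To make the hang concrete, here is a minimal, self-contained sketch (the page size is an illustrative assumption, not real Spark code) of why the retry effectively never terminates when the pool is sized at Long.MaxValue / 2:

```scala
// Illustrative sketch only: models the retry bookkeeping, not actual Spark code.
object RetryHangSketch {
  def main(args: Array[String]): Unit = {
    val poolSize = Long.MaxValue / 2 // on-heap size the temporary UMM believes it has
    val pageSize = 64L * 1024 * 1024 // assumed bytes parked per failed allocation

    // Each JVM allocation failure parks `pageSize` bytes in acquiredButNotUsed and
    // retries; the retry only gives up once the UMM pool itself is exhausted.
    val retriesNeeded = poolSize / pageSize
    println(s"retries before the UMM pool is exhausted: $retriesNeeded")
    // ~2^62 / 2^26 = ~2^36 retries, each attempting (and failing) another large
    // allocation: effectively an infinite loop, which is why the query appears to hang.
  }
}
```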
```scala
  test("UnsafeHashedRelation should throw OOM when there isn't enough memory") {
    val relations = mutable.ArrayBuffer[HashedRelation]()
    // We should finally see an OOM thrown since we keep allocating hashed relations.
```
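The allocation pattern this test relies on is simple to sketch in isolation. Below is a self-contained illustration of that pattern (plain byte arrays stand in for hashed relations, and the 64 MiB chunk size is an assumption); this is not the actual test code:

```scala
import scala.collection.mutable

// Sketch: retain allocations until the heap is exhausted, then observe that an
// OutOfMemoryError is actually thrown rather than the process hanging.
object AllocateUntilOomSketch {
  def main(args: Array[String]): Unit = {
    val retained = mutable.ArrayBuffer[Array[Byte]]()
    try {
      while (true) {
        retained += new Array[Byte](64 * 1024 * 1024) // one 64 MiB "relation" per loop
      }
    } catch {
      case e: OutOfMemoryError =>
        retained.clear() // release the heap so the line below can allocate
        println(s"OOM surfaced as expected: ${e.getMessage}")
    }
  }
}
```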
This is a bad test, and will likely break the CI process. Can we put it in the PR description as a manual test?
Hi @cloud-fan, thanks for reviewing.
> This is a bad test, and will likely break the CI process.

If you meant the OOM error could break the CI, I think we already rely on similar logic in the production code:
spark/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java, lines 390 to 403 in dce992b:

```java
try {
  page = memoryManager.tungstenMemoryAllocator().allocate(acquired);
} catch (OutOfMemoryError e) {
  logger.warn("Failed to allocate a page ({} bytes), try again.",
    MDC.of(LogKeys.PAGE_SIZE, acquired));
  // there is no enough memory actually, it means the actual free memory is smaller than
  // MemoryManager thought, we should keep the acquired memory.
  synchronized (this) {
    acquiredButNotUsed += acquired;
    allocatedPages.clear(pageNumber);
  }
  // this could trigger spilling to free some pages.
  return allocatePage(size, consumer);
}
```
Or is there anything else you are concerned about?
What changes were proposed in this pull request?
A fix to let Spark throw an OOM error rather than hang when there's not enough JVM heap memory for a broadcast hashed relation. The fix is done by passing the current JVM heap size rather than `Long.MaxValue / 2` to create the temporary `UnifiedMemoryManager` for broadcasting. This is an optimal setting: if the size we pass is too large (i.e., the current `Long.MaxValue / 2`), it causes the hang; if the size is smaller than the current JVM heap size, the OOM might be thrown too early, even when there's still room in memory for the newly created hashed relation.

Before:
After:
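For concreteness, here is a sketch of the construction after the fix, obtained by applying the suggested diff above to the HashedRelation.scala snippet quoted earlier (abridged; treat the exact imports and formatting as assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.internal.config.MEMORY_OFFHEAP_ENABLED
import org.apache.spark.memory.{TaskMemoryManager, UnifiedMemoryManager}

// The pool is now bounded by the real JVM heap instead of Long.MaxValue, so a
// failed allocation exhausts the pool after a bounded number of retries and
// surfaces an OOM instead of retrying effectively forever.
val mm = Option(taskMemoryManager).getOrElse {
  new TaskMemoryManager(
    new UnifiedMemoryManager(
      new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
      Runtime.getRuntime.maxMemory,     // total pool: actual JVM heap size
      Runtime.getRuntime.maxMemory / 2, // on-heap storage region
      1),
    0)
}
```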
Why are the changes needed?
Report the error fast instead of hanging.
Does this PR introduce any user-facing change?
In some scenarios where large unsafe hashed relations are allocated for a broadcast hash join, users will see a meaningful OOM error instead of a hang.
Before (hangs):
After (OOM):
How was this patch tested?
Added tests.
Was this patch authored or co-authored using generative AI tooling?
No.