Commit f596079

[SPARK-49385][K8S] Fix getReusablePVCs to use podCreationTimeout instead of podAllocationDelay
### What changes were proposed in this pull request?

This PR uses `podCreationTimeout` instead of `podAllocationDelay` when `getReusablePVCs` excludes the newly created PVCs of previous batches.

### Why are the changes needed?

K8s control plane pod creation can be delayed for unknown reasons, so `podAllocationDelay` (default: 1s) is too short to conclude that the previous allocation batch's pods have been created with their PVCs. It is safer to wait until `podCreationTimeout`.

### Does this PR introduce _any_ user-facing change?

This affects only the initial set of executors because the baseline is the PVC's `getCreationTimestamp`. It therefore fixes only a buggy situation where a PVC is shared by two executors because an executor pod has been pending for a long time.

### How was this patch tested?

Pass the CIs with the newly updated test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47867 from dongjoon-hyun/SPARK-49385.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent 5706942 commit f596079

2 files changed: +3 -3 lines changed

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala

Lines changed: 1 addition & 1 deletion
@@ -430,7 +430,7 @@ class ExecutorPodsAllocator(
       val reusablePVCs = createdPVCs
         .filterNot(pvc => pvcsInUse.contains(pvc.getMetadata.getName))
         .filter(pvc => now - Instant.parse(pvc.getMetadata.getCreationTimestamp).toEpochMilli
-          > podAllocationDelay)
+          > podCreationTimeout)
       logInfo(log"Found ${MDC(LogKeys.COUNT, reusablePVCs.size)} reusable PVCs from " +
         log"${MDC(LogKeys.TOTAL, createdPVCs.size)} PVCs")
       reusablePVCs
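
The effect of the threshold change can be sketched in isolation. The snippet below is a minimal illustration, not the code in `ExecutorPodsAllocator`: `Pvc`, `ReusablePvcSketch`, and `reusable` are hypothetical names, the PVC is reduced to a name and an ISO-8601 creation timestamp, and the threshold is passed in as a plain `minAgeMillis` parameter.

```scala
import java.time.Instant

// Hypothetical, simplified stand-in for a Kubernetes PersistentVolumeClaim.
final case class Pvc(name: String, creationTimestamp: String)

object ReusablePvcSketch {
  // Keeps PVCs that are not in use and strictly older than minAgeMillis.
  // minAgeMillis plays the role of podAllocationDelay (before the fix)
  // or podCreationTimeout (after the fix).
  def reusable(
      created: Seq[Pvc],
      inUse: Set[String],
      nowMillis: Long,
      minAgeMillis: Long): Seq[Pvc] =
    created
      .filterNot(pvc => inUse.contains(pvc.name))
      .filter { pvc =>
        nowMillis - Instant.parse(pvc.creationTimestamp).toEpochMilli > minAgeMillis
      }
}
```

With a 1000 ms threshold, a PVC created five seconds ago is already treated as reusable even if the pod that owns it is still pending, which is the sharing scenario the commit message describes. With a much larger threshold (for example an illustrative 600000 ms), the same PVC is skipped until its owning pod has had time to appear.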

resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala

Lines changed: 2 additions & 2 deletions
@@ -771,7 +771,7 @@ class ExecutorPodsAllocatorSuite extends SparkFunSuite with BeforeAndAfter {

     val pvc = persistentVolumeClaim("pvc-0", "gp2", "200Gi")
     pvc.getMetadata
-      .setCreationTimestamp(Instant.now().minus(podAllocationDelay + 1, MILLIS).toString)
+      .setCreationTimestamp(Instant.now().minus(podCreationTimeout + 1, MILLIS).toString)
     when(persistentVolumeClaimList.getItems).thenReturn(Seq(pvc).asJava)
     when(executorBuilder.buildFromFeatures(any(classOf[KubernetesExecutorConf]), meq(secMgr),
       meq(kubernetesClient), any(classOf[ResourceProfile])))
@@ -849,7 +849,7 @@ class ExecutorPodsAllocatorSuite extends SparkFunSuite with BeforeAndAfter {
     val pvc2 = persistentVolumeClaim("pvc-2", "gp2", "200Gi")

     val now = Instant.now()
-    pvc1.getMetadata.setCreationTimestamp(now.minus(2 * podAllocationDelay, MILLIS).toString)
+    pvc1.getMetadata.setCreationTimestamp(now.minus(podCreationTimeout + 1, MILLIS).toString)
     pvc2.getMetadata.setCreationTimestamp(now.toString)

     when(persistentVolumeClaimList.getItems).thenReturn(Seq(pvc1, pvc2).asJava)
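
The updated tests backdate a PVC by `podCreationTimeout + 1` milliseconds because the filter uses a strict `>` comparison. The following sketch shows that boundary with `java.time` only. `PvcTimestampSketch`, `ageMillis`, and `isOldEnough` are hypothetical helpers, and the 600000 ms value is an illustrative assumption rather than the value the suite uses.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit.MILLIS

object PvcTimestampSketch {
  // Illustrative value only; the test suite defines its own podCreationTimeout.
  val podCreationTimeout: Long = 600000L

  // Age in milliseconds of a PVC whose creation timestamp is an ISO-8601 string,
  // mirroring the Instant.parse(...).toEpochMilli round trip in the diff above.
  def ageMillis(now: Instant, creationTimestamp: String): Long =
    now.toEpochMilli - Instant.parse(creationTimestamp).toEpochMilli

  // Strict comparison: a PVC exactly podCreationTimeout old is not yet reusable.
  def isOldEnough(now: Instant, creationTimestamp: String): Boolean =
    ageMillis(now, creationTimestamp) > podCreationTimeout

  // Builds a timestamp string the same way the tests do.
  def backdated(now: Instant, millis: Long): String =
    now.minus(millis, MILLIS).toString
}
```

Under these assumptions, a PVC backdated by `podCreationTimeout + 1` passes the check, one backdated by exactly `podCreationTimeout` does not, and one stamped with `now` (like `pvc2` in the second hunk) is always excluded.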
