fix(scheduler): resolve throttling on high-concurrency pod submissions#1723
maishivamhoo123 wants to merge 7 commits into Project-HAMi:master
Conversation
Signed-off-by: maishivamhoo123 <maishivamhoo@gmail.com>
[APPROVAL NOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: maishivamhoo123. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files.
Code Review
This pull request introduces a per-node in-memory locking mechanism within the scheduler's Bind function to serialize concurrent binding requests and prevent them from triggering exponential backoff. It also adds a blind patch to node annotations for device plugin handshaking and includes a new test case to verify concurrent binding behavior. The review feedback identifies a critical deadlock vulnerability in the AcquireBindLock implementation where a timed-out context leaves a mutex permanently locked. Recommendations include refactoring the lock to use channel-based semaphores, utilizing exported constants for annotation keys, and ensuring the node patch operation respects the defined context timeout.
pkg/scheduler/scheduler.go
Outdated
```go
	return res, nil
}

patchData := fmt.Appendf(nil, `{"metadata":{"annotations":{"%s":"%s"}}}`, nodelockutil.NodeLockKey, string(current.UID))
```
This patch makes the e2e test fail.
It seems the patchData overwrites the nodelock annotation.
Since the nodelock annotation is released by the device plugin while the in-memory lock is released after bind, the inconsistency may cause the problem.
Fixes: #1367
Root Cause:
When a high number of pods (e.g., 40+) requested GPUs simultaneously, the HAMi scheduler attempted to lock the node by patching it via the Kubernetes API. The first pod succeeded, but the remaining 39 triggered HTTP 409 API conflicts. This caused the HAMi scheduler to return errors, forcing the kube-scheduler to throw those pods into a 5-minute exponential backoff queue, resulting in severe scheduling delays.
What this PR does:
- Introduces a `sync.Mutex` in the `Bind()` function to gracefully queue concurrent pod scheduling attempts locally rather than outright rejecting them.
- Patches the node annotation with a merge patch (`types.MergePatchType`). This safely applies the `hami.io/mutex.lock` annotation (which the Device Plugin requires to identify the pod) without triggering 409 resource version conflicts.

Testing:
- Ran `make test` and `make verify`.