[YUNIKORN-3190] Fix race condition occurring between released and pre… #1058
…empted allocations
What is this PR for?
A placeholder allocation can be preempted just before it is released for replacement by a real allocation. This PR detects that condition right before the replacement: under a single lock, it reads the "preempted" flag and sets "released" to true only if "preempted" is false. Otherwise, it reports an error and reverts the steps already carried out on the replacement side.
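The check-and-set described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `allocation` type, `markReleasedUnlessPreempted`, `replacePlaceholder`, and the `prepare`/`revert` callbacks are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"sync"
)

// allocation is a hypothetical stand-in for the scheduler's allocation object.
type allocation struct {
	mu        sync.Mutex
	preempted bool
	released  bool
}

// errAlreadyPreempted is a hypothetical error for the losing side of the race.
var errAlreadyPreempted = errors.New("placeholder already preempted")

// markReleasedUnlessPreempted reads "preempted" and sets "released" inside one
// critical section, so no preemption can slip in between the read and the write.
func (a *allocation) markReleasedUnlessPreempted() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.preempted {
		return errAlreadyPreempted
	}
	a.released = true
	return nil
}

// replacePlaceholder runs the (assumed) preparation steps, then the guarded
// check. If the placeholder was already preempted, the steps are reverted.
func replacePlaceholder(ph *allocation, prepare, revert func()) error {
	prepare()
	if err := ph.markReleasedUnlessPreempted(); err != nil {
		revert()
		return err
	}
	return nil
}
```

Using one lock for both the read and the write is the point of the sketch: two separate lock acquisitions would leave a window in which the other side could change the flag.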
A similar race can occur in the (intra-queue) preemption cycle, where a victim is released just before the kill command is issued. The same checks are applied in reverse under a single lock: if any victim has already been released, the preemption process halts immediately without killing the remaining victims.
A similar race can occur in the DaemonSet preemption cycle, where a victim is released just before the kill command is issued. In this case a warning is logged and preemption proceeds with the other victims.
A similar race can occur in the quota change preemption cycle, where a victim is released just before the kill command is issued. Here too, a warning is logged and preemption proceeds with the other victims.
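The two victim-handling policies described above (halt on the first released victim versus warn and continue) could look something like the sketch below. All names here (`victim`, `markPreemptedUnlessReleased`, `preemptStrict`, `preemptLenient`, the `kill` callback) are assumptions made for illustration and are not taken from the PR. The sketch also does not model what happens to victims handled before a released one is found, since the description does not say.

```go
package main

import (
	"errors"
	"log"
	"sync"
)

// victim is a hypothetical stand-in for an allocation selected for preemption.
type victim struct {
	mu        sync.Mutex
	name      string
	preempted bool
	released  bool
}

// errAlreadyReleased is a hypothetical error for a victim released before the kill.
var errAlreadyReleased = errors.New("victim already released")

// markPreemptedUnlessReleased is the reversed check: read "released" and set
// "preempted" under the same lock.
func (v *victim) markPreemptedUnlessReleased() error {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.released {
		return errAlreadyReleased
	}
	v.preempted = true
	return nil
}

// preemptStrict models the intra-queue behaviour: stop at the first released
// victim so that no later victim is killed.
func preemptStrict(victims []*victim, kill func(*victim)) error {
	for _, v := range victims {
		if err := v.markPreemptedUnlessReleased(); err != nil {
			return err
		}
		kill(v)
	}
	return nil
}

// preemptLenient models the DaemonSet and quota change behaviour: log a
// warning for a released victim and carry on. It returns the number killed.
func preemptLenient(victims []*victim, kill func(*victim)) int {
	killed := 0
	for _, v := range victims {
		if err := v.markPreemptedUnlessReleased(); err != nil {
			log.Printf("WARN: skipping victim %s: %v", v.name, err)
			continue
		}
		kill(v)
		killed++
	}
	return killed
}
```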
In addition, this PR includes some refactoring and cleanup.
What type of PR is it?
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-3190
How should this be tested?
Screenshots (if appropriate)
Questions: