[KEP-4816] Simple scoring for DRA Prioritized List feature #5633
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -91,6 +91,7 @@ tags, and then generate with `hack/update-toc.sh`. | |
- [Risks and Mitigations](#risks-and-mitigations) | ||
- [Design Details](#design-details) | ||
- [Scheduler Implementation](#scheduler-implementation) | ||
- [Scoring](#scoring) | ||
- [Test Plan](#test-plan) | ||
- [Prerequisite testing updates](#prerequisite-testing-updates) | ||
- [Unit tests](#unit-tests) | ||
|
@@ -290,15 +291,13 @@ type `DeviceSubRequest`. The `DeviceSubRequest` type is similar to | |
available when providing multiple alternatives. The list provided in the | ||
`FirstAvailable` field is considered a priority order, such that the | ||
scheduler will use the first entry in the list that satisfies the | ||
requirements. | ||
requirements. | ||
|
||
DRA does not yet implement scoring, which means that | ||
the selected devices might not be optimal. For example, if a prioritized | ||
list is provided in a request, DRA might choose entry number two on node A, | ||
even though entry number one would meet the requirements on node B. This is | ||
consistent with current behavior in DRA where the first match will be chosen. | ||
Scoring is something that will be implemented later, with early discussions | ||
in https://github.com/kubernetes/enhancements/issues/4970 | ||
DRA does not yet implement full scoring (tracked in | ||
https://github.com/kubernetes/enhancements/issues/4970), but we will implement | ||
a limited form of scoring for this feature. This is to make sure nodes which | ||
can satisfy a claim with higher ranked subrequests are preferred over others. The | ||
details are described in the [Scoring](#scoring) section. | ||
|
||
### User Stories | ||
|
||
|
@@ -619,13 +618,11 @@ type DeviceRequest struct { | |
// | ||
// This field may only be set in the entries of DeviceClaim.Requests. | ||
// | ||
// DRA does not yet implement scoring, so the scheduler will | ||
// select the first set of devices that satisfies all the | ||
// requests in the claim. And if the requirements can | ||
// be satisfied on more than one node, other scheduling features | ||
// will determine which node is chosen. This means that the set of | ||
// devices allocated to a claim might not be the optimal set | ||
// available to the cluster. Scoring will be implemented later. | ||
// DRA does not yet implement full scoring, but it implements limited | ||
// scoring so that nodes that can satisfy high ranked subrequests are | ||
// preferred over others. The node ultimately chosen also depends on | ||
// other scheduling features, so it is not guaranteed that the node | ||
// preferred by DRA is chosen. | ||
// | ||
// +optional | ||
// +oneOf=deviceRequestType | ||
|
@@ -899,6 +896,35 @@ would need a higher score, which currently is planned for beta of this feature. | |
For alpha, the scheduler may still pick a node with a less preferred device, if | ||
there are nodes with each type of device available. | ||
|
||
#### Scoring | ||
|
||
Full support for scoring in DRA is not in scope for this feature, but we will | ||
implement limited scoring to make sure that nodes which can satisfy claims with | ||
higher ranked subrequests are preferred over others. | ||
|
||
We will implement this by letting the dynamicresources scheduler plugin implement | ||
the `Score` and `NormalizeScore` interfaces. | ||
|
||
The allocation result for each node will be given a score based on the ranking of | ||
the chosen subrequests across all requests using the `FirstAvailable` field across | ||
all claims referenced by the Pod. Since the number of subrequests for each request | ||
is capped at 8, we will compute a score between 1 and 8 for each request, with 8 | ||
being the best (i.e. the first option was chosen) and 1 if the 8th subrequest was | ||
chosen. If there is more than one request using the `FirstAvailable` field, the scores from | ||
all of them will be added up to get the score for the pod on the node. | ||
Since the score for every node is computed based on the same claims, we end up with a | ||
ranking of the results from all nodes. | ||
Review comment: Would it make sense to explain how the total node score is computed if a pod requests multiple claims/devices? Is it a sum of scores for each claim, a weighted sum, an average, etc.?
Reply: I've added a sentence about this. I think we should just do a sum, since the scores for a single pod are all computed from the same claims. And we will do normalization anyway to make sure the score falls within the allowed boundaries in the scheduling framework.
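The per-request scoring and summation described in this hunk can be sketched as follows. This is a minimal illustration, not the plugin's code: the names `requestScore` and `nodeRawScore` are hypothetical, and the sketch assumes the allocation result is already reduced to the zero-based index of the subrequest chosen for each request that uses `FirstAvailable`.

```go
package main

import "fmt"

// maxSubRequests is the cap on subrequests per request stated in the KEP text.
const maxSubRequests = 8

// requestScore (hypothetical helper) maps the zero-based index of the chosen
// subrequest to a score between 1 and 8: index 0 scores 8, index 7 scores 1.
func requestScore(chosenIndex int) (int, error) {
	if chosenIndex < 0 || chosenIndex >= maxSubRequests {
		return 0, fmt.Errorf("subrequest index %d out of range [0, %d)", chosenIndex, maxSubRequests)
	}
	return maxSubRequests - chosenIndex, nil
}

// nodeRawScore (hypothetical helper) adds up the per-request scores for every
// request using FirstAvailable across all claims referenced by the pod.
func nodeRawScore(chosenIndexes []int) (int, error) {
	total := 0
	for _, idx := range chosenIndexes {
		s, err := requestScore(idx)
		if err != nil {
			return 0, err
		}
		total += s
	}
	return total, nil
}

func main() {
	// Two requests use FirstAvailable: the first got its top choice,
	// the second fell back to its third subrequest.
	score, err := nodeRawScore([]int{0, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(score) // 8 + 6 = 14
}
```

Because every node is scored against the same set of requests, the raw sums are directly comparable across nodes before normalization.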
|
||
We will implement the `NormalizeScore` interface to normalize the results. We aim | ||
to do this in a way such that the worst selection possible (where the last subrequest | ||
is chosen for all requests specifying the `FirstAvailable` field) is given a score of | ||
0, and the best selection possible (where the first subrequest is chosen for all requests | ||
specifying the `FirstAvailable` field) is given a score of 100. This makes sure that the | ||
score accurately reflects "how much better" one alternative is compared to another. | ||
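One way to read the normalization described above is a linear rescaling between the worst and best raw scores that are possible for the pod's requests. The sketch below makes several assumptions that the text does not spell out: the function name `normalize` is hypothetical, the raw score for a request is 8 minus the zero-based index of the chosen subrequest, the worst case for a request with `n` subrequests is therefore `8 - (n - 1)`, and a pod whose best and worst cases coincide is arbitrarily given 0.

```go
package main

import "fmt"

const (
	maxSubRequests = 8   // cap on subrequests per request, per the KEP text
	maxNodeScore   = 100 // upper bound of the normalized range, per the KEP text
)

// normalize (hypothetical helper) rescales a raw score into [0, maxNodeScore].
// subRequestCounts[i] is the number of subrequests listed by the i-th request
// that uses FirstAvailable. The raw score is assumed to lie between the worst
// and best totals computed here.
func normalize(raw int, subRequestCounts []int) int {
	best, worst := 0, 0
	for _, n := range subRequestCounts {
		best += maxSubRequests            // first subrequest chosen
		worst += maxSubRequests - (n - 1) // last subrequest chosen
	}
	if best == worst {
		// Assumption: with no range to rank, every node gets the same score.
		return 0
	}
	return (raw - worst) * maxNodeScore / (best - worst)
}

func main() {
	counts := []int{3, 3} // two requests, three subrequests each

	fmt.Println(normalize(16, counts)) // first subrequest chosen for both
	fmt.Println(normalize(12, counts)) // last subrequest chosen for both
	fmt.Println(normalize(14, counts)) // e.g. second subrequest chosen for both
}
```

With these inputs the three calls yield 100, 0, and 50, so the midpoint allocation lands halfway between the extremes.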
|
||
We will give the plugin a weight of 2 since it reflects scoring based on user preference. | ||
|
||
### Test Plan | ||
|
||
<!-- | ||
|
Review comment: Linear ranking might not match user intent. Would it make sense to use exponential ranking here, giving more priority to the nodes with higher ranked devices?
Reply: Are you suggesting something like the lowest ranked option getting a score of 1, then 2, 4, 8, 16, ...? I did think about other ways to do ranking, but none seemed clearly better than linear ranking. As an example, if I have a claim with two requests, each with three subrequests, would an allocation where the first subrequest gets allocated on the first request and the third on the second request be better than the second on both? I think linear has the benefit that it is pretty easy to understand and reason about.
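The example in this thread can be worked through numerically. The sketch below is illustrative only: the `exponential` mapping is one hypothetical reading of the reviewer's suggestion (doubling per rank, anchored to the cap of 8 subrequests) and is not part of the KEP, and the function names are made up for this comparison.

```go
package main

import "fmt"

const maxSubRequests = 8 // cap on subrequests per request, per the KEP text

// linear is the ranking proposed in the KEP: 8 for the first subrequest,
// decreasing by 1 per rank.
func linear(idx int) int { return maxSubRequests - idx }

// exponential is a hypothetical alternative: the score halves with each rank,
// so the 8th subrequest scores 1 and the first scores 128.
func exponential(idx int) int { return 1 << (maxSubRequests - 1 - idx) }

// total sums the per-request scores for the chosen subrequest indexes.
func total(chosen []int, score func(int) int) int {
	sum := 0
	for _, idx := range chosen {
		sum += score(idx)
	}
	return sum
}

func main() {
	a := []int{0, 2} // first subrequest on request 1, third on request 2
	b := []int{1, 1} // second subrequest on both requests

	fmt.Println(total(a, linear), total(b, linear))           // 14 14
	fmt.Println(total(a, exponential), total(b, exponential)) // 160 128
}
```

Under linear ranking the two allocations tie, while the exponential variant prefers the allocation containing a first choice. Whether that preference matches user intent is the open question in the thread.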