56 changes: 41 additions & 15 deletions keps/sig-scheduling/4816-dra-prioritized-list/README.md
@@ -91,6 +91,7 @@ tags, and then generate with `hack/update-toc.sh`.
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Scheduler Implementation](#scheduler-implementation)
- [Scoring](#scoring)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit tests](#unit-tests)
@@ -290,15 +291,13 @@ type `DeviceSubRequest`. The `DeviceSubRequest` type is similar to
available when providing multiple alternatives. The list provided in the
`FirstAvailable` field is considered a priority order, such that the
scheduler will use the first entry in the list that satisfies the
requirements.
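The priority-order rule above can be illustrated with a minimal Go sketch. It assumes a heavily simplified, hypothetical `subRequest` type that only records a device class and a device count, and a node whose free devices are given as a per-class count; the real allocator evaluates CEL selectors, constraints, and all requests of a claim together, so this shows only the "first entry that fits wins" ordering.

```go
package main

import "fmt"

// subRequest is a hypothetical, simplified stand-in for DeviceSubRequest:
// it only records a name, a device class, and how many devices it needs.
type subRequest struct {
	name        string
	deviceClass string
	count       int
}

// pickFirstAvailable returns the index of the first subrequest, in list order,
// that can be satisfied from the free devices on a node, or -1 if none fits.
// free maps a device class to the number of unallocated devices on the node.
func pickFirstAvailable(subs []subRequest, free map[string]int) int {
	for i, s := range subs {
		if free[s.deviceClass] >= s.count {
			return i
		}
	}
	return -1
}

func main() {
	subs := []subRequest{
		{name: "large", deviceClass: "gpu-large", count: 1},
		{name: "small", deviceClass: "gpu-small", count: 2},
	}
	// This node has no large GPUs, so the second entry is chosen.
	free := map[string]int{"gpu-small": 4}
	idx := pickFirstAvailable(subs, free)
	fmt.Println(idx, subs[idx].name) // 1 small
}
```

Note that the selection is made per node: a different node with a free `gpu-large` device would get index 0.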

DRA does not yet implement scoring, which means that
the selected devices might not be optimal. For example, if a prioritized
list is provided in a request, DRA might choose entry number two on node A,
even though entry number one would meet the requirements on node B. This is
consistent with current behavior in DRA where the first match will be chosen.
Scoring is something that will be implemented later, with early discussions
in https://github.com/kubernetes/enhancements/issues/4970
DRA does not yet implement full scoring (tracked in
https://github.com/kubernetes/enhancements/issues/4970), but we will implement
a limited form of scoring for this feature, so that nodes which can satisfy a
claim with higher-ranked subrequests are preferred over other nodes. The
details are described in the [Scoring](#scoring) section.

### User Stories

@@ -619,13 +618,11 @@ type DeviceRequest struct {
//
// This field may only be set in the entries of DeviceClaim.Requests.
//
// DRA does not yet implement scoring, so the scheduler will
// select the first set of devices that satisfies all the
// requests in the claim. And if the requirements can
// be satisfied on more than one node, other scheduling features
// will determine which node is chosen. This means that the set of
// devices allocated to a claim might not be the optimal set
// available to the cluster. Scoring will be implemented later.
// DRA does not yet implement full scoring, but it implements limited
// scoring so that nodes that can satisfy higher-ranked subrequests are
// preferred over others. The node ultimately chosen also depends on
// other scheduling features, so the node preferred by DRA is not
// guaranteed to be chosen.
//
// +optional
// +oneOf=deviceRequestType
@@ -899,6 +896,35 @@ would need a higher score, which currently is planned for beta of this feature.
For alpha, the scheduler may still pick a node with a less preferred device, if
there are nodes with each type of device available.

#### Scoring

Full support for scoring in DRA is not in scope for this feature, but we will
implement limited scoring to make sure that nodes which can satisfy claims with
higher ranked subrequests are preferred over others.

We will implement this by having the dynamicresources scheduler plugin implement
the scheduling framework's `Score` and `NormalizeScore` interfaces.

The allocation result for each node will be given a score based on the ranking of
the chosen subrequests across all requests using the `FirstAvailable` field across
all claims referenced by the Pod. Since the number of subrequests for each request
is capped at 8, we will compute a score between 1 and 8 for each request, with 8
being the best (i.e. the first option was chosen) and 1 if the 8th subrequest was
chosen. If more than one request uses the `FirstAvailable` field, the scores from
all of them will be added up to get the score for the Pod on the node. Since the
score for every node is computed from the same claims, these sums are directly
comparable and give a ranking of the results from all nodes.

Review comment (Contributor):

Linear ranking might not match user intent. Would it make sense to use exponential ranking here, giving more priority to the nodes with higher ranked devices?

Reply (Member Author):

Are you suggesting that we should have something like that the lowest ranked option gets a score of 1, then 2, 4, 8, 16, ...? I did think about other ways to do ranking, but none seemed clearly better than linear ranking. As an example, if I have a claim with two requests, each with three subrequests, would an allocation where the first subrequest gets allocated on the first request and the third on the second request be better than the second on both? I think linear has the benefit that it is pretty easy to understand and reason about.

Review comment (Contributor, @bart0sh, Oct 10, 2025):

Would it make sense to explain how the total node score is computed if pod requests multiple claims/devices? Is it a sum of scores for each claim, weighted sum, average etc?

Reply (Member Author):

I've added a sentence about this. I think we should just do a sum, since all scores for a single pod will have the same claims. And we will do normalization anyway to make sure the score falls within the allowed boundaries in the scheduling framework.
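The per-request scoring and summation described above can be sketched in Go as follows. This is a minimal illustration, not the plugin's code: the function names `requestScore` and `nodeScore` are hypothetical, and the input is assumed to be the 0-based index of the chosen subrequest for each `FirstAvailable` request on one node.

```go
package main

import "fmt"

// maxSubRequests mirrors the cap of 8 subrequests per request.
const maxSubRequests = 8

// requestScore maps the 0-based index of the chosen subrequest to the 1..8
// scale: index 0 (first option) scores 8, index 7 (eighth option) scores 1.
func requestScore(chosenIndex int) int64 {
	return int64(maxSubRequests - chosenIndex)
}

// nodeScore sums the per-request scores over every request that uses
// FirstAvailable, across all claims of the Pod, for a single node.
func nodeScore(chosen []int) int64 {
	var total int64
	for _, idx := range chosen {
		total += requestScore(idx)
	}
	return total
}

func main() {
	// Two requests, each with three subrequests.
	firstAndThird := nodeScore([]int{0, 2})     // 8 + 6
	secondAndSecond := nodeScore([]int{1, 1})   // 7 + 7
	fmt.Println(firstAndThird, secondAndSecond) // 14 14
}
```

With linear ranking, the two allocations from the review discussion (first plus third versus second on both) tie at 14, which is the behavior the author describes as easy to reason about.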

We will implement the `NormalizeScore` interface to normalize the results. The
goal is that the worst possible selection (where the last subrequest is chosen for
every request specifying the `FirstAvailable` field) gets a score of 0, and the best
possible selection (where the first subrequest is chosen for every such request)
gets a score of 100. This makes the score reflect how much better one alternative
is than another.
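The normalization target above can be illustrated with a small Go sketch. It assumes the raw sum is computed on the 1..8 per-request scale described earlier, and that the worst and best sums are derived from the number of subrequests in each `FirstAvailable` request; the helper names `bounds` and `normalize` are hypothetical. Returning 100 when worst equals best (every request has a single subrequest) is an assumption of this sketch, as is the truncating integer division.

```go
package main

import "fmt"

const (
	maxSubRequests = 8   // cap on subrequests per request
	maxNodeScore   = 100 // upper bound of a normalized node score
)

// bounds returns the raw sums for the worst selection (last subrequest chosen
// everywhere) and the best selection (first subrequest chosen everywhere),
// given the number of subrequests in each FirstAvailable request.
func bounds(numSubRequests []int) (worst, best int64) {
	for _, n := range numSubRequests {
		best += maxSubRequests
		worst += int64(maxSubRequests - (n - 1))
	}
	return worst, best
}

// normalize maps a raw sum linearly onto 0..100 between worst and best.
// Integer division truncates toward zero.
func normalize(raw, worst, best int64) int64 {
	if best == worst {
		// Assumption: with no real alternatives, every node gets the top score.
		return maxNodeScore
	}
	return (raw - worst) * maxNodeScore / (best - worst)
}

func main() {
	// Two requests with three subrequests each.
	worst, best := bounds([]int{3, 3})
	fmt.Println(worst, best)                // 12 16
	fmt.Println(normalize(14, worst, best)) // 50
}
```

Because the bounds come from the Pod's claims rather than from the scores observed on the candidate nodes, a node that achieves the best possible selection gets 100 even if every other node does equally well.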

We will give the plugin a weight of 2 since it reflects scoring based on user preference.
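The effect of the weight can be shown with a simplified Go sketch of how the scheduler combines normalized plugin scores for a node: each score is multiplied by its plugin's weight and the products are summed. The `OtherPlugin` name and all score values are hypothetical, and a real cluster runs several scoring plugins with their own weights.

```go
package main

import "fmt"

// weightedTotal combines normalized (0..100) plugin scores for one node by
// multiplying each score by its plugin's weight and summing the results.
func weightedTotal(scores, weights map[string]int64) int64 {
	var total int64
	for plugin, s := range scores {
		total += s * weights[plugin]
	}
	return total
}

func main() {
	// Hypothetical weights and per-node normalized scores.
	weights := map[string]int64{"DynamicResources": 2, "OtherPlugin": 1}
	nodeA := map[string]int64{"DynamicResources": 100, "OtherPlugin": 20}
	nodeB := map[string]int64{"DynamicResources": 50, "OtherPlugin": 80}
	fmt.Println(weightedTotal(nodeA, weights), weightedTotal(nodeB, weights)) // 220 180
}
```

With a weight of 1, the same inputs give 120 for node A and 130 for node B, so node B would win; a weight of 2 lets the user's preference for the first-ranked subrequests outweigh the other plugin in this example.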

### Test Plan

<!--
3 changes: 1 addition & 2 deletions keps/sig-scheduling/4816-dra-prioritized-list/kep.yaml
@@ -28,13 +28,12 @@ stage: beta
# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.34"
latest-milestone: "v1.35"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.33"
beta: "v1.34"
stable: "v1.35"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled