CORENET-6488: Preserve custom resource requests on ovn-control-plane pods #2825

Conversation
The cluster-network-operator overwrites any changes to the cpu and memory resource requests on the ovnkube-control-plane pods that run on a hosted control plane, so tuning those requests to improve control plane performance does not work. This commit leaves such customizations in place, matching the behavior of the multus-admission-controller. For reference, the PR that implemented this for the multus-admission-controller is openshift#2335
|
@bradbehle: This pull request references CORENET-6488 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Hi @bradbehle. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
|
/ok-to-test |
|
/retest |
rtheis left a comment

/lgtm
|
@rtheis: changing LGTM is restricted to collaborators. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
3 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/ok-to-test |
|
@csrwng how does resource request preservation work in HyperShift? What happens if a component wants to change its default requests during an upgrade?
|
/cc @csrwng |
@kyrtapz we simply don't update the resource requests. So if we change the default, the default will apply to new control planes, but not to existing ones. Admittedly, this is less than ideal, but the right fix is not necessarily to come up with some way of updating them. Whatever we update them to will likely be wrong because it won't necessarily match your usage. For a while we've said that we want to update resource requests based on actual usage and optionally allow the user to manage them entirely. We just need to get to it :)
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bradbehle, kyrtapz, rtheis. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Walkthrough
This pull request introduces configurable per-container resource requests for OVN HyperShift deployments. Resource request values are templated in the manifest with defaults and populated at runtime by discovering current resource requests from deployed containers in the cluster.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
|
/test e2e-aws-ovn-hypershift-conformance |
|
@bradbehle @rtheis please add the |
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
- bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml (3 hunks)
- pkg/bootstrap/types.go (1 hunk)
- pkg/network/ovn_kubernetes.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
- bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml
- pkg/bootstrap/types.go
- pkg/network/ovn_kubernetes.go
// getResourceRequestsForDeployment gets the cpu and memory resource requests for the specified deployment.
// If the deployment or container is not found, or if the container doesn't have a cpu or memory resource request, then 0 is returned.
func getResourceRequestsForDeployment(cl crclient.Reader, namespace string, deploymentName string, containerName string) (cpu int64, memory int64) {
	deployment := &appsv1.Deployment{}
	if err := cl.Get(context.TODO(), types.NamespacedName{
		Namespace: namespace,
		Name:      deploymentName,
	}, deployment); err != nil {
		if !apierrors.IsNotFound(err) {
			klog.Warningf("Error fetching %s deployment: %v", deploymentName, err)
		}
		return cpu, memory
	}

	for _, container := range deployment.Spec.Template.Spec.Containers {
		if container.Name == containerName {
			if container.Resources.Requests != nil {
				if !container.Resources.Requests.Cpu().IsZero() {
					cpu = container.Resources.Requests.Cpu().MilliValue()
				}
				if !container.Resources.Requests.Memory().IsZero() {
					memory = container.Resources.Requests.Memory().Value() / bytesInMiB
				}
			}
			break
		}
	}

	return cpu, memory
}
Don't truncate preserved memory requests
getResourceRequestsForDeployment divides the memory quantity by bytesInMiB, so any request that was set with decimal SI units (e.g. 500M) comes back as 476Mi. When the operator re-renders the deployment it silently lowers the request, defeating the goal of preserving administrator overrides and risking regressions for workloads that relied on the exact value.
Please carry the full resource.Quantity string (which already canonicalizes units) instead of converting to bare Mi integers, and drop the hard-coded Mi suffix in the template to accept the full value. A minimal sketch:
-func getResourceRequestsForDeployment(...) (cpu int64, memory int64) {
+func getResourceRequestsForDeployment(...) (cpu, memory string) {
@@
- if err := cl.Get(...); err != nil { ... }
+ if err := cl.Get(...); err != nil { ... }
@@
- if container.Name == containerName {
- if container.Resources.Requests != nil {
- if !container.Resources.Requests.Cpu().IsZero() {
- cpu = container.Resources.Requests.Cpu().MilliValue()
- }
- if !container.Resources.Requests.Memory().IsZero() {
- memory = container.Resources.Requests.Memory().Value() / bytesInMiB
- }
- }
+ if container.Name == containerName && container.Resources.Requests != nil {
+ if cpuQty := container.Resources.Requests.Cpu(); cpuQty != nil && !cpuQty.IsZero() {
+ cpu = cpuQty.String()
+ }
+ if memQty := container.Resources.Requests.Memory(); memQty != nil && !memQty.IsZero() {
+ memory = memQty.String()
+ }
	}

Then you can assign the struct fields directly (no strconv.FormatInt) and render them with defaults like {{ .TokenMinterResourceRequestMemory | default "30Mi" }}. This keeps every user-specified value intact.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/network/ovn_kubernetes.go around lines 788-817, the function currently converts memory to an int64 Mi value, which truncates user-specified decimal SI units; change the function to preserve the full resource.Quantity strings instead: update the signature to return (cpu string, memory string), set cpu = container.Resources.Requests.Cpu().String() and memory = container.Resources.Requests.Memory().String() (remove the division by bytesInMiB; the IsZero checks should still guard unset values), and update all callers to accept string values; also remove the hard-coded "Mi" suffix in the deployment template and render the returned value directly (using a template default like {{ .TokenMinterResourceRequestMemory | default "30Mi" }}).
|
/verified by @bradbehle |
|
@rtheis: Jira verification commands are restricted to collaborators for this repo. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by @bradbehle |
|
@bradbehle: Jira verification commands are restricted to collaborators for this repo. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@csrwng can you please add |
|
/verified by @bradbehle |
|
@kyrtapz: This PR has been marked as verified by @bradbehle. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Sorry for the churn @bradbehle @rtheis! |
|
Thank you @kyrtapz |
|
@bradbehle: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
Merged commit d1321fa into openshift:master
/cherry-pick release-4.20 |
|
@rtheis: new pull request created: #2835 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |