CORENET-6488: Preserve custom resource requests on ovn-control-plane pods #2825

bradbehle · 2025-10-27T03:14:11Z

The ovnkube-control-plane pods that run on a hosted control plane have their CPU and memory resource requests overwritten whenever they are changed, so raising them to improve control plane performance does not work. Any customization of this deployment's resource requests is overwritten by the cluster-network-operator.

This commit changes that so customizations are left in place, matching the behavior of the multus-admission-controller. For reference, the PR that implemented this for multus-admission-controller is #2335.


openshift-ci-robot · 2025-10-27T03:14:15Z

@bradbehle: This pull request references CORENET-6488 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

The ovnkube-control-plane pods that run on a hosted control plane have their CPU and memory resource requests overwritten whenever they are changed, so raising them to improve control plane performance does not work. Any customization of this deployment's resource requests is overwritten by the cluster-network-operator.

This commit changes that so customizations/changes are left in place, to match the behavior of the multus-admission-controller. For reference, the PR that implemented this for multus-admission-controller is #2335

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-10-27T03:14:44Z

Hi @bradbehle. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

TwoDCube · 2025-10-27T03:18:56Z

/ok-to-test

rtheis · 2025-10-27T11:14:03Z

/retest
/ok-to-test

rtheis

/lgtm

openshift-ci · 2025-10-27T11:59:50Z

@rtheis: changing LGTM is restricted to collaborators

In response to this:

/lgtm


rtheis · 2025-10-27T15:47:09Z

/retest
/ok-to-test

rtheis · 2025-10-28T12:50:00Z

/retest
/ok-to-test

rtheis · 2025-10-29T10:31:40Z

/retest
/ok-to-test

rtheis · 2025-10-30T16:43:55Z

/retest
/ok-to-test

rtheis · 2025-11-03T13:17:32Z

/ok-to-test
/retest-required

kyrtapz · 2025-11-03T13:19:52Z

@csrwng how does resource request preservation work in HyperShift? What happens if a component wants to change its default requests during an upgrade?

rtheis · 2025-11-06T14:25:59Z

/cc @csrwng

csrwng · 2025-11-06T19:05:24Z

how does resource request preservation work in HyperShift? What happens if a component wants to change its default requests during an upgrade?

@kyrtapz we simply don't update the resource requests. So if we change the default, the new default will apply to new control planes, but not to existing ones. Admittedly, this is less than ideal, but the right fix for it is not necessarily to come up with some way of updating them. Whatever we update them to will likely be wrong, because it won't necessarily match your usage. For a while we've said that we want to update resource requests based on actual usage and optionally allow the user to manage them entirely. We just need to get to it :)
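To make the upgrade behavior described above concrete, here is a minimal Go sketch. It is not the operator's code: `resolveRequest` and the 200/300 MiB defaults are hypothetical names and values chosen only for illustration, and a zero value is assumed to mean "no existing request was found".

```go
package main

import "fmt"

// resolveRequest is a hypothetical helper: it keeps the request already set
// on an existing deployment and only falls back to the operator default when
// nothing was found (represented here by zero).
func resolveRequest(existing, operatorDefault int64) int64 {
	if existing != 0 {
		return existing
	}
	return operatorDefault
}

func main() {
	// Hypothetical memory defaults (in MiB) before and after an upgrade.
	const oldDefault, newDefault int64 = 200, 300

	// An existing control plane still carries the old value, so it is preserved.
	existingControlPlane := resolveRequest(oldDefault, newDefault)

	// A new control plane has no deployment yet, so it receives the new default.
	newControlPlane := resolveRequest(0, newDefault)

	fmt.Println(existingControlPlane, newControlPlane)
}
```

Under these assumptions the existing control plane stays at 200 while the new one gets 300, which is the trade-off acknowledged in the comment above.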

kyrtapz · 2025-11-07T08:37:05Z

/lgtm

openshift-ci · 2025-11-07T08:38:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bradbehle, kyrtapz, rtheis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [kyrtapz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2025-11-07T08:38:50Z

Walkthrough

This pull request introduces configurable per-container resource requests for OVN HyperShift deployments. Resource request values are templated in the manifest with defaults and populated at runtime by discovering current resource requests from deployed containers in the cluster.

Changes

Cohort / File(s)	Summary
YAML Manifest Templating `bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml`	Replaced fixed CPU and memory request values with Go template variables for three containers: Token Minter (10m/30Mi defaults), OVN control-plane (10m/200Mi defaults), and Socks5 proxy (10m/10Mi defaults).
Type Structure Extension `pkg/bootstrap/types.go`	Added six new string fields to `OVNHyperShiftBootstrapResult`: `TokenMinterResourceRequestCPU`, `TokenMinterResourceRequestMemory`, `OVNControlPlaneResourceRequestCPU`, `OVNControlPlaneResourceRequestMemory`, `Socks5ProxyResourceRequestCPU`, and `Socks5ProxyResourceRequestMemory`.
Resource Request Discovery Logic `pkg/network/ovn_kubernetes.go`	Implemented resource request population by introducing `getResourceRequestsForDeployment` helper function to fetch per-container CPU/Memory requests from deployments, then populating the bootstrap result fields with discovered values for token-minter, ovnkube-control-plane, and socks-proxy containers.
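The templating approach summarized in the table can be sketched roughly as follows. This is an illustration only: the `requests` struct, its shortened field names, and the `if`/`else` fallback syntax are assumptions for a self-contained standard-library example, and the actual manifest may use different template helpers. The 10m/200Mi defaults are the OVN control-plane values quoted above.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// requests is a hypothetical stand-in for the bootstrap result fields.
// Empty strings mean that no existing request was discovered.
type requests struct {
	CPU    string // millicores, e.g. "25"
	Memory string // MiB, e.g. "300"
}

// manifest falls back to 10m / 200Mi when a field is empty.
const manifest = `resources:
  requests:
    cpu: {{ if .CPU }}{{ .CPU }}{{ else }}10{{ end }}m
    memory: {{ if .Memory }}{{ .Memory }}{{ else }}200{{ end }}Mi
`

// render executes the manifest template against the given values.
func render(r requests) (string, error) {
	tmpl, err := template.New("manifest").Parse(manifest)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, r); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// First render uses the defaults; second preserves discovered values.
	for _, r := range []requests{{}, {CPU: "25", Memory: "300"}} {
		out, err := render(r)
		if err != nil {
			panic(err)
		}
		fmt.Print(out)
	}
}
```

Note that the unit suffixes (`m`, `Mi`) are fixed in this sketch's template, so only whole millicore and MiB values can be expressed.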

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

getResourceRequestsForDeployment helper function: Verify correct deployment fetching, container resource extraction, and unit conversion (milli-cores and MiB) logic
String conversion accuracy: Ensure resource request values are converted to string format matching template expectations
Container name and namespace resolution: Confirm correct deployment and container names are targeted for each resource request lookup
Error handling: Check how missing deployments, containers, or resource requests are handled to ensure graceful defaults or appropriate failures


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions

Comment @coderabbitai help to get the list of available commands and usage tips.

kyrtapz · 2025-11-07T08:41:09Z

/test e2e-aws-ovn-hypershift-conformance

kyrtapz · 2025-11-07T08:42:17Z

@bradbehle @rtheis please add the /verified by @user label, that should be the last thing needed.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between c563eb4 and d1f2484.

📒 Files selected for processing (3)

bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml (3 hunks)
pkg/bootstrap/types.go (1 hunks)
pkg/network/ovn_kubernetes.go (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml
pkg/bootstrap/types.go
pkg/network/ovn_kubernetes.go

coderabbitai · 2025-11-07T08:46:13Z

pkg/network/ovn_kubernetes.go

+// getResourceRequestsForDeployment gets the cpu and memory resource requests for the specified deployment
+// If the deployment or container is not found, or if the container doesn't have a cpu or memory resource request, then 0 is returned
+func getResourceRequestsForDeployment(cl crclient.Reader, namespace string, deploymentName string, containerName string) (cpu int64, memory int64) {
+	deployment := &appsv1.Deployment{}
+	if err := cl.Get(context.TODO(), types.NamespacedName{
+		Namespace: namespace,
+		Name:      deploymentName,
+	}, deployment); err != nil {
+		if !apierrors.IsNotFound(err) {
+			klog.Warningf("Error fetching %s deployment: %v", deploymentName, err)
+		}
+		return cpu, memory
+	}
+
+	for _, container := range deployment.Spec.Template.Spec.Containers {
+		if container.Name == containerName {
+			if container.Resources.Requests != nil {
+				if !container.Resources.Requests.Cpu().IsZero() {
+					cpu = container.Resources.Requests.Cpu().MilliValue()
+				}
+				if !container.Resources.Requests.Memory().IsZero() {
+					memory = container.Resources.Requests.Memory().Value() / bytesInMiB
+				}
+			}
+			break
+		}
+	}
+
+	return cpu, memory
+}


⚠️ Potential issue | 🟠 Major

Don't truncate preserved memory requests

getResourceRequestsForDeployment divides the memory quantity by bytesInMiB, so any request that was set with decimal SI units (e.g. 500M) comes back as 476Mi. When the operator re-renders the deployment it silently lowers the request, defeating the goal of preserving administrator overrides and risking regressions for workloads that relied on the exact value.

Please carry the full resource.Quantity string (which already canonicalizes units) instead of converting to bare Mi integers, and drop the hard-coded Mi suffix in the template to accept the full value. A minimal sketch:

-func getResourceRequestsForDeployment(...) (cpu int64, memory int64) {
+func getResourceRequestsForDeployment(...) (cpu, memory string) {
@@
-	if err := cl.Get(...); err != nil { ... }
+	if err := cl.Get(...); err != nil { ... }
@@
-		if container.Name == containerName {
-			if container.Resources.Requests != nil {
-				if !container.Resources.Requests.Cpu().IsZero() {
-					cpu = container.Resources.Requests.Cpu().MilliValue()
-				}
-				if !container.Resources.Requests.Memory().IsZero() {
-					memory = container.Resources.Requests.Memory().Value() / bytesInMiB
-				}
-			}
+		if container.Name == containerName && container.Resources.Requests != nil {
+			if cpuQty := container.Resources.Requests.Cpu(); cpuQty != nil && !cpuQty.IsZero() {
+				cpu = cpuQty.String()
+			}
+			if memQty := container.Resources.Requests.Memory(); memQty != nil && !memQty.IsZero() {
+				memory = memQty.String()
+			}
 		}

Then you can assign the struct fields directly (no strconv.FormatInt) and render them with defaults like {{ .TokenMinterResourceRequestMemory | default "30Mi" }}. This keeps every user-specified value intact.
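The truncation described in this review comment can be checked with plain integer arithmetic. The sketch below uses only the standard library and does not call the Kubernetes `resource.Quantity` API; `truncateToMiB` is a hypothetical helper that mirrors the division by `bytesInMiB` in the reviewed function.

```go
package main

import "fmt"

const bytesInMiB = 1024 * 1024

// truncateToMiB converts a byte count to whole MiB using integer division,
// which silently discards any remainder.
func truncateToMiB(bytes int64) int64 {
	return bytes / bytesInMiB
}

func main() {
	// "500M" in decimal SI units is 500,000,000 bytes.
	const requested int64 = 500 * 1000 * 1000

	mib := truncateToMiB(requested)
	rendered := mib * bytesInMiB // bytes represented by "<mib>Mi" after re-rendering

	fmt.Printf("requested=%d bytes, rendered=%dMi (%d bytes), lost=%d bytes\n",
		requested, mib, rendered, requested-rendered)
}
```

For a 500M request this yields 476Mi, or 499,122,176 bytes, which is 877,824 bytes lower than what was requested. Requests that are already whole multiples of 1Mi round-trip unchanged.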

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In pkg/network/ovn_kubernetes.go around lines 788-817, the function currently converts memory to an int64 Mi value which truncates user-specified decimal SI units; change the function to preserve the full resource.Quantity strings instead: update the signature to return (cpu string, memory string) (or resource.Quantity strings), set cpu = container.Resources.Requests.Cpu().String() and memory = container.Resources.Requests.Memory().String() (remove any division by bytesInMiB and IsZero checks should still guard nil), and update all callers to accept string values; also remove the hard-coded "Mi" suffix in the deployment template and render the returned value directly (using template default like {{ .TokenMinterResourceRequestMemory | default "30Mi" }}).

rtheis · 2025-11-07T12:03:08Z

/verified by @bradbehle

openshift-ci-robot · 2025-11-07T12:03:18Z

@rtheis: Jira verification commands are restricted to collaborators for this repo.

In response to this:

/verified by @bradbehle


bradbehle · 2025-11-07T23:05:09Z

/verified by @bradbehle

openshift-ci-robot · 2025-11-07T23:05:19Z

@bradbehle: Jira verification commands are restricted to collaborators for this repo.

In response to this:

/verified by @bradbehle


rtheis · 2025-11-11T12:24:06Z

@csrwng can you please add /verified by @bradbehle for us?

kyrtapz · 2025-11-12T11:40:50Z

/verified by @bradbehle

openshift-ci-robot · 2025-11-12T11:41:02Z

@kyrtapz: This PR has been marked as verified by @bradbehle.

In response to this:

/verified by @bradbehle


kyrtapz · 2025-11-12T11:41:15Z

Sorry for the churn @bradbehle @rtheis!
Should be good now.

rtheis · 2025-11-12T11:43:43Z

Thank you @kyrtapz

openshift-ci-robot · 2025-11-12T12:01:22Z

/retest-required

Remaining retests: 0 against base HEAD 05d6f46 and 2 for PR HEAD d1f2484 in total

openshift-ci · 2025-11-12T20:08:16Z

@bradbehle: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade	`d1f2484`	link	false	`/test 4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade`
ci/prow/security	`d1f2484`	link	false	`/test security`
ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade	`d1f2484`	link	false	`/test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade`
ci/prow/4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade	`d1f2484`	link	false	`/test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

bradbehle · 2025-11-13T06:39:03Z

/retest-required

rtheis · 2025-11-13T11:00:21Z

/cherry-pick release-4.20

openshift-cherrypick-robot · 2025-11-13T11:01:11Z

@rtheis: new pull request created: #2835

In response to this:

/cherry-pick release-4.20


openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 27, 2025

openshift-ci bot requested review from arghosh93 and kyrtapz October 27, 2025 03:14

openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 27, 2025

openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 27, 2025

rtheis approved these changes Oct 27, 2025


openshift-ci bot requested a review from csrwng November 6, 2025 14:26

openshift-ci bot assigned kyrtapz Nov 7, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2025

coderabbitai bot reviewed Nov 7, 2025


openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Nov 12, 2025

openshift-merge-bot bot merged commit d1321fa into openshift:master Nov 13, 2025
25 of 29 checks passed

openshift-cherrypick-robot mentioned this pull request Nov 13, 2025

[release-4.20] CORENET-6488: Preserve custom resource requests on ovn-control-plane pods #2835

Open
