@bradbehle
Contributor

The cpu and memory resource requests of the ovnkube-control-plane pods that run on a hosted control plane are reverted whenever they are changed, so changing them to improve control plane performance does not work. Any customizations to these deployments' resource requests are overwritten by the cluster-network-operator.

This commit changes that so customizations are left in place, matching the behavior of the multus-admission-controller. For reference, the PR that implemented this for multus-admission-controller is #2335.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 27, 2025
@openshift-ci-robot
Contributor

@bradbehle: This pull request references CORENET-6488 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from arghosh93 and kyrtapz October 27, 2025 03:14
@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 27, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2025

Hi @bradbehle. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@TwoDCube
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 27, 2025
@rtheis

rtheis commented Oct 27, 2025

/retest
/ok-to-test

@rtheis rtheis left a comment

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2025

@rtheis: changing LGTM is restricted to collaborators

In response to this:

/lgtm

@rtheis

rtheis commented Oct 27, 2025

/retest
/ok-to-test

@rtheis

rtheis commented Oct 28, 2025

/retest
/ok-to-test

@rtheis

rtheis commented Oct 29, 2025

/retest
/ok-to-test

@rtheis

rtheis commented Oct 30, 2025

/retest
/ok-to-test

@rtheis

rtheis commented Nov 3, 2025

/ok-to-test
/retest-required

@kyrtapz
Contributor

kyrtapz commented Nov 3, 2025

@csrwng how does resource request preservation work in HyperShift? What happens if a component wants to change its default requests during an upgrade?

@rtheis

rtheis commented Nov 6, 2025

/cc @csrwng

@openshift-ci openshift-ci bot requested a review from csrwng November 6, 2025 14:26
@csrwng
Contributor

csrwng commented Nov 6, 2025

how does resource request preservation work in HyperShift? What happens if a component wants to change its default requests during an upgrade?

@kyrtapz we simply don't update the resource requests. So if we change the default, the default will apply to new control planes, but not to existing ones. Admittedly, this is less than ideal, but the right fix for it is not necessarily to come up with some way of updating them. Whatever we update them to will likely be wrong, because it won't necessarily match your usage. For a while we've said that we want to update resource requests based on actual usage and optionally allow the user to manage them entirely. We just need to get to it :)
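The preservation rule described above (keep whatever requests are already set, apply the default only when nothing is set) can be illustrated with a small, dependency-free Go sketch. This is not the HyperShift or cluster-network-operator code: effectiveRequest is a hypothetical helper, the live requests are modeled as a plain map keyed by container name, and the 20m "new default" is an invented value used only to show that a changed default reaches new control planes but not existing ones.

```go
package main

import "fmt"

// effectiveRequest (hypothetical) returns the request to render for a container:
// the value already set on the live deployment if there is one, otherwise def.
// existing maps container name -> request string (e.g. "50m"); a nil map, a
// missing key, or an empty string all mean "nothing found" (new control plane).
func effectiveRequest(existing map[string]string, container, def string) string {
	if v, ok := existing[container]; ok && v != "" {
		return v
	}
	return def
}

func main() {
	// Existing control plane: an administrator raised the CPU request to 50m.
	existing := map[string]string{"ovnkube-control-plane": "50m"}
	// Assume a later release changes the shipped default from 10m to 20m.
	newDefault := "20m"

	fmt.Println(effectiveRequest(existing, "ovnkube-control-plane", newDefault)) // 50m: override kept
	fmt.Println(effectiveRequest(nil, "ovnkube-control-plane", newDefault))      // 20m: new control plane
}
```

The trade-off csrwng describes falls out directly: once a value exists it is never replaced, so a changed default only affects control planes created afterwards.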

@kyrtapz
Contributor

kyrtapz commented Nov 7, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bradbehle, kyrtapz, rtheis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2025
@coderabbitai

coderabbitai bot commented Nov 7, 2025

Walkthrough

This pull request introduces configurable per-container resource requests for OVN HyperShift deployments. Resource request values are templated in the manifest with defaults and populated at runtime by discovering current resource requests from deployed containers in the cluster.

Changes

  • YAML Manifest Templating (bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml): Replaced fixed CPU and memory request values with Go template variables for three containers: Token Minter (10m/30Mi defaults), OVN control-plane (10m/200Mi defaults), and Socks5 proxy (10m/10Mi defaults).
  • Type Structure Extension (pkg/bootstrap/types.go): Added six new string fields to OVNHyperShiftBootstrapResult: TokenMinterResourceRequestCPU, TokenMinterResourceRequestMemory, OVNControlPlaneResourceRequestCPU, OVNControlPlaneResourceRequestMemory, Socks5ProxyResourceRequestCPU, and Socks5ProxyResourceRequestMemory.
  • Resource Request Discovery Logic (pkg/network/ovn_kubernetes.go): Implemented resource request population by introducing getResourceRequestsForDeployment helper function to fetch per-container CPU/Memory requests from deployments, then populating the bootstrap result fields with discovered values for token-minter, ovnkube-control-plane, and socks-proxy containers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • getResourceRequestsForDeployment helper function: Verify correct deployment fetching, container resource extraction, and unit conversion (milli-cores and MiB) logic
  • String conversion accuracy: Ensure resource request values are converted to string format matching template expectations
  • Container name and namespace resolution: Confirm correct deployment and container names are targeted for each resource request lookup
  • Error handling: Check how missing deployments, containers, or resource requests are handled to ensure graceful defaults or appropriate failures

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


@kyrtapz
Contributor

kyrtapz commented Nov 7, 2025

/test e2e-aws-ovn-hypershift-conformance

@kyrtapz
Contributor

kyrtapz commented Nov 7, 2025

@bradbehle @rtheis please add the /verified by @user label; that should be the last thing needed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between c563eb4 and d1f2484.

📒 Files selected for processing (3)
  • bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml (3 hunks)
  • pkg/bootstrap/types.go (1 hunks)
  • pkg/network/ovn_kubernetes.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • bindata/network/ovn-kubernetes/managed/ovnkube-control-plane.yaml
  • pkg/bootstrap/types.go
  • pkg/network/ovn_kubernetes.go

Comment on lines +788 to +817
// getResourceRequestsForDeployment gets the cpu and memory resource requests for the specified deployment
// If the deployment or container is not found, or if the container doesn't have a cpu or memory resource request, then 0 is returned
func getResourceRequestsForDeployment(cl crclient.Reader, namespace string, deploymentName string, containerName string) (cpu int64, memory int64) {
	deployment := &appsv1.Deployment{}
	if err := cl.Get(context.TODO(), types.NamespacedName{
		Namespace: namespace,
		Name:      deploymentName,
	}, deployment); err != nil {
		if !apierrors.IsNotFound(err) {
			klog.Warningf("Error fetching %s deployment: %v", deploymentName, err)
		}
		return cpu, memory
	}

	for _, container := range deployment.Spec.Template.Spec.Containers {
		if container.Name == containerName {
			if container.Resources.Requests != nil {
				if !container.Resources.Requests.Cpu().IsZero() {
					cpu = container.Resources.Requests.Cpu().MilliValue()
				}
				if !container.Resources.Requests.Memory().IsZero() {
					memory = container.Resources.Requests.Memory().Value() / bytesInMiB
				}
			}
			break
		}
	}

	return cpu, memory
}

⚠️ Potential issue | 🟠 Major

Don't truncate preserved memory requests

getResourceRequestsForDeployment divides the memory quantity by bytesInMiB, so any request that was set with decimal SI units (e.g. 500M) comes back as 476Mi. When the operator re-renders the deployment it silently lowers the request, defeating the goal of preserving administrator overrides and risking regressions for workloads that relied on the exact value.
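The arithmetic behind the 476Mi figure can be checked with a small stdlib-only Go sketch. It assumes bytesInMiB is 1024*1024 (its definition is not shown in this thread) and that the template appends a literal Mi suffix to the integer, as the comment states; renderMiB is a hypothetical name, not a function in the operator.

```go
package main

import (
	"fmt"
	"strconv"
)

// bytesInMiB is assumed to be 1024*1024; the operator's own constant is not shown here.
const bytesInMiB = 1024 * 1024

// renderMiB (hypothetical) mimics the reviewed flow: integer-divide the request in
// bytes by bytesInMiB, dropping any remainder, then append a hard-coded "Mi" suffix.
func renderMiB(bytes int64) string {
	return strconv.FormatInt(bytes/bytesInMiB, 10) + "Mi"
}

func main() {
	// "500M" is a decimal SI quantity: 500 * 10^6 bytes.
	fmt.Println(renderMiB(500 * 1000 * 1000)) // 476Mi, i.e. 499122176 bytes, not 500000000
	// "200Mi" is a whole number of MiB, so it round-trips unchanged.
	fmt.Println(renderMiB(200 * bytesInMiB)) // 200Mi
}
```

Only requests that are an exact multiple of 1 MiB survive the round trip; decimal SI values and any sub-MiB remainder are rounded down.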

Please carry the full resource.Quantity string (which already canonicalizes units) instead of converting to bare Mi integers, and drop the hard-coded Mi suffix in the template to accept the full value. A minimal sketch:

-func getResourceRequestsForDeployment(...) (cpu int64, memory int64) {
+func getResourceRequestsForDeployment(...) (cpu, memory string) {
@@
-	if err := cl.Get(...); err != nil { ... }
+	if err := cl.Get(...); err != nil { ... }
@@
-		if container.Name == containerName {
-			if container.Resources.Requests != nil {
-				if !container.Resources.Requests.Cpu().IsZero() {
-					cpu = container.Resources.Requests.Cpu().MilliValue()
-				}
-				if !container.Resources.Requests.Memory().IsZero() {
-					memory = container.Resources.Requests.Memory().Value() / bytesInMiB
-				}
-			}
+		if container.Name == containerName && container.Resources.Requests != nil {
+			if cpuQty := container.Resources.Requests.Cpu(); cpuQty != nil && !cpuQty.IsZero() {
+				cpu = cpuQty.String()
+			}
+			if memQty := container.Resources.Requests.Memory(); memQty != nil && !memQty.IsZero() {
+				memory = memQty.String()
+			}
 		}

Then you can assign the struct fields directly (no strconv.FormatInt) and render them with defaults like {{ .TokenMinterResourceRequestMemory | default "30Mi" }}. This keeps every user-specified value intact.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In pkg/network/ovn_kubernetes.go around lines 788-817, the function currently converts memory to an int64 Mi value, which truncates user-specified decimal SI units. Change the function to preserve the full resource.Quantity strings instead: update the signature to return (cpu string, memory string) (or resource.Quantity strings), set cpu = container.Resources.Requests.Cpu().String() and memory = container.Resources.Requests.Memory().String() (remove any division by bytesInMiB; the IsZero checks should still guard against unset requests), and update all callers to accept string values. Also remove the hard-coded "Mi" suffix in the deployment template and render the returned value directly, using a template default such as {{ .TokenMinterResourceRequestMemory | default "30Mi" }}.

@rtheis

rtheis commented Nov 7, 2025

/verified by @bradbehle

@openshift-ci-robot
Contributor

@rtheis: Jira verification commands are restricted to collaborators for this repo.

In response to this:

/verified by @bradbehle

@bradbehle
Contributor Author

/verified by @bradbehle

@openshift-ci-robot
Contributor

@bradbehle: Jira verification commands are restricted to collaborators for this repo.

In response to this:

/verified by @bradbehle

@rtheis

rtheis commented Nov 11, 2025

@csrwng can you please add /verified by @bradbehle for us?

@kyrtapz
Contributor

kyrtapz commented Nov 12, 2025

/verified by @bradbehle

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Nov 12, 2025
@openshift-ci-robot
Contributor

@kyrtapz: This PR has been marked as verified by @bradbehle.

In response to this:

/verified by @bradbehle

@kyrtapz
Contributor

kyrtapz commented Nov 12, 2025

Sorry for the churn @bradbehle @rtheis!
Should be good now.

@rtheis

rtheis commented Nov 12, 2025

Thank you @kyrtapz

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 05d6f46 and 2 for PR HEAD d1f2484 in total

@openshift-ci
Contributor

openshift-ci bot commented Nov 12, 2025

@bradbehle: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade | d1f2484 | link | false | /test 4.21-upgrade-from-stable-4.20-e2e-azure-ovn-upgrade
ci/prow/security | d1f2484 | link | false | /test security
ci/prow/4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade | d1f2484 | link | false | /test 4.21-upgrade-from-stable-4.20-e2e-gcp-ovn-upgrade
ci/prow/4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade | d1f2484 | link | false | /test 4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade

Full PR test history. Your PR dashboard.

@bradbehle
Contributor Author

/retest-required

@openshift-merge-bot openshift-merge-bot bot merged commit d1321fa into openshift:master Nov 13, 2025
25 of 29 checks passed
@rtheis

rtheis commented Nov 13, 2025

/cherry-pick release-4.20

@openshift-cherrypick-robot

@rtheis: new pull request created: #2835

In response to this:

/cherry-pick release-4.20

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. verified Signifies that the PR passed pre-merge verification criteria

7 participants