OCPBUGS-26404: Add retry logic for SNO cluster detection in leader election configuration #1210
|
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-26404, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jianzhangbjz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-26404, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request. |
|
/jira refresh |
|
@jianzhangbjz: This pull request references Jira Issue OCPBUGS-26404, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request. |
|
/retest |
|
/retest-required |
|
@jianzhangbjz: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/payload-job periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-baremetal-sno-ipv4-etcd-encryption-rt-kernel-basecap-f7 |
|
@jianzhangbjz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/df5ada00-fcec-11f0-8c55-0e5f6c8f9482-0 |
|
Test passed. Details:
1. Build an OCP cluster with this PR
launch 4.22,openshift/operator-framework-olm#1210 aws,single-node
jiazha-mac:openshift-tests-private jiazha$ oc get nodes
NAME                                        STATUS   ROLES                         AGE   VERSION
ip-10-0-72-237.us-west-1.compute.internal   Ready    control-plane,master,worker   80m   v1.34.2
jiazha-mac:~ jiazha$ oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.22.0-0-2026-01-29-085653-test-ci-ln-592429t-latest   True        False         49m     Cluster version is 4.22.0-0-2026-01-29-085653-test-ci-ln-592429t-latest
jiazha-mac:~ jiazha$
2. Run test case
jiazha-mac:openshift-tests-private jiazha$ ./bin/extended-platform-tests run all --dry-run |grep 49352|./bin/extended-platform-tests run -f -
Jan 29 18:30:24.974: INFO: The --provider flag is not set. Continuing as if --provider=skeleton had been used.
started: (0/1/1) "[sig-operators] OLM should NonHyperShiftHOST-Author:jiazha-Medium-49352-SNO Leader election conventions for cluster topology"
I0129 18:30:41.144053 59607 trace.go:236] Trace[376040953]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.0.0-20230523190412-013d8779845c/tools/cache/reflector.go:231 (29-Jan-2026 18:30:27.043) (total time: 14100ms):
Trace[376040953]: ---"Objects listed" error:<nil> 14100ms (18:30:41.143)
Trace[376040953]: [14.100470708s] [14.100470708s] END
Jan 29 18:30:31.904: INFO: The --provider flag is not set. Continuing as if --provider=skeleton had been used.
Jan 29 18:30:35.737: INFO: configPath is now "/var/folders/5n/w9ysf4w93jnfy7k19xxct31c0000gn/T/configfile2179051669"
Jan 29 18:30:35.737: INFO: The user is now "e2e-test-default-vzge2a6h-wh7cg-user"
Jan 29 18:30:35.737: INFO: Creating project "e2e-test-default-vzge2a6h-wh7cg"
Jan 29 18:30:36.043: INFO: Waiting on permissions in project "e2e-test-default-vzge2a6h-wh7cg" ...
Jan 29 18:30:37.735: INFO: Waiting for ServiceAccount "default" to be provisioned...
Jan 29 18:30:38.364: INFO: Waiting for ServiceAccount "builder" to be provisioned...
Jan 29 18:30:38.702: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
Jan 29 18:30:39.762: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned...
Jan 29 18:30:40.573: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned...
Jan 29 18:30:41.269: INFO: Waiting for RoleBinding "system:deployers" to be provisioned...
Jan 29 18:30:41.737: INFO: Project "e2e-test-default-vzge2a6h-wh7cg" has been fully provisioned.
STEP: 1) get the cluster topology 01/29/26 18:30:41.738
Jan 29 18:30:41.739: INFO: Running 'oc --kubeconfig=/Users/jiazha/bot-kubeconfig get infrastructures cluster -o=jsonpath={.status.controlPlaneTopology}'
STEP: 2) get the leaseDurationSeconds of the packageserver-controller-lock 01/29/26 18:30:43.898
Jan 29 18:30:43.898: INFO: Running 'oc --kubeconfig=/Users/jiazha/bot-kubeconfig get lease packageserver-controller-lock -n openshift-operator-lifecycle-manager -o=jsonpath={.spec.leaseDurationSeconds}'
Jan 29 18:30:45.739: INFO: This is a SNO cluster
Jan 29 18:30:45.977: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-default-vzge2a6h-wh7cg-user}, err: <nil>
Jan 29 18:30:46.212: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-default-vzge2a6h-wh7cg}, err: <nil>
Jan 29 18:30:46.445: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~Ul-l_XyKvmxfoWKwvhqzhKOY58puto7qa3eZkYqgIQg}, err: <nil>
passed: (20.3s) 2026-01-29T10:30:47 "[sig-operators] OLM should NonHyperShiftHOST-Author:jiazha-Medium-49352-SNO Leader election conventions for cluster topology"
1 pass, 0 skip (20.3s) |
|
/lgtm |
|
@jianzhangbjz: you cannot LGTM your own PR. |
|
@jianzhangbjz: This PR has been marked as verified by DetailsIn response to this:
Problem:
During package-server-manager startup, the code attempts to detect if the cluster is SNO (Single Node OpenShift) to use appropriate leader election values. Previously, this used a single 3-second timeout with no retry. If the API server was slow to respond during startup (common in SNO environments), the detection would fail and incorrectly default to HA leader election values.
Solution:
Assisted-By: Claude-Code