
@jianzhangbjz
Member

@jianzhangbjz jianzhangbjz commented Jan 28, 2026

Problem:

During package-server-manager startup, the code tries to detect whether the cluster is SNO (Single Node OpenShift) so that it can apply the appropriate leader election values. Previously, detection made a single attempt with a 3-second timeout and no retry. If the API server was slow to respond during startup (common in SNO environments), detection failed and the code incorrectly fell back to the HA leader election values.

Solution:

  • Added retry logic using wait.PollUntilContextTimeout that retries every 2 seconds for up to 30 seconds
  • Updated log messages to clarify the intent: detecting SNO cluster topology rather than "getting infrastructure status"
  • Falls back to HA values only after all retries are exhausted (a safe default, since HA values work on both HA and SNO clusters)

Assisted-By: Claude-Code

@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 28, 2026
@openshift-ci-robot

@jianzhangbjz: This pull request references Jira Issue OCPBUGS-26404, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Jan 28, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jianzhangbjz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 28, 2026
@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 28, 2026
@openshift-ci-robot

@jianzhangbjz: This pull request references Jira Issue OCPBUGS-26404, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.


@jianzhangbjz
Member Author

/jira refresh

@openshift-ci-robot

@jianzhangbjz: This pull request references Jira Issue OCPBUGS-26404, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (jiazha@redhat.com), skipping review request.


@tmshort
Contributor

tmshort commented Jan 28, 2026

/retest

@jianzhangbjz
Member Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

@jianzhangbjz: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jianzhangbjz
Member Author

/payload-job periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-baremetal-sno-ipv4-etcd-encryption-rt-kernel-basecap-f7

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

@jianzhangbjz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-openshift-tests-private-release-4.22-amd64-nightly-baremetal-sno-ipv4-etcd-encryption-rt-kernel-basecap-f7

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/df5ada00-fcec-11f0-8c55-0e5f6c8f9482-0

@jianzhangbjz
Copy link
Member Author

Test passed. Details:

1. Build an OCP cluster with this PR
launch 4.22,openshift/operator-framework-olm#1210 aws,single-node

jiazha-mac:openshift-tests-private jiazha$ oc get nodes
NAME                                        STATUS   ROLES                         AGE   VERSION
ip-10-0-72-237.us-west-1.compute.internal   Ready    control-plane,master,worker   80m   v1.34.2

jiazha-mac:~ jiazha$ oc get clusterversion 
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.22.0-0-2026-01-29-085653-test-ci-ln-592429t-latest   True        False         49m     Cluster version is 4.22.0-0-2026-01-29-085653-test-ci-ln-592429t-latest
jiazha-mac:~ jiazha$ 

2. Run the test case
jiazha-mac:openshift-tests-private jiazha$ ./bin/extended-platform-tests run all --dry-run |grep 49352|./bin/extended-platform-tests run -f -
  Jan 29 18:30:24.974: INFO: The --provider flag is not set. Continuing as if --provider=skeleton had been used.
started: (0/1/1) "[sig-operators] OLM should NonHyperShiftHOST-Author:jiazha-Medium-49352-SNO Leader election conventions for cluster topology"

  I0129 18:30:41.144053   59607 trace.go:236] Trace[376040953]: "Reflector ListAndWatch" name:k8s.io/client-go@v0.0.0-20230523190412-013d8779845c/tools/cache/reflector.go:231 (29-Jan-2026 18:30:27.043) (total time: 14100ms):
  Trace[376040953]: ---"Objects listed" error:<nil> 14100ms (18:30:41.143)
  Trace[376040953]: [14.100470708s] [14.100470708s] END
  Jan 29 18:30:31.904: INFO: The --provider flag is not set. Continuing as if --provider=skeleton had been used.
  Jan 29 18:30:35.737: INFO: configPath is now "/var/folders/5n/w9ysf4w93jnfy7k19xxct31c0000gn/T/configfile2179051669"
  Jan 29 18:30:35.737: INFO: The user is now "e2e-test-default-vzge2a6h-wh7cg-user"
  Jan 29 18:30:35.737: INFO: Creating project "e2e-test-default-vzge2a6h-wh7cg"
  Jan 29 18:30:36.043: INFO: Waiting on permissions in project "e2e-test-default-vzge2a6h-wh7cg" ...
  Jan 29 18:30:37.735: INFO: Waiting for ServiceAccount "default" to be provisioned...
  Jan 29 18:30:38.364: INFO: Waiting for ServiceAccount "builder" to be provisioned...
  Jan 29 18:30:38.702: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
  Jan 29 18:30:39.762: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned...
  Jan 29 18:30:40.573: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned...
  Jan 29 18:30:41.269: INFO: Waiting for RoleBinding "system:deployers" to be provisioned...
  Jan 29 18:30:41.737: INFO: Project "e2e-test-default-vzge2a6h-wh7cg" has been fully provisioned.
  STEP: 1) get the cluster topology 01/29/26 18:30:41.738
  Jan 29 18:30:41.739: INFO: Running 'oc --kubeconfig=/Users/jiazha/bot-kubeconfig get infrastructures cluster -o=jsonpath={.status.controlPlaneTopology}'
  STEP: 2) get the leaseDurationSeconds of the packageserver-controller-lock 01/29/26 18:30:43.898
  Jan 29 18:30:43.898: INFO: Running 'oc --kubeconfig=/Users/jiazha/bot-kubeconfig get lease packageserver-controller-lock -n openshift-operator-lifecycle-manager -o=jsonpath={.spec.leaseDurationSeconds}'
  Jan 29 18:30:45.739: INFO: This is a SNO cluster
  Jan 29 18:30:45.977: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-default-vzge2a6h-wh7cg-user}, err: <nil>
  Jan 29 18:30:46.212: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-default-vzge2a6h-wh7cg}, err: <nil>
  Jan 29 18:30:46.445: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~Ul-l_XyKvmxfoWKwvhqzhKOY58puto7qa3eZkYqgIQg}, err: <nil>

passed: (20.3s) 2026-01-29T10:30:47 "[sig-operators] OLM should NonHyperShiftHOST-Author:jiazha-Medium-49352-SNO Leader election conventions for cluster topology"

1 pass, 0 skip (20.3s)
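For reviewers who want to repeat steps 1 and 2 by hand, the check can be sketched as below. `checkLease` and `expectedLeaseSeconds` are hypothetical; the actual assertion in test case 49352 is not shown here, and the expected lease durations (270s for `SingleReplica`, 137s for `HighlyAvailable`) are assumed from OpenShift's leader-election conventions. The inputs are the raw jsonpath outputs of the two `oc` commands in the log.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expectedLeaseSeconds maps an Infrastructure .status.controlPlaneTopology
// value to the assumed leaseDurationSeconds of packageserver-controller-lock.
func expectedLeaseSeconds(topology string) (int, bool) {
	switch topology {
	case "SingleReplica":
		return 270, true
	case "HighlyAvailable":
		return 137, true
	default:
		return 0, false
	}
}

// checkLease compares the lease duration reported by oc against the value
// expected for the detected topology.
func checkLease(topology, leaseOutput string) error {
	topology = strings.TrimSpace(topology)
	want, ok := expectedLeaseSeconds(topology)
	if !ok {
		return fmt.Errorf("unhandled topology %q", topology)
	}
	got, err := strconv.Atoi(strings.TrimSpace(leaseOutput))
	if err != nil {
		return fmt.Errorf("parsing leaseDurationSeconds %q: %w", leaseOutput, err)
	}
	if got != want {
		return fmt.Errorf("topology %s: leaseDurationSeconds is %d, want %d", topology, got, want)
	}
	return nil
}

func main() {
	fmt.Println(checkLease("SingleReplica", "270")) // <nil>
	// HA value on an SNO cluster: the symptom this PR fixes.
	fmt.Println(checkLease("SingleReplica", "137"))
}
```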

@jianzhangbjz
Member Author

/lgtm
/verified by @jianzhangbjz

@openshift-ci
Contributor

openshift-ci bot commented Jan 29, 2026

@jianzhangbjz: you cannot LGTM your own PR.


@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jan 29, 2026
@openshift-ci-robot

@jianzhangbjz: This PR has been marked as verified by @jianzhangbjz.



Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. verified Signifies that the PR passed pre-merge verification criteria
