Conversation

@jaypoulz (Contributor) commented Nov 6, 2025

  • Updated warning against baremetal platform including BMC block
  • Updated test section to note that we'll skip requirements criteria if no requirements are provided
  • Added a new block that explains the PacemakerCluster API, the status collector, and the health check controller

openshift-ci bot commented Nov 6, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelanford for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -317,6 +317,9 @@ In the future, it may be possible to lower the privilege level of the TNF contro
to run without root privileges. We are working with the RHEL-HA team to identify the specific set of commands that we use to narrow the scope of progress towards this goal. This remains a long-term
objective for both teams.

##### The PacemakerCluster Health Check
See [Status Propogation with PacemakerCluster Health Check](#status-propogation-with-pacemakercluster-health-check)

nit: propagation (somewhere else in the doc too)

Contributor Author

that word is hard. :)

need to do so. An example of this would be if the cluster administrator rotated their BMC password without updating the fencing secret in the cluster. This would be caught by the pacemaker monitoring
checks, but something in the cluster would need to aware of that to propogate that information to the user directly.

To acheive this, we plan on using a pair of new controllers in CEO. The first is a status collector, which syncs every 30 seconds to gather that current state of pacemaker via `sudo pcs status xml`.

Nit, some spelling:
To achieve this, we plan on using two new controllers in CEO. The first is a status collector which syncs every 30 seconds to gather the current state of pacemaker via sudo pcs status xml

@@ -708,6 +665,35 @@ This collection of diagrams collects a series of scenarios where both nodes fail

![Diagrams of Multi-Node Failure Scenarios](etcd-flowchart-both-nodes-reboot-scenarios.svg)

#### Status Propogation with PacemakerCluster Health Check
An important goal of Two Node OpenShift with Fencing is ensuring that we always warn the user before a disaster event can occur that would require manually intervention if we have the information we

This sentence is really hard to parse. Maybe something like:
An important goal of Two Node OpenShift with Fencing is ensuring an early warning for potentially disastrous events (that would require manual intervention). To provide this warning, the information needs to be available within the cluster.

- A list of nodes currently registered in pacemaker
- A list of recent events recorded by the pacemaker resources
- A list of recent fencing events performed by pacemaker
- A dump of the full pacemaker XML. This is kept so that in the case that the XML API is changed in a way that breaks the other fields, we can quickly deliver a fix for the breakage that parses the

Wording: This is kept to be able to deliver a quick fix to the XML parsing code if the XML API is changed in such a way that other fields break

- A dump of the full pacemaker XML. This is kept so that in the case that the XML API is changed in a way that breaks the other fields, we can quickly deliver a fix for the breakage that parses the
XML directly.

Once the PacemakerCluster object is populated is it handled on the CEO side by a new pacemaker healthcheck controller. This controller evaluates the status of the report and creates events in CEO for the following things:

is it -> it is

- Not all resources and nodes are in their expected / healthy state
- The pacemakercluster status object is stale (hasn't been updated in the last 5 minutes)

Overall these health checks are almost entirely informational. The only time they are used outside of this event creation or operator status is to ensure that the nodes recorded in pacemaker match the

nit: remove "this" in "this event"

- A list of nodes currently registered in pacemaker
- A list of recent events recorded by the pacemaker resources
- A list of recent fencing events performed by pacemaker
- A dump of the full pacemaker XML. This is kept so that in the case that the XML API is changed in a way that breaks the other fields, we can quickly deliver a fix for the breakage that parses the
Contributor

This doesn't feel like the right place to put this. This to me at least is not really a field a user would care about. Who is this API for?

- A dump of the full pacemaker XML. This is kept so that in the case that the XML API is changed in a way that breaks the other fields, we can quickly deliver a fix for the breakage that parses the
XML directly.

Once the PacemakerCluster object is populated is it handled on the CEO side by a new pacemaker healthcheck controller. This controller evaluates the status of the report and creates events in CEO for the following things:
Contributor

Why can the writer of PacemakerCluster not produce the events? Seems that component has all of the correct/relevant information to be able to write the events?

- Warnings for fencing events that have happened on the cluster

More importantly, it also sets the CEO's status to degraded if one of the following conditions are true:
- Not all resources and nodes are in their expected / healthy state
Contributor

Is this correct for CEO to go degraded? I thought I saw kubelet listed? Wouldn't some other component be responsible for alerting when a kubelet on a control plane node is down? Doesn't really feel like a CEO issue to report?

Contributor Author

Most other components that rely on multiple replicas will be degraded at the same time. The obvious one is the API server. In fact, CEO already reports degraded when kubelet is down because it doesn't have all of the endpoints it thinks it's supposed to have (one per control-plane node).

The reason we include the kubelet state in the pacemaker status is that pacemaker ensures that kubelet is started before etcd. That means that for etcd to be healthy, kubelet must be healthy. We could ignore the state of the kubelet resource when reporting the state of pacemaker, but as I mentioned before, the etcd member controller is going to report degraded anyway, so it's just extra information that explains why pacemaker is unhealthy.
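
To make the dependency argument above concrete, here is a minimal sketch of a health evaluation in which kubelet is reported alongside etcd. It is illustrative only and is not the CEO implementation: the `Resource` type, its `started` flag, the node names, and the `degraded_reasons` function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical view of one pacemaker-managed resource on one node."""
    name: str      # e.g. "kubelet" or "etcd"
    node: str      # hypothetical node name
    started: bool  # True when the resource is in its expected state


def degraded_reasons(resources):
    """Return one human-readable reason per resource that is not healthy."""
    return [
        f"{r.name} is not started on {r.node}"
        for r in resources
        if not r.started
    ]


# kubelet is down on master-1, so etcd on that node cannot be healthy either.
resources = [
    Resource("kubelet", "master-0", started=True),
    Resource("etcd", "master-0", started=True),
    Resource("kubelet", "master-1", started=False),
    Resource("etcd", "master-1", started=False),
]

reasons = degraded_reasons(resources)
degraded = bool(reasons)  # degraded if anything is outside its expected state
```

Keeping the kubelet entry in `reasons` does not change whether the result is degraded in this scenario; it only adds the explanation for why etcd is unhealthy.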


More importantly, it also sets the CEO's status to degraded if one of the following conditions are true:
- Not all resources and nodes are in their expected / healthy state
- The pacemakercluster status object is stale (hasn't been updated in the last 5 minutes)
Contributor

Needs admin intervention in a fairly prompt manner?

Contributor Author

We don't know for sure. We can only give admins instructions if we know the state of pacemaker. If we haven't received a status, this means that CEO's status collector cronjob has stopped posting them or what's being posted is being rejected by the API.

In either case, the cluster could be in a state where it fails without recovering automatically. The goal is to raise this in a way that lets the cluster admin know that something could be wrong.
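
As an illustration of the staleness condition discussed here, the check could look roughly like the sketch below. Only the 5-minute threshold comes from the proposal text; the function name `is_stale` and the `last_updated` timestamp are assumptions made for this example, not fields of the actual PacemakerCluster API.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the proposal text: stale after 5 minutes without an update.
STALE_AFTER = timedelta(minutes=5)


def is_stale(last_updated, now):
    """Return True when the status was not refreshed within STALE_AFTER."""
    return now - last_updated > STALE_AFTER


# Fixed timestamps keep the example deterministic.
now = datetime(2025, 11, 6, 12, 0, 0, tzinfo=timezone.utc)
fresh = is_stale(now - timedelta(seconds=30), now)  # one collector interval ago
stale = is_stale(now - timedelta(minutes=6), now)   # collector stopped posting
```

With a 30-second collection interval, a status only becomes stale after roughly ten consecutive missed updates, so a single failed sync would not trip the condition.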

@jaypoulz changed the title from "Updated TNF EP to address some drift from original requirements." to "OCPEDGE-2215: Updated TNF EP to address some drift from original requirements." on Nov 6, 2025
@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Nov 6, 2025
openshift-ci-robot commented Nov 6, 2025

@jaypoulz: This pull request references OCPEDGE-2215 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

  • Updated warning against baremetal platform including BMC block
  • Updated test section to note that we'll skip requirements criteria if no requirements are provided
  • Added a new block that explains the PacemakerCluster API, the status collector, and the health check controller

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Nov 6, 2025

@jaypoulz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 3320626 | link | true | `/test markdownlint` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

would be caught by the pacemaker monitoring checks, but something in the cluster needs to propagate that information to the user directly.

To acheive this, we plan on using two new controllers in CEO. The first is a status collector which syncs every 30 seconds to gather that current state of pacemaker via `sudo pcs status xml`.
This is parsed and populates a new status object called a `PacemakerCluster`, which is a singleton resource that created by CEO when the transition to an external etcd is completed.
Contributor

Some typos in this sentence, which could flow better like so:

This is parsed and populates by a new status object called a PacemakerCluster, which is a singleton resource that created by CEO when the transition to an external etcd is completed.
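
For readers following the thread, the collector step in the quoted paragraph (parse the XML, then populate a status object) could be sketched as below. This is a rough illustration under stated assumptions, not the CEO code: the sample XML is hand-written and heavily simplified, its element and attribute names are assumptions rather than an authoritative schema for `pcs status xml`, and `summarize` and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample. The element and attribute names are
# assumptions for illustration, not real `pcs status xml` output.
SAMPLE_XML = """
<pacemaker-result>
  <nodes>
    <node name="master-0" online="true"/>
    <node name="master-1" online="false"/>
  </nodes>
  <fence_history>
    <fence_event action="reboot" target="master-1" status="success"/>
  </fence_history>
</pacemaker-result>
"""


def summarize(xml_text):
    """Build a hypothetical status summary from the XML text."""
    root = ET.fromstring(xml_text.strip())
    nodes = {
        n.get("name"): n.get("online") == "true"
        for n in root.findall("./nodes/node")
    }
    fencing_events = [
        (e.get("target"), e.get("action"), e.get("status"))
        for e in root.findall("./fence_history/fence_event")
    ]
    # The raw XML is kept next to the parsed fields, mirroring the
    # "dump of the full pacemaker XML" item in the proposal.
    return {"nodes": nodes, "fencing_events": fencing_events, "raw_xml": xml_text}


status = summarize(SAMPLE_XML)
```

Carrying the raw XML next to the parsed fields is what would allow a later parsing fix to work from the original data if the parsed fields break.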

- A list of recent fencing events performed by pacemaker
- A dump of the full pacemaker XML. This is kept to be able to deliver a quick fix to the XML parsing code if the XML API is changed in such a way that other fields break.

Once the PacemakerCluster object is populated it is handled on the CEO side by a new pacemaker healthcheck controller. This controller evaluates the status of the report and creates events in
Contributor

This sentence seems a bit wordy. What about the following:

Once populated, the PacemakerCluster object is handled ...
