Skip to content

Conversation

@nrwahl2
Copy link
Contributor

@nrwahl2 nrwahl2 commented Nov 13, 2025

I'm opening this as an alternative to the "Fix: pacemaker-attrd: wipe CIB along with memory" commit in #3955. I have not done any testing. It's already late at night and I just finished this prototype. I don't have the time and energy to look into the testing steps and such.

This is so simple that I feel like we must have already tried something like this, and there must be a reason why it can't work. But I have to put it out there for discussion. I would love to find a less confusing approach than the NULL trick we're using in #3955.

If this works (a huge "if"), then it might even take care of the rolling upgrade issue, since this approach doesn't depend on a CIB update notification to tell us that it's safe to erase values from attrd memory.

We should no longer need the attrd_for_cib() commit. However, we would still need the controller commits from #3955, which I did not include in this PR.


I didn't add a commit message beyond the one-line header, and there are a couple more places where we could benefit from comments/Doxygen. I don't want to invest too much effort since this approach might fail miserably.

@nrwahl2 nrwahl2 requested a review from clumens November 13, 2025 10:18
@nrwahl2 nrwahl2 changed the title DEMO: Fix: pacemaker-attrd: wipe CIB along with memory DEMO: Fix: pacemaker-attrd: Wipe CIB along with memory Nov 13, 2025
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 13, 2025

Added controller commits from #3955

@nrwahl2 nrwahl2 force-pushed the nrwahl2-attrd branch 2 times, most recently from 2e132cc to 3a10d98 Compare November 13, 2025 19:49
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 13, 2025

  • Rebased to fix conflicts
  • Fixed NULL hash table issue

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 13, 2025

This PR at least seems to fix the issue in T139. When the controller dies and respawns, the attributes are kept in both the CIB and the attribute manager.

T138 looks harder to test, and in fact I don't know how to.

T137 does not track a bug, but rather the removal of code that's made redundant by the fix for T138/T139. This PR resolves T137 via the controller commits, but we still need to test whether everything works as expected during rolling upgrades.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 13, 2025

Fixed yet another NULL hash table issue :/

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 13, 2025

T138 looks harder to test, and in fact I don't know how to.

I don't know how to test the scenarios described in the task description. However, the linked issue RHEL-23082 says the cluster shutdown hangs when deleting the fence devices and stopping the cluster. I have been unable to reproduce this issue with the newly built package.

BUT, I have also been unable to reproduce that issue on main.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 13, 2025

I performed a rolling upgrade test from the shipped pacemaker packages to the package from this PR, following the steps in #3955 (comment)). My nodes are:

  • fastvm-fedora42-22 (node1, gets upgraded)
  • fastvm-fedora42-23 (node2, does not get upgraded)
  • fastvm-fedora42-24 (node3, DC, does not get upgraded).

At the end of the test

  • node2 is offline.
  • node2's <transient_attributes> section has been erased from the CIB.
  • node1 and node3 have the same empty value for the attribute on node2:
[root@fastvm-fedora42-22 ~]# pacemakerd -F
Pacemaker 3.0.1-1.f3a5cb4015.git.fc42 (Build: f3a5cb4015)
 Supporting v3.20.5: agent-manpages books cibsecrets corosync-ge-2 default-resource-stickiness generated-manpages lsb monotonic ncurses service systemd
[root@fastvm-fedora42-22 ~]# attrd_updater -n XYZ --query -N fastvm-fedora42-23
name="XYZ" host="fastvm-fedora42-23" value=""
...
[root@fastvm-fedora42-24 ~]# pacemakerd -F
Pacemaker 3.0.1-11.fc42 (Build: 18db266)
 Supporting v3.20.1: agent-manpages books cibsecrets corosync-ge-2 default-resource-stickiness default-sbd-sync generated-manpages lsb monotonic ncurses service systemd
[root@fastvm-fedora42-24 ~]# attrd_updater -n XYZ --query -N fastvm-fedora42-23
name="XYZ" host="fastvm-fedora42-23" value=""

nrwahl2 and others added 10 commits November 13, 2025 19:05
Previously, when the attribute manager purged a node, it would purge the
node's transient attributes only from memory, and assumed the controller
would purge them from the CIB. Now, the writer will purge them from the
CIB as well.

This fixes a variety of timing issues when multiple nodes including the
attribute writer are shutting down. If the writer leaves before some
other node, the DC wipes that other node's attributes from the CIB when
that other node leaves the controller process group (or all other nodes
do if the DC is the leaving node). If a new writer (possibly even the
node itself) is elected before the node's attribute manager leaves the
cluster layer, it will write the attributes back to the CIB. Once the
other node leaves the cluster layer, all attribute managers remove its
attributes from memory, but they are now "stuck" in the CIB.

As of this commit, the controller still erases the attributes from the
CIB when the node leaves the controller process group, which is
redundant but doesn't cause any new problems. This will be corrected in
an upcoming commit.

Note: This will cause an insignificant regression if backported to
Pacemaker 2. The Pacemaker 2 controller purges attributes from the CIB
for leaving DCs only if they are at version 1.1.13 or later, because
earlier DCs will otherwise get fenced after a clean shutdown. Since the
attribute manager doesn't know the DC or its version, the attributes
would now always be wiped, so old leaving DCs will get fenced. The
fencing would occur only in the highly unlikely situation of a rolling
upgrade from Pacemaker 2-supported versions 1.1.11 or 1.1.12, and the
upgrade would still succeed without any negative impact on resources.

Fixes T138

Co-Authored-By: Ken Gaillot <kgaillot@redhat.com>
Co-Authored-By: Chris Lumens <clumens@redhat.com>

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
The requesting_shutdown variable was checked only by
attrd_shutting_down(), when the if_requested argument was set to true.
In that case, it returned true if either the shutting_down variable was
true or both the if_requested argument and the requesting_shutdown
variable were true.

The only caller that passed if_requested=true was
attrd_cib_updated_cb(). It did this if:
a. the alerts section was changed, or
b. the status section or nodes section was changed by an untrusted
  client.

Details:
a. Prior to f42e170, we didn't pass if_requested=true for an alerts
   section change. We started doing so as of that commit mostly for
   convenience. We decided that it seemed reasonable to ignore alert
   changes when there was a shutdown pending.

   This commit reverts to NOT ignoring alert changes due to pending
   shutdown. That seems like it might be better. I'm not sure if it's
   possible for us to land in attrd_send_attribute_alert() while a
   shutdown is requested but has not begun. If so, it would be good to
   send the correct alerts.

b. The other call with true is to avoid writing out all attributes when
   the status or nodes section changes. It's probably okay to drop the
   true there too. It was added by a1a9c54, to resolve a race condition
   where:
   * node2 left.
   * node1's controller deleted node2's transient attributes from the
     CIB.
   * node1 took over as DC and replaced the CIB.
   * node2's attribute manager was not yet actually shutting down, and
     it responded to the CIB replacement by writing out all of the
     attributes that were in its memory, including its own "shutdown"
     attribute.

   Now (as of the previous commit), node1's attribute manager would
   delete this "shutdown" attribute as part of its shutdown process. (Or
   more accurately, I think the attribute writer node will do that.)

   So if we understand correctly, the attrd_shutting_down(true)
   workaround is no longer needed.

With no more callers needing to pass true, the supporting code can go
away.

Co-Authored-By: Reid Wahl <nrwahl@protonmail.com>
Now that the attribute manager will erase transient attributes from the
CIB when purging a node, we don't need to do that separately in the
controller.

Co-Authored-By: Chris Lumens <clumens@redhat.com>
With recent changes, the attribute manager now handles it when the node
leaves the cluster, so the controller purge is redundant.

This does alter the timing somewhat, since the controller's purge
occurred when the node left the controller process group, while the
attribute manager's purge occurs when it leaves the cluster, but that
shouldn't make a significant difference.

This fixes a problem when a node's controller crashes and is respawned
while fencing is disabled. Previously, another node's controller would
remove that node's transient attributes from the CIB, but they would
remain in the attribute managers' memory. Now, the attributes are
correctly retained in the CIB in this situation.

Fixes T137
Fixes T139

Co-Authored-By: Chris Lumens <clumens@redhat.com>
...instead of wiping from the CIB directly.

Co-Authored-By: Chris Lumens <clumens@redhat.com>
It now boils down to a bool for whether we want only unlocked resources.
...to controld_delete_node_history(), and
controld_node_state_deletion_strings() to
controld_node_history_deletion_strings(), since they delete only history
now.
This has only ever had two values, which basically just means it's a
bool.
@nrwahl2 nrwahl2 marked this pull request as ready for review November 14, 2025 03:06
@nrwahl2 nrwahl2 changed the title DEMO: Fix: pacemaker-attrd: Wipe CIB along with memory Erase transient attributes in attribute manager instead of controller Nov 14, 2025
@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 14, 2025

@clumens This is ready for review as far as I'm concerned. If it works, then I'm pretty happy with it.

As I said in previous comments, T137 and T139 appear to be addressed, and rolling upgrades appear to work as expected. I don't know how to test T138. I'd love to get your input on that, and to have you test it as well to feel more confident.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 14, 2025

A 50-iteration cts-lab run completed with no errors.

@clumens
Copy link
Contributor

clumens commented Nov 14, 2025

I performed a rolling upgrade test from the shipped pacemaker packages to the package from this PR, following the steps in #3955 (comment)). My nodes are:

* `fastvm-fedora42-22` (node1, gets upgraded)

* `fastvm-fedora42-23` (node2, does not get upgraded)

* `fastvm-fedora42-24` (node3, DC, does not get upgraded).

At the end of the test

* node2 is offline.

* node2's `<transient_attributes>` section has been erased from the CIB.

* node1 and node3 have the same empty value for the attribute on node2:
[root@fastvm-fedora42-22 ~]# pacemakerd -F
Pacemaker 3.0.1-1.f3a5cb4015.git.fc42 (Build: f3a5cb4015)
 Supporting v3.20.5: agent-manpages books cibsecrets corosync-ge-2 default-resource-stickiness generated-manpages lsb monotonic ncurses service systemd
[root@fastvm-fedora42-22 ~]# attrd_updater -n XYZ --query -N fastvm-fedora42-23
name="XYZ" host="fastvm-fedora42-23" value=""
...
[root@fastvm-fedora42-24 ~]# pacemakerd -F
Pacemaker 3.0.1-11.fc42 (Build: 18db266)
 Supporting v3.20.1: agent-manpages books cibsecrets corosync-ge-2 default-resource-stickiness default-sbd-sync generated-manpages lsb monotonic ncurses service systemd
[root@fastvm-fedora42-24 ~]# attrd_updater -n XYZ --query -N fastvm-fedora42-23
name="XYZ" host="fastvm-fedora42-23" value=""

The desired behavior here is that the attribute is missing on node1, not that it has an empty value. That's the whole point of the last patch in #3955.

I think attrd_updater can be convinced to invent attributes with empty values if they don't exist, so you need to verify the attribute is actually deleted either via the logs or another command line (I used attrd_updater -n XYZ -A for instance).

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 14, 2025

The desired behavior here is that the attribute is missing on node1, not that it has an empty value. That's the whole point of the last patch in #3955.

Ah, I see. I don't think the end behavior was listed. You did say that nodes 1 and 3 should have the same output.

I think attrd_updater can be convinced to invent attributes with empty values if they don't exist, so you need to verify the attribute is actually deleted either via the logs or another command line (I used attrd_updater -n XYZ -A for instance).

Ahhh I didn't notice that you had used -A there. I typed rather than copy-pasting. With -A, mine behaves as desired. After the same steps with the same environment setup:

[root@fastvm-fedora42-22 ~]# pacemakerd -F
Pacemaker 3.0.1-1.c561a1eeb6.git.fc42 (Build: c561a1eeb6)
 Supporting v3.20.5: agent-manpages books cibsecrets corosync-ge-2 default-resource-stickiness generated-manpages lsb monotonic ncurses service systemd
[root@fastvm-fedora42-22 ~]# attrd_updater -n XYZ -A
attrd_updater: Could not query value of XYZ: attribute does not exist

[root@fastvm-fedora42-24 ~]# pacemakerd -F
Pacemaker 3.0.1-11.fc42 (Build: 18db266)
 Supporting v3.20.1: agent-manpages books cibsecrets corosync-ge-2 default-resource-stickiness default-sbd-sync generated-manpages lsb monotonic ncurses service systemd
[root@fastvm-fedora42-24 ~]# attrd_updater -n XYZ -A
attrd_updater: Could not query value of XYZ: attribute does not exist

@clumens
Copy link
Contributor

clumens commented Nov 14, 2025

I don't know how to test T138.

As I recall, it's pretty sporadic given that it's a race condition between daemons. See if Marketa has a more reliable reproducer or at least better luck, and if she has time to test a scratch build.

@nrwahl2
Copy link
Contributor Author

nrwahl2 commented Nov 14, 2025

I don't know how to test T138.

As I recall, it's pretty sporadic given that it's a race condition between daemons. See if Marketa has a more reliable reproducer or at least better luck, and if she has time to test a scratch build.

I'll ask. Does this mean you also have not tested that this fixes T138, and were just going to hope for the best and let QE test it?

The relevant Jira said there was something like a 50% chance of it occurring. I tested about 10 times and couldn't trigger it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants