Erase transient attributes in attribute manager instead of controller #3991
base: main
Conversation
Added controller commits from #3955
force-pushed from 2e132cc to 3a10d98
This PR at least seems to fix the issue in T139: when the controller dies and respawns, the attributes are kept in both the CIB and the attribute manager. T138 looks harder to test, and in fact I don't know how to. T137 does not track a bug, but rather the removal of code that's made redundant by the fix for T138/T139. This PR resolves T137 via the controller commits, but we still need to test whether everything works as expected during rolling upgrades.
force-pushed from 3a10d98 to f3a5cb4
Fixed yet another NULL hash table issue :/
I don't know how to test the scenarios described in the task description. However, the linked issue RHEL-23082 says that cluster shutdown hangs when the fence devices are deleted and the cluster is stopped. I have been unable to reproduce this issue with the newly built package. BUT, I have also been unable to reproduce it on main.
I performed a rolling upgrade test from the shipped pacemaker packages to the package from this PR, following the steps in #3955 (comment). My nodes are:
At the end of the test
force-pushed from f3a5cb4 to f77a2b7
Previously, when the attribute manager purged a node, it would purge the node's transient attributes only from memory, assuming the controller would purge them from the CIB. Now, the writer purges them from the CIB as well.

This fixes a variety of timing issues when multiple nodes, including the attribute writer, are shutting down. If the writer leaves before some other node, the DC wipes that other node's attributes from the CIB when the other node leaves the controller process group (or all other nodes do, if the DC is the one leaving). If a new writer (possibly even the node itself) is elected before the node's attribute manager leaves the cluster layer, it writes the attributes back to the CIB. Once the other node leaves the cluster layer, all attribute managers remove its attributes from memory, but the attributes are now "stuck" in the CIB.

As of this commit, the controller still erases the attributes from the CIB when the node leaves the controller process group. That is redundant but doesn't cause any new problems, and will be corrected in an upcoming commit.

Note: This will cause an insignificant regression if backported to Pacemaker 2. The Pacemaker 2 controller purges attributes from the CIB for leaving DCs only if they are at version 1.1.13 or later, because earlier DCs would otherwise get fenced after a clean shutdown. Since the attribute manager doesn't know the DC or its version, the attributes would now always be wiped, so old leaving DCs would get fenced. That fencing could occur only in the highly unlikely case of a rolling upgrade from the Pacemaker-2-supported versions 1.1.11 or 1.1.12, and the upgrade would still succeed without any negative impact on resources.

Fixes T138

Co-Authored-By: Ken Gaillot <kgaillot@redhat.com>
Co-Authored-By: Chris Lumens <clumens@redhat.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
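To make the new division of labor concrete, here is a minimal C sketch of the idea. The helper names (`attrd_this_node_is_writer()`, `erase_transient_attrs_from_cib()`, `drop_node_attrs_from_memory()`) are hypothetical placeholders for illustration, not Pacemaker's actual API:

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; the real attrd code differs */
extern bool attrd_this_node_is_writer(void);
extern void erase_transient_attrs_from_cib(const char *node_name);
extern void drop_node_attrs_from_memory(const char *node_name);

/* Purge a departed node's transient attributes everywhere we track them */
static void
purge_node_attrs(const char *node_name)
{
    /* New in this commit (conceptually): the attribute writer also erases
     * the node's transient attributes from the CIB, instead of relying on
     * the controller to do it, so the attributes can't get "stuck" in the
     * CIB when the writer and the departing node shut down around the
     * same time.
     */
    if (attrd_this_node_is_writer()) {
        erase_transient_attrs_from_cib(node_name);
    }

    /* Pre-existing behavior: always drop the node's values from memory */
    drop_node_attrs_from_memory(node_name);
}
```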
The requesting_shutdown variable was checked only by attrd_shutting_down(), and only when its if_requested argument was true. In that case, attrd_shutting_down() returned true if either the shutting_down variable was true, or both the if_requested argument and the requesting_shutdown variable were true.

The only caller that passed if_requested=true was attrd_cib_updated_cb(), which did so if:

a. the alerts section was changed, or
b. the status section or nodes section was changed by an untrusted client.

Details:

a. Prior to f42e170, we didn't pass if_requested=true for an alerts section change. We started doing so as of that commit, mostly for convenience, because it seemed reasonable to ignore alert changes while a shutdown was pending. This commit reverts to NOT ignoring alert changes due to a pending shutdown, which seems like it might be better. I'm not sure whether it's possible to land in attrd_send_attribute_alert() while a shutdown has been requested but has not yet begun; if so, it would be good to send the correct alerts.

b. The other call with true avoided writing out all attributes when the status or nodes section changed. It's probably okay to drop that true as well. It was added by a1a9c54 to resolve a race condition in which:
* node2 left.
* node1's controller deleted node2's transient attributes from the CIB.
* node1 took over as DC and replaced the CIB.
* node2's attribute manager was not yet actually shutting down, and it responded to the CIB replacement by writing out all of the attributes in its memory, including its own "shutdown" attribute.

Now (as of the previous commit), node1's attribute manager would delete this "shutdown" attribute as part of its shutdown process. (Or, more accurately, I think the attribute writer node will do that.) So if we understand correctly, the attrd_shutting_down(true) workaround is no longer needed.

With no more callers needing to pass true, the supporting code can go away.

Co-Authored-By: Reid Wahl <nrwahl@protonmail.com>
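For reference, a rough before/after sketch of the check being simplified; the names follow the commit message, but treat this as illustrative rather than the exact source:

```c
#include <stdbool.h>

static bool shutting_down = false;       /* shutdown has actually begun */

#if 0
/* Before: a second flag, plus an argument that only one caller set */
static bool requesting_shutdown = false; /* shutdown requested, not begun */

bool
attrd_shutting_down(bool if_requested)
{
    return shutting_down || (if_requested && requesting_shutdown);
}
#endif

/* After: with no caller passing if_requested=true, the check collapses */
bool
attrd_shutting_down(void)
{
    return shutting_down;
}
```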
Now that the attribute manager will erase transient attributes from the CIB when purging a node, we don't need to do that separately in the controller.

Co-Authored-By: Chris Lumens <clumens@redhat.com>
Nothing uses the new capability yet.
With recent changes, the attribute manager handles this when the node leaves the cluster, so the controller's purge is redundant. This does alter the timing somewhat: the controller's purge occurred when the node left the controller process group, while the attribute manager's purge occurs when the node leaves the cluster layer. That shouldn't make a significant difference.

This fixes a problem where a node's controller crashes and is respawned while fencing is disabled. Previously, another node's controller would remove the affected node's transient attributes from the CIB even though they remained in the attribute managers' memory. Now, the attributes are correctly retained in the CIB in this situation.

Fixes T137
Fixes T139

Co-Authored-By: Chris Lumens <clumens@redhat.com>
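The timing change can be pictured as moving the purge from one membership callback to another. A simplified, illustrative-only C sketch follows; the enum, callback signature, and helper are hypothetical stand-ins, not Pacemaker's real peer-status API:

```c
#include <stdbool.h>

/* Hypothetical layers a node can leave, for illustration */
enum membership_layer {
    LAYER_CLUSTER,      /* cluster (membership) layer */
    LAYER_CONTROLD_CPG  /* controller's process group */
};

extern void purge_node_attrs(const char *node_name); /* see earlier sketch */

/* Called when a peer's membership in some layer changes */
static void
peer_change_cb(enum membership_layer layer, const char *node_name, bool lost)
{
    if (lost && (layer == LAYER_CLUSTER)) {
        /* New timing: the attribute manager purges memory and CIB when
         * the node leaves the cluster layer. The controller's purge,
         * which fired when the node left the controller process group,
         * is removed as redundant.
         */
        purge_node_attrs(node_name);
    }
}
```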
...instead of wiping from the CIB directly.

Co-Authored-By: Chris Lumens <clumens@redhat.com>
It now boils down to a bool for whether we want only unlocked resources.
...to controld_delete_node_history(), and controld_node_state_deletion_strings() to controld_node_history_deletion_strings(), since they delete only history now.
This has only ever had two values, which basically just means it's a bool.
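The same simplification pattern applies to both of these cleanup commits: a two-valued enum (or flag set) carries no more information than a bool. A generic sketch of the refactor, with made-up names since the real enum and function differ:

```c
#include <stdbool.h>

#if 0
/* Before: an enum that only ever distinguished two cases */
enum delete_scope {
    DELETE_ALL_HISTORY,
    DELETE_UNLOCKED_HISTORY_ONLY
};

void delete_node_history(const char *node, enum delete_scope scope);
#endif

/* After: a bool says everything the enum did */
void delete_node_history(const char *node, bool only_unlocked);
```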
force-pushed from f77a2b7 to c561a1e
@clumens This is ready for review as far as I'm concerned. If it works, then I'm pretty happy with it. As I said in previous comments, T137 and T139 appear to be addressed, and rolling upgrades appear to work as expected. I don't know how to test T138; I'd love to get your input on that, and to have you test it as well so we can feel more confident.
A 50-iteration
The desired behavior here is that the attribute is missing on node1, not that it has an empty value. That's the whole point of the last patch in #3955. I think attrd_updater can be convinced to invent attributes with empty values if they don't exist, so you need to verify the attribute is actually deleted, either via the logs or another command line (I used
Ah, I see. I don't think the end behavior was listed. You did say that nodes 1 and 3 should have the same output.
Ahhh I didn't notice that you had used
As I recall, it's pretty sporadic, given that it's a race condition between daemons. See if Marketa has a more reliable reproducer or at least better luck, and if she has time to test a scratch build.
I'll ask. Does this mean you also have not tested that this fixes T138, and were just going to hope for the best and let QE test it? The relevant Jira said there was something like a 50% chance of it occurring. I tested about 10 times and couldn't trigger it.
I'm opening this as an alternative to the "Fix: pacemaker-attrd: wipe CIB along with memory" commit in #3955. I have not done any testing. It's already late at night and I just finished this prototype. I don't have the time and energy to look into the testing steps and such.
This is so simple that I feel like we must have already tried something like this, and there must be a reason why it can't work. But I have to put it out there for discussion. I would love to find a less confusing approach than the `NULL` trick we're using in #3955.

If this works (a huge "if"), then it might even take care of the rolling upgrade issue, since this approach doesn't depend on a CIB update notification to tell us that it's safe to erase values from `attrd` memory. We should no longer need the `attrd_for_cib()` commit. However, we would still need the controller commits from #3955, which I did not include in this PR.

I didn't add a commit message beyond the one-line header, and there are a couple more places where we could benefit from comments/Doxygen. I don't want to invest too much effort since this approach might fail miserably.