Commit c9ce1b5

AB#6889: Create cluster-aware-updating.md (#10055)
* Create cluster-aware-updating.md
* Update cluster-aware-updating.md
* Update cluster-aware-updating.md
* Update cluster-aware-updating.md
1 parent 495056d commit c9ce1b5

1 file changed

+192
-0
lines changed

@@ -0,0 +1,192 @@
1+
---
2+
title: Cluster-Aware Updating (CAU) Troubleshooting Guide
3+
description: Resolves issues that affect Cluster-Aware Updating (CAU) in Windows Failover Clusters.
4+
ms.date: 10/06/2025
5+
author: kaushika-msft
6+
ms.author: kaushika
7+
manager: dcscontentpm
8+
audience: itpro
9+
ms.topic: troubleshooting
10+
ms.reviewer: kaushika
11+
ms.custom:
12+
- sap: clustering and high availability\cluster-aware updating (CAU)
13+
- pcy: High availability\Cluster-Aware Updating (CAU)
14+
appliesto:
15+
- <a href=https://learn.microsoft.com/windows/release-health/windows-server-release-info target=_blank>Supported versions of Windows Server</a>
16+
---
17+
18+
# Cluster-aware updating (CAU) troubleshooting guide
19+
20+
## Summary
21+
22+
Cluster-Aware Updating (CAU) automates software updates on the nodes of a Windows failover cluster while keeping clustered workloads available. Operational and configuration issues can still occur, including update failures, cluster service interruptions, permission errors, and resource outages. This guide describes how to diagnose and resolve the most common CAU and cluster update failures so that your clustered workloads remain healthy and up to date.
23+
24+
## Troubleshooting checklist
25+
26+
Use this checklist for systematic troubleshooting:
27+
28+
- Review recent changes:
29+
  - Was there an OS version upgrade, driver or firmware update, or Active Directory change?
30+
  - Are there new nodes, network configurations, or changes to updating schedules?
31+
- Verify cluster and CAU setup:
32+
  - Is the CAU clustered role installed and running?
33+
  - Is each node healthy and able to communicate with the other nodes?
34+
  - Are cluster networks and storage in a healthy state?
35+
- Check update source and management tools:
36+
  - Are all nodes configured to use the same update source?
37+
  - Are SCCM or other management tools compatible with CAU in your environment?
38+
- Verify Active Directory permissions:
39+
  - Does the Cluster Name Object (CNO) have full control to create and manage Virtual Computer Objects (VCOs)?
40+
  - Are there any AD replication or delegation issues?
41+
- Monitor for service failures and crash dump files:
42+
  - Are nodes or virtual machines (VMs) restarting unexpectedly?
43+
  - Are there crash dump files or event log errors that point to driver, storage, or network issues?
44+
- Collect logs and error messages:
45+
  - Gather cluster logs, update logs, and system/application event logs.
46+
  - Note any specific error messages or event IDs.
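
The cluster and CAU setup checks in this list can be approximated from PowerShell. The following is a minimal sketch, not a complete health check. It assumes the Failover Clustering and CAU PowerShell modules are installed on the computer you run it from, and the cluster name `CONTOSO-FC1` is a hypothetical placeholder.

```powershell
# Assumption: hypothetical cluster name. Replace it with your own.
$clusterName = 'CONTOSO-FC1'

Import-Module FailoverClusters, ClusterAwareUpdating

# Node state: every node is expected to report "Up".
Get-ClusterNode -Cluster $clusterName | Format-Table Name, State

# Cluster networks: every network is expected to report "Up".
Get-ClusterNetwork -Cluster $clusterName | Format-Table Name, State, Role

# List any cluster resources that aren't online.
Get-ClusterResource -Cluster $clusterName |
    Where-Object State -ne 'Online' |
    Format-Table Name, State, OwnerGroup, ResourceType

# Show the CAU clustered role configuration.
# A warning is returned if the role isn't configured on the cluster.
Get-CauClusterRole -ClusterName $clusterName
```

If every node and network reports `Up` and no resources are listed as offline, continue with the update source and Active Directory checks.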
47+
48+
## Common issues and solutions
49+
50+
### 1. CAU role missing or failed
51+
52+
#### Symptoms
53+
54+
- The CAU role can't be managed or checked for status.
55+
- Errors: “The cluster resource could not be found,” `WU_E_PT_ENDPOINT_DISCONNECTED`, `CAUUpdatePlugin` failures.
56+
57+
#### Resolution
58+
59+
- Verify that the CAU role exists in Failover Cluster Manager or through PowerShell (`Get-ClusterResource`).
60+
- Remove any conflicting or misnamed CAU clustered role (`Remove-CauClusterRole -ClusterName <ClusterName> -Force`).
61+
- Re-create and pre-stage the CAU computer object in Active Directory.
62+
- Re-add the CAU role (`Add-CauClusterRole ...`), and make sure that the plugin, account, and schedule are correct.
63+
- Confirm that cluster validation passes by running `Test-Cluster`.
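
The remove and re-add sequence above might look like the following sketch. It assumes that the role to remove is the CAU clustered role (so `Remove-CauClusterRole` applies), that a computer object has already been pre-staged in Active Directory, and that the Windows Update plugin and a monthly schedule are wanted. The cluster name, the computer object name, and the schedule values are hypothetical placeholders.

```powershell
# Assumptions: hypothetical names. Replace them with your own values.
$clusterName = 'CONTOSO-FC1'
$cauVcoName  = 'CONTOSO-FC1-CAU'   # pre-staged computer object for the CAU role

# Remove the existing (conflicting or broken) CAU clustered role.
Remove-CauClusterRole -ClusterName $clusterName -Force

# Re-add the role against the pre-staged computer object.
# The schedule and retry values are examples only.
$roleSettings = @{
    ClusterName               = $clusterName
    VirtualComputerObjectName = $cauVcoName
    CauPluginName             = 'Microsoft.WindowsUpdatePlugin'
    DaysOfWeek                = 'Tuesday'
    WeeksOfMonth              = 2
    MaxRetriesPerNode         = 2
    EnableFirewallRules       = $true
    Force                     = $true
}
Add-CauClusterRole @roleSettings

# Confirm the role configuration, and then validate the cluster.
# Storage tests are left out here to limit the impact on a production cluster.
Get-CauClusterRole -ClusterName $clusterName
Test-Cluster -Cluster $clusterName -Include 'Inventory', 'Network', 'System Configuration'
```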
64+
65+
### 2. Permission or Active Directory Issues
66+
67+
#### Symptoms
68+
69+
- Update plugin reports “Access is denied.”
70+
- Event IDs 1194 and 1069 are logged. The CNO can't create a VCO, and resources don't come online after the update.
71+
72+
#### Resolution
73+
74+
- In Active Directory Users and Computers, verify that the CNO has full control and the "Create Computer objects" permission on the appropriate OU.
75+
- If the cluster network name resource fails, right-click the resource in Failover Cluster Manager and repair it, or reset the password of its computer object as necessary.
76+
- Reapply permissions, and verify functionality by using a CAU test run.
77+
- Make sure that the computer object for the cluster network name resource exists in AD and is enabled.
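
The permission and computer object checks above can also be inspected from PowerShell. This is a rough sketch that assumes the ActiveDirectory module (RSAT) is installed. The OU distinguished name, the CNO account, and the VCO name are hypothetical placeholders.

```powershell
Import-Module ActiveDirectory

# Assumptions: hypothetical OU, CNO account, and VCO name.
$ouDn    = 'OU=Clusters,DC=contoso,DC=com'
$cnoName = 'CONTOSO\CONTOSO-FC1$'
$vcoName = 'CONTOSO-FC1-CAU'

# List the access control entries that the CNO holds on the OU.
# Look for GenericAll (full control) or CreateChild rights.
(Get-Acl -Path "AD:\$ouDn").Access |
    Where-Object { $_.IdentityReference.Value -eq $cnoName } |
    Format-Table IdentityReference, ActiveDirectoryRights, AccessControlType -AutoSize

# Confirm that the computer object exists and is enabled.
Get-ADComputer -Identity $vcoName -Properties whenChanged |
    Format-List Name, Enabled, DistinguishedName, whenChanged

# Recent occurrences of the event IDs listed in the symptoms (local System log).
Get-WinEvent -FilterHashtable @{
    LogName   = 'System'
    Id        = 1194, 1069
    StartTime = (Get-Date).AddDays(-7)
} -ErrorAction SilentlyContinue |
    Format-Table TimeCreated, Id, Message -Wrap
```

An empty result from the first command suggests that the CNO has no explicit entries on that OU. Permissions can still be inherited or granted through a group, so treat the output as a starting point.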
78+
79+
### 3. Cluster nodes not using same update source
80+
81+
#### Symptoms
82+
83+
- Some nodes don't receive updates during update cycles.
84+
- Error: “Cluster-aware updating failed on one node,” with a recurring 0x80072ee2 (connection timeout) error code in the update logs.
85+
86+
#### Resolution
87+
88+
- Compare the registry settings for the update source (for example, the WSUS server) on all nodes (`reg query`).
89+
- Export the update-related registry settings from a working node, and import them on the affected nodes.
90+
- Check Group Policy for settings that might revert the update source.
91+
- After a restart, verify that the settings are still consistent and that subsequent CAU runs succeed.
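
To compare the update source across nodes in one pass, a sketch such as the following can help. It assumes PowerShell remoting is enabled on the nodes and that the update source is set through the Windows Update policy registry key. The cluster name is a hypothetical placeholder.

```powershell
# Assumption: hypothetical cluster name.
$clusterName = 'CONTOSO-FC1'
$nodes = (Get-ClusterNode -Cluster $clusterName).Name

# Read the policy-based update source settings on every node.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    $wuKey = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate'
    $wu = Get-ItemProperty -Path $wuKey -ErrorAction SilentlyContinue
    $au = Get-ItemProperty -Path "$wuKey\AU" -ErrorAction SilentlyContinue

    [pscustomobject]@{
        WUServer       = $wu.WUServer
        WUStatusServer = $wu.WUStatusServer
        UseWUServer    = $au.UseWUServer
    }
} |
    Sort-Object PSComputerName |
    Format-Table PSComputerName, WUServer, WUStatusServer, UseWUServer -AutoSize
```

Nodes whose row differs from the others, or whose values are empty, are the likely candidates for a mismatched update source.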
92+
93+
### 4. Network or storage failures affecting updates
94+
95+
#### Symptoms
96+
97+
- VMs restart instead of migrating.
98+
- Event IDs: 158, 58, 155 (storage/fs), “Cluster network is down,” device removals.
99+
- Cluster Shared Volumes (CSV) enter a paused state or lose access to storage.
100+
101+
#### Resolution
102+
103+
- Review physical network and storage connections/cables on affected nodes.
104+
- Verify cluster network health (`Get-ClusterNetwork`), and fix any adapter or VLAN issues.
105+
- Use `Test-Cluster` to identify hardware health problems.
106+
- Engage the storage and network teams for persistent device or connectivity errors.
107+
- After storage access is restored and network issues are corrected, retry the update.
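
The network and storage checks above can be sketched as follows. This is an illustrative example and not a full diagnostic. The cluster name is a hypothetical placeholder, and the event query reads only the System log of the computer that runs it.

```powershell
# Assumption: hypothetical cluster name.
$clusterName = 'CONTOSO-FC1'

# Cluster networks and the per-node network interfaces.
Get-ClusterNetwork -Cluster $clusterName | Format-Table Name, State, Role
Get-ClusterNetworkInterface -Cluster $clusterName |
    Format-Table Node, Network, Name, State

# CSV state per node. Redirected or paused volumes show up in StateInfo.
Get-ClusterSharedVolume -Cluster $clusterName |
    Get-ClusterSharedVolumeState |
    Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason -AutoSize

# Storage and file system event IDs from the symptoms list (last 24 hours).
Get-WinEvent -FilterHashtable @{
    LogName   = 'System'
    Id        = 158, 58, 155
    StartTime = (Get-Date).AddDays(-1)
} -ErrorAction SilentlyContinue |
    Format-Table TimeCreated, Id, ProviderName, Message -Wrap
```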
108+
109+
### 5. Incorrect update sequencing or forced node failover
110+
111+
#### Symptoms
112+
113+
- Updates install before draining cluster roles, causing workload outages.
114+
- Updating proceeds after multiple failed drain attempts (forced restart/failover of VMs).
115+
116+
#### Resolution
117+
118+
- Always use the `-ForcePauseAndDrain` flag when scheduling update scripts or CAU runs.
119+
- Review update policies to make sure that node drain and maintenance mode precede updating.
120+
- Don't use custom pre-update or post-update scripts that move resources, because they might leave the cluster in a corrupted state.
121+
- Increase the concurrent VM migration limit if necessary (`Set-VMHost -MaximumVirtualMachineMigrations`).
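
Before the next CAU run, it can be useful to confirm that a node drains cleanly when it's paused manually. The following sketch assumes a Hyper-V cluster. The cluster name, node name, and migration limit are hypothetical example values.

```powershell
# Assumptions: hypothetical cluster name, node name, and migration limit.
$clusterName = 'CONTOSO-FC1'
$nodeName    = 'NODE1'

# Check, and optionally raise, the simultaneous live migration limit.
Get-VMHost -ComputerName $nodeName |
    Format-List ComputerName, MaximumVirtualMachineMigrations
Set-VMHost -ComputerName $nodeName -MaximumVirtualMachineMigrations 4

# Pause the node and drain its roles. -Wait blocks until the drain finishes.
Suspend-ClusterNode -Name $nodeName -Cluster $clusterName -Drain -Wait

# Any groups still owned by the paused node didn't drain.
Get-ClusterGroup -Cluster $clusterName |
    Where-Object { $_.OwnerNode.Name -eq $nodeName } |
    Format-Table Name, State, OwnerNode

# Resume the node and fail roles back.
Resume-ClusterNode -Name $nodeName -Cluster $clusterName -Failback Immediate
```

If groups remain on the node after the drain, resolve that before updating, because CAU depends on the same drain operation.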
122+
123+
### 6. Driver-related cluster or node failures
124+
125+
#### Symptoms
126+
127+
- Host node restarts unexpectedly. Many VMs restart.
128+
- Bug checks (for example, stop code 0xA), random memory corruption, kernel pages filled with nulls.
129+
- WinDbg reports memory corruption involving graphics or storage adapter drivers.
130+
131+
#### Resolution
132+
133+
- Identify and update outdated drivers (for example, NVIDIA GRID or storage HBA drivers).
134+
- Install the manufacturer-recommended driver versions (for example, NVIDIA GRID version 573.48).
135+
- Restart nodes, monitor for further crashes.
136+
- Analyze crash dump files by using WinDbg for confirmation.
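
To find outdated drivers and existing dump files on all nodes, a sketch like the following can be used. It assumes that remote CIM and PowerShell remoting are available, that graphics, storage, and network adapter drivers are the ones of interest, and that dump files are in their default locations. The cluster name is a hypothetical placeholder.

```powershell
# Assumption: hypothetical cluster name.
$clusterName = 'CONTOSO-FC1'
$nodes = (Get-ClusterNode -Cluster $clusterName).Name

# Driver versions for display, storage, and network adapters on every node.
Get-CimInstance -ClassName Win32_PnPSignedDriver -ComputerName $nodes |
    Where-Object { $_.DeviceClass -in 'DISPLAY', 'SCSIADAPTER', 'NET' } |
    Sort-Object PSComputerName, DeviceClass |
    Format-Table PSComputerName, DeviceClass, DeviceName, DriverVersion, DriverDate -AutoSize

# Crash dump files in the default locations (assumed: not redirected elsewhere).
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Get-ChildItem -Path "$env:SystemRoot\MEMORY.DMP", "$env:SystemRoot\Minidump\*.dmp" -ErrorAction SilentlyContinue |
        Select-Object FullName, Length, LastWriteTime
}
```

Compare the reported versions across nodes and against the versions that the hardware manufacturer recommends.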
137+
138+
### 7. Stale or orphaned cluster roles not removed
139+
140+
#### Symptoms
141+
142+
- Cluster resources remain after removal attempts.
143+
- PowerShell returns: “WARNING: The current cluster isn't configured with a Cluster-Aware Updating clustered role.”
144+
145+
#### Resolution
146+
147+
- Use PowerShell (`Remove-CauClusterRole ... -Force`) to remove persistent roles.
148+
- Try to move the core cluster group to another node before you retry the removal.
149+
- If all else fails, review the cluster database hive, and clear residual entries manually.
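
The first two steps might be sketched as follows. The sketch assumes that the stale role is the CAU clustered role, that the core cluster group has its default name `Cluster Group`, and that CAU resources can be recognized by a resource type name that contains "Updating". The cluster and node names are hypothetical placeholders.

```powershell
# Assumptions: hypothetical cluster name and target node.
$clusterName = 'CONTOSO-FC1'
$targetNode  = 'NODE2'

# See which groups exist and where they're currently hosted.
Get-ClusterGroup -Cluster $clusterName |
    Format-Table Name, OwnerNode, State, GroupType

# Move the core cluster group to another node, and then retry the removal.
Move-ClusterGroup -Name 'Cluster Group' -Node $targetNode -Cluster $clusterName
Remove-CauClusterRole -ClusterName $clusterName -Force

# Check for leftover CAU resources (the type name match is an assumption).
Get-ClusterResource -Cluster $clusterName |
    Where-Object { $_.ResourceType -like '*Updating*' } |
    Format-Table Name, State, OwnerGroup, ResourceType
```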
150+
151+
### 8. Management tools and update source integration issues
152+
153+
#### Symptoms
154+
155+
- Updates scheduled in SCCM don't reflect or apply to cluster nodes.
156+
- Attempts to integrate third-party update management tools fail.
157+
158+
#### Resolution
159+
160+
- Be aware that SCCM isn't cluster-aware and doesn't natively integrate with CAU.
161+
- Run cluster-aware updating and SCCM updating as separate processes.
162+
- Review the documentation for any available integration, or escalate to support for a custom solution.
163+
164+
## Data collection
165+
166+
- Cluster logs: `Get-ClusterLog -UseLocalTime`
167+
- Event logs: Export the System, Application, FailoverClustering, and Hyper-V-High-Availability logs through Event Viewer.
168+
- Update logs: `Get-WindowsUpdateLog`
169+
- Diagnostic data packages (SDP): `.\SDP_Cluster.exe -SDP -SkipSDPList -acceptlogs`
170+
- Network trace: `netsh trace start capture=yes`
171+
- Registry settings: `reg export "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate" C:\WU-Reg-Backup.reg`
172+
- Driver information and dump files: Collect system crash dump files, and analyze them by using WinDbg or a similar tool.
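
Several of these items can be gathered in one pass. The following sketch collects the cluster logs, the Windows Update log, the event logs, and the registry export into a single folder. It leaves out the SDP package and the network trace, which are usually started separately. The cluster name and output folder are hypothetical placeholders, and the event log channel names are assumptions that might not exist on every node (for example, the Hyper-V channel exists only on Hyper-V hosts).

```powershell
# Assumptions: hypothetical cluster name and output folder.
$clusterName = 'CONTOSO-FC1'
$outDir      = 'C:\CauLogs'
New-Item -Path $outDir -ItemType Directory -Force | Out-Null

# Cluster logs from all nodes for the last 24 hours (-TimeSpan is in minutes).
Get-ClusterLog -Cluster $clusterName -Destination $outDir -UseLocalTime -TimeSpan 1440

# Merged Windows Update log for the local node.
Get-WindowsUpdateLog -LogPath (Join-Path $outDir 'WindowsUpdate.log')

# Event logs from the local node, exported as .evtx files.
$channels = @{
    System             = 'System'
    Application        = 'Application'
    FailoverClustering = 'Microsoft-Windows-FailoverClustering/Operational'
    HyperVHA           = 'Microsoft-Windows-Hyper-V-High-Availability-Admin'
}
foreach ($entry in $channels.GetEnumerator()) {
    wevtutil.exe epl $entry.Value (Join-Path $outDir "$($entry.Key).evtx") /ow:true
}

# Windows Update policy settings for the local node.
reg.exe export 'HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate' (Join-Path $outDir 'WU-Reg-Backup.reg') /y
```

Run the event log, update log, and registry steps on each affected node, because those commands read only the local computer.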
173+
174+
## Common issues quick reference table
175+
176+
| Symptom/error | Root cause | Resolution |
177+
| --- | --- | --- |
178+
| CAU role missing or error “resource could not be found” | Role missing or misconfigured | Remove conflicting role, re-create computer object in AD, re-add CAU role |
179+
| Update plugin “Access is denied”, VCO errors | AD permission/delegation issues | Grant CNO/VCO full control, reset permissions, repair resource |
180+
| Update fails on a node, 0x80072ee2 codes, split update sources | GPO/registry inconsistency | Align registry settings, fix GPO, make sure all nodes use same source |
181+
| VMs restart instead of migrating, storage/network fails, Event IDs 158/58/155 | Storage/network/driver failure | Check cables, validate cluster health, update drivers, review logs |
182+
| Updates run before draining, unexpected role failover | Update algorithm change, missing flag | Use `-ForcePauseAndDrain` flag, adjust migration limits |
183+
| “Cluster resource remains after remove”, removal warnings | Stale/orphaned resource | Use `Remove-CauClusterRole ... -Force`, move core group, clear in hive |
184+
| SCCM or third-party update tool can't update cluster nodes | No native tool integration | Use CAU separately, escalate for custom/unsupported integrations |
185+
| Host restarts, VM mass restart, memory corruption in dumps | Outdated driver (for example, NVIDIA GRID) | Update driver to fixed version. Check for hardware defects |
186+
187+
## References
188+
189+
- [Cluster-Aware Updating Documentation](/windows-server/failover-clustering/cluster-aware-updating)
190+
- [Best Practices for Hyper-V Cluster updating](/system-center/vmm/hyper-v-update)
191+
- [Windows Update Log Analysis](/windows/deployment/update/windows-update-logs)
192+
- [Diagnosing Memory Dumps with WinDbg](/windows-hardware/drivers/debugger/)
