Monitoring Failover Cluster Patching

Windows Failover Clustering is a great way to run services such as DHCP and Hyper-V resiliently, and Microsoft makes it easy to schedule patching with Cluster Aware Updating. Once you have Cluster Aware Updating set up, you'll need to monitor it so that failed runs don't go unnoticed.

First, let's get the results of the most recent run:

$ Get-CauReport -Last

ClusterName			: hv-clstr
Status				: Succeeded
StartTimestamp			: 12/11/2020 3:00:00 AM
CountOfSucceededResults		: 6
CountOfFailedResults		: 0
CountOfCanceledResults		: 0
HadTransientInstallError	: False

If the Status field is Succeeded, your last run completed without errors. The CountOfSucceededResults field shows how many updates were installed: 6 in this run (in my case, 3 on each node of the cluster).
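If you would rather check those fields from a script than by eye, they can be wrapped in a small function. This is a minimal sketch: Format-CauRunSummary is a hypothetical helper name, and it assumes the report object exposes the Status and Count* properties exactly as they appear in the output above.

```powershell
# Hypothetical helper: turn the report returned by Get-CauReport -Last into a
# one-line summary. Assumes the Status / Count* property names shown above.
function Format-CauRunSummary {
    param(
        [Parameter(Mandatory)]
        $Report
    )

    if ($Report.Status -eq 'Succeeded') {
        # Successful run: report how many updates were installed
        return '{0}: succeeded, {1} update(s) installed' -f $Report.ClusterName, $Report.CountOfSucceededResults
    }

    # Anything else: surface the failed and canceled counts
    return '{0}: {1}, {2} failed, {3} canceled' -f $Report.ClusterName, $Report.Status, $Report.CountOfFailedResults, $Report.CountOfCanceledResults
}

# On a cluster node with the ClusterAwareUpdating module available:
# Format-CauRunSummary -Report (Get-CauReport -Last)
```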

If the run failed, add the -Detailed switch to get more information about the failure:

$ Get-CauReport -Last -Detailed

ClusterName					: hv-clstr
ClusterResult.Status				: Failed
ClusterResult.RunDuration			: 4:35:00
ClusterResult.NodeResults			: {...}
ClusterResult.ErrorRecordData		: MaxFailedNodes limit (0) exceeded.

If ClusterResult.Status is Failed, ClusterResult.ErrorRecordData gives the reason for the failure and ClusterResult.NodeResults holds the results for each individual node.
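The same lookup can be done programmatically. The sketch below assumes the ClusterResult property layout shown in the detailed output; Get-CauFailureDetail is a hypothetical helper, not part of the ClusterAwareUpdating module. It returns the overall status, the failure reason, and the names of the nodes whose result was anything other than Succeeded.

```powershell
# Hypothetical helper: pull the failure reason and failed node names out of a
# report from Get-CauReport -Last -Detailed. Assumes the ClusterResult.*
# property names shown in the detailed output above.
function Get-CauFailureDetail {
    param(
        [Parameter(Mandatory)]
        $Report
    )

    $result = $Report.ClusterResult

    # Skip null entries (e.g. no node results) and keep only nodes that did not succeed
    $failed = @($result.NodeResults |
        Where-Object { $null -ne $_ -and $_.Status -ne 'Succeeded' } |
        ForEach-Object { $_.Node })

    [pscustomobject]@{
        Status      = $result.Status
        Reason      = $result.ErrorRecordData
        FailedNodes = $failed
    }
}

# On a cluster node:
# Get-CauFailureDetail -Report (Get-CauReport -Last -Detailed)
```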

Here's a script that checks the results of the most recent Cluster Aware Updating run and creates an alert in Atera (an RMM platform) for each failed node:

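# Get the most recent CAU run, including per-node results
$Report = Get-CauReport -Last -Detailed

$UpdateSuccessful = $false
$FailureReason = ""
$FailedNodes = @()

# Install the PSAtera module if it isn't already present, then load it
if ($null -eq (Get-Module -ListAvailable PSAtera)) {
  Install-Module PSAtera -Force
}
Import-Module PSAtera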

# Create one Atera alert per failed node, skipping nodes that already have
# an open alert for this CAU run
function New-RMMAlert($Data, $RunId) {
  # Paste your Atera API key between the quotes
  Set-AteraAPIKey -APIKey ""

  $CurrentAlerts = Get-AteraAlertsFiltered -Open
  $Data | Format-Table
  foreach ($Node in $Data) {
    $Agent = Get-AteraAgent -MachineName $Node.Node
    # Don't duplicate an alert that is already open for this device and run
    if ($CurrentAlerts | Where-Object { $_.DeviceGuid -eq $Agent.DeviceGuid -and $_.AdditionalInfo -eq $RunId }) { continue }
    Write-Host "Creating alert for $($Node.Node)"
    New-AteraAlert -DeviceGuid $Agent.DeviceGuid -CustomerID $Agent.CustomerId -Title "Cluster Updates Failed" -Severity Critical -AlertCategoryID General `
      -MessageTemplate $Node.ErrorRecordData.ExceptionData.Message -AdditionalInfo $RunId
  }
}

if ($Report.ClusterResult.Status -eq "Succeeded") {
  $UpdateSuccessful = $true
} else {
  $FailureReason = $Report.ClusterResult.ErrorRecordData
  Write-Host "Cluster Aware Updating run failed: $FailureReason"
  # Collect every node that did not succeed, along with its error details
  $FailedNodes = $Report.ClusterResult.NodeResults | Where-Object Status -ne "Succeeded" | Select-Object Node, Status, ErrorRecordData
  New-RMMAlert -Data $FailedNodes -RunId $Report.ClusterResult.RunId
}
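The duplicate-alert check inside New-RMMAlert is the part most worth testing, since a mistake there means either a fresh alert every time the script runs or failures that never raise an alert. One option is to pull it out into a pure function that takes the list of open alerts as input, so it can be exercised without calling the Atera API. Test-AlertAlreadyOpen is a hypothetical name, and the sketch assumes alerts expose DeviceGuid and AdditionalInfo properties, as the script above does.

```powershell
# Hypothetical helper mirroring the duplicate check in New-RMMAlert: an alert
# is a duplicate when an open alert exists for the same device GUID and its
# AdditionalInfo holds the same CAU run ID.
function Test-AlertAlreadyOpen {
    param(
        [object[]]$OpenAlerts,
        [string]$DeviceGuid,
        [string]$RunId
    )

    $match = $OpenAlerts | Where-Object {
        $null -ne $_ -and $_.DeviceGuid -eq $DeviceGuid -and $_.AdditionalInfo -eq $RunId
    }

    # Where-Object emits nothing when no alert matches, which casts to $false
    return [bool]$match
}
```

Inside the foreach loop, the existing check could then read `if (Test-AlertAlreadyOpen -OpenAlerts $CurrentAlerts -DeviceGuid $Agent.DeviceGuid -RunId $RunId) { continue }`.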