Thursday, June 12, 2014

Replace Missing Cluster Name Object for DAG

Ran into a strange issue today with an Exchange 2013 DAG. The cluster name object (CNO) for the DAG had been deleted at some point. It must have been a long time ago, because I couldn't find a tombstone object for it to try and bring it back. What was amazing is that the DAG functioned fine, except that the File Share Witness (FSW) was offline because the CNO is used to access the shared folder for the FSW.

When you create a DAG, a computer object is created that represents the cluster name. This is the CNO.

A quick search turned up a number of documents on recovering the deleted CNO from the Deleted Objects container in Active Directory. However, this was not possible for me. Instead, I had to recreate the CNO.

Before we go any further, let me say that the smart thing to do in this scenario is probably to break the DAG and recreate it. If you have up-to-date copies of the data in the remote location, then adding the replica back after you recreate the DAG should go quickly. However, I figured I'd give the repair a try. Let's just say this is not exactly MS approved.

First I searched and searched to make sure that the CNO was really not there. The CNO has the same name as the DAG and mine was not there.

Here is the process I followed:
  1. Create a new computer account for the CNO named the same as the DAG.
  2. Give the DAG nodes (computer accounts) full control permission to the CNO.
  3. Identify the objectGUID attribute for the CNO and make note of it. This will be added to the cluster configuration.
  4. On one of the DAG nodes, open Regedit, browse to HKLM\Cluster, and note the value of ClusterNameResource. It will be a long ugly number and is the Cluster Name Resource GUID.
  5. In Regedit, browse to HKLM\Cluster\Resources\ClusterNameResourceGUID\Parameters.
  6. Edit the ObjectGUID value and replace its data with the value you copied in step 3. This tells the cluster to use the new CNO.
  7. Perform steps 5 and 6 on all DAG nodes.
  8. Restart the Cluster service on all DAG nodes.
  9. Update share permissions on the FSW shared folder to give the CNO full control.
  10. Update NTFS permissions on the FSW folder to give the CNO modify.
NOTE: In step 3, you cannot use the GUID as it is displayed in the attribute list on the Attribute Editor tab in AD Users and Computers. You need to open the objectGUID attribute and copy the hexadecimal value. This is a reordered version of the value shown in the attribute list.
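The reordering in that note is mechanical, so it can be double-checked with a few lines of code. The sketch below is my own illustration and not part of the repair: it assumes the hexadecimal value is the standard Windows GUID byte layout (the first three fields byte-swapped), which Python exposes as `bytes_le`. The function names and the sample GUID are made up for the example, and the output is plain lowercase hex with no separators. The exact formatting the cluster registry expects is not something I can vouch for, so compare against the existing ObjectGUID value before changing anything.

```python
import uuid


def guid_to_ad_hex(guid_string: str) -> str:
    """Convert a display-form GUID to the hex of its stored byte order.

    Assumption: the stored form is the Windows GUID layout, where the
    first three fields are little-endian. uuid.UUID.bytes_le gives that.
    """
    return uuid.UUID(guid_string).bytes_le.hex()


def ad_hex_to_guid(hex_string: str) -> str:
    """Convert a hex byte string (spaces allowed) back to display form."""
    cleaned = hex_string.replace(" ", "")
    return str(uuid.UUID(bytes_le=bytes.fromhex(cleaned)))


if __name__ == "__main__":
    # Hypothetical GUID, used only to show the byte reordering.
    display_form = "00112233-4455-6677-8899-aabbccddeeff"
    print(guid_to_ad_hex(display_form))
    # prints 33221100554477668899aabbccddeeff
```

Running the conversion in both directions is a cheap way to confirm that the value you copied in step 3 really belongs to the new CNO before you paste it into the registry.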

At this point, I was still getting errors about the File Share Witness not being able to start because of a failed logon. The event log had the following error:
Event ID: 1228
Cluster network name resource 'Cluster Name' encountered an error enabling the network name on this node. The reason for the failure was: 'Unable to obtain a logon token'.
The error code was '1326'.

The final step I needed to perform was:
  1. In Failover Cluster Manager, take the cluster name offline.
  2. Right-click the cluster name, point to More Actions, and click Repair.
The repair seemed to sync up the computer account name and computer account password with the failover cluster. I'm not sure that description is entirely accurate, but after the repair the cluster name started with no errors in the event log and the FSW resource started properly.

Some of the errors I received during troubleshooting were:
Event ID: 1069
Cluster resource 'File Share Witness (\\fsw.domain.com\dag.domain.com)' of type 'File Share Witness' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event ID: 1196
Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason: The handle is invalid.
Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.

Event ID: 1207
The computer object associated with the cluster network name resource 'Cluster Name' could not be updated in domain 'domain.com' during the Resource post online operation.
The text for the associated error code is: There is no such object on the server.
The cluster identity 'CNO$' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.

Some resources that were useful for troubleshooting:
