Saturday, August 16, 2014

Network Error When Adding DAG Member in DR Site

I was adding a new DAG member in a disaster recovery (DR) site and got an error about networking. The error was as follows:
A server-side administrative operation has failed. 'GetDagNetworkConfig' failed on the server. Error: The NetworkManager has not yet been initialized. Check the event logs to determine the cause. [Server: NewMember.domain.com] It was running the command 'Get-DatabaseAvailabilityGroupNetwork | Sort-Object -Property Name'.
This error is due to Active Directory replication latency to the DR site. The server in the DR site does not have the Active Directory changes in the configuration partition yet. However, if you look at Failover Cluster Manager, the new server is up in the DAG cluster.

You have two options:
  1. Wait for replication to complete. This typically takes 15 minutes between sites.
  2. Trigger replication to speed it up. After I triggered replication to the remote site and then triggered replication back, the error was gone right away and I could move on with adding my database copies for the DR site.
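
For option 2, here is a rough sketch of how replication might be triggered from an elevated shell with repadmin, followed by rerunning the command from the error. The domain controller names DC01 and DRDC01 are placeholders I made up for illustration, so substitute your own, and treat the flags as one reasonable choice rather than the only way to do it.

```powershell
# Push changes from DC01 (placeholder name) out to its partners in every site.
# /A = all partitions, /e = cross-site (enterprise), /d = identify servers by DN, /P = push
repadmin /syncall DC01 /A /e /d /P

# Then sync the other way from a DC in the DR site (DRDC01 is a placeholder name).
repadmin /syncall DRDC01 /A /e /d /P

# In the Exchange Management Shell, rerun the command that failed to confirm the error is gone.
Get-DatabaseAvailabilityGroupNetwork | Sort-Object -Property Name
```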

 

Preseeding Exchange 2010 Databases in DR Site

One of our clients uses a remote location as a disaster recovery (DR) site for Exchange. The DR site is less about failover functionality (although it is usable) and more about the offsite backup it provides.

Last week, the Exchange server in the DR site failed and after rebuilding it, we needed to get it going again. The link speed to the remote location is only about 5 Mbps, over which they can move about 50 GB of data per day. Given that they have 250 GB of mail data, seeding over the network would have taken about 5 days if there were no network interruptions.
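
The numbers above can be sanity-checked with a little arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 10^9 bytes) and a fully saturated link, and the variable names are just for illustration.

```powershell
# Theoretical ceiling for a 5 Mbps link running flat out for 24 hours, in GB
$linkMbps = 5
$gbPerDay = $linkMbps * 1e6 / 8 * 86400 / 1e9    # 54 GB/day before any overhead

# Days needed to seed 250 GB at the observed rate of about 50 GB/day
$daysToSeed = 250 / 50                            # 5 days
```

The theoretical 54 GB per day lines up with the roughly 50 GB per day they actually see on that link.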

The process for preseeding is clearly described in the Microsoft documentation and works as advertised:
  1. Clean up any incorrect data for the database, such as database copies that no longer exist, if you are in a recovery situation.
  2. Disable circular logging on the database. You are going to take a copy of the database, and the log files generated between when you take the copy and when you start it on the new server need to be kept so they can be copied over to bring it up to a current state.
  3. Dismount the database and copy the files (database, log files, and index) to portable media. You need to dismount the database to ensure that the copy you have is consistent. Alternatively, if you can't afford the outage to copy the database, you can back up your database and restore it to portable media. That backup will have a consistent copy of the database. You could also briefly dismount the database, take a volume shadow copy, mount the database, and use the volume shadow copy as your source.
  4. Remount the database. When you remount the database users can begin using their mailboxes again.
  5. On the new server copy the files from portable media. Remember that the file path for a database copy needs to be exactly the same on all servers. So, put it in the exact same path.
  6. On the new server add the mailbox database copy by using the Exchange Management Shell. You need to use EMS to use the -SeedingPostponed parameter, which indicates that the database is already there and should not be copied over. You do not need to run a command to indicate you are ready; the database starts up immediately.
  7. Wait for log files to replicate and replay. Now the database mounts locally and all log files since the copy are replicated to the new server and replay on the copied database.
  8. Finally, if you were using circular logging you can reenable it on the database.
To add the database copy in step 6 use this command:
Add-MailboxDatabaseCopy -Identity DbName -MailboxServer MbxServerName -SeedingPostponed
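
For the other steps, here is a minimal sketch of the Exchange Management Shell commands I would expect to use around that one. DbName and MbxServerName are the same placeholders as above. The status check and the Resume-MailboxDatabaseCopy fallback are my own additions rather than part of the documented procedure, so treat them as assumptions.

```powershell
# Step 2: disable circular logging on the source database
Set-MailboxDatabase -Identity DbName -CircularLoggingEnabled $false

# Step 3: dismount so the files on disk are consistent, then copy them to portable media
Dismount-Database -Identity DbName -Confirm:$false

# Step 4: remount as soon as the file copy has finished
Mount-Database -Identity DbName

# Step 6 is the Add-MailboxDatabaseCopy command shown above, run once the files are in place.

# Step 7: watch the copy and replay queues drain on the new copy
Get-MailboxDatabaseCopyStatus -Identity DbName\MbxServerName |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength

# Assumption: if the copy shows as suspended instead of starting on its own, resume it
# Resume-MailboxDatabaseCopy -Identity DbName\MbxServerName

# Step 8: re-enable circular logging if it was in use before
Set-MailboxDatabase -Identity DbName -CircularLoggingEnabled $true
```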
The Microsoft documentation is here:

Friday, August 15, 2014

Blue Screen Win32k.sys 0x50

Yesterday we had one client lose 5 machines to a stop error when they rebooted. This is caused by a bad Microsoft update, specifically KB2982791, which updates fonts. If you search around, there were previous issues caused by font updates and the font cache not being cleared properly.

Yesterday, we fixed up a couple of the machines by doing a system restore and delaying all updates because we were not sure of the cause. I'm now seeing postings indicating that removing the font cache file may allow the system to boot. Delete this file:
  • C:\Windows\System32\fntcache.dat
For your reference, this Microsoft forum with a post by Iaurens provides step by step instructions on how to recover by booting from a Windows 7 Install DVD.
Such a pity, we had just started doing auto approvals of updates for some clients again. I guess we'll move back to the manual approval mode so that we can delay updates like this that are likely to get pulled.

I should note that I don't think this is widespread as we've only had one client with the issue. Strangely it was their Engineering users with the issue. So, I'm guessing it's some type of software interaction causing the issue.

If you are using WSUS, I strongly suggest you decline KB2982791.

Update: Further reading suggests that KB2970228 is also bad and should be declined at this time.
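
For machines that are still booting, a quick way to see whether either update is already installed might look like the sketch below. This is only a check I would try, not something from the Microsoft guidance, and it needs to be run on each machine (or remotely with the -ComputerName parameter).

```powershell
# List either update if it is installed on this machine.
# Nothing is returned when neither one is present.
Get-HotFix -Id KB2982791, KB2970228 -ErrorAction SilentlyContinue |
    Select-Object HotFixID, InstalledOn
```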

Update, Aug 29/14: More drama. Apparently the replacement update for this also has some issues, although they are less severe. Read someone else's rant here:

Tuesday, August 5, 2014

Remove OEMDRV Drive from Dell Server

I recently installed a Dell Server by using the Lifecycle Controller. This system uses a wizard to help with the installation of the operating system. In this case, I was installing Windows Server 2008 R2 to replace an existing Exchange 2010 server.

As part of the installation, an OEMDRV USB drive is created by the Lifecycle Controller that contains the drivers used during OS installation. OS installation went well, but I ran into an issue afterwards. The OEMDRV drive was using E:, which I needed for my Exchange data.

When you go into Computer Management, OEMDRV shows as a removable drive. However, you cannot change the drive letter or eject OEMDRV. By default the Lifecycle Controller removes this drive after 18 hours, but I didn't want to wait that long.

To force OEMDRV to be removed earlier, restart the server and press F10 to enter the Lifecycle Controller configuration. Then exit the Lifecycle Controller and reboot again. You don't need to make any changes in the configuration. Just entering and exiting triggers the removal.
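
To confirm the drive is really gone after the second reboot, a quick check along these lines should work. It is a minimal sketch that assumes the volume label is exactly OEMDRV, as it was on my server.

```powershell
# Look for any volume labeled OEMDRV.
# No output means the drive has been removed and its letter is free again.
Get-WmiObject -Class Win32_Volume -Filter "Label = 'OEMDRV'" |
    Select-Object DriveLetter, Label, DriveType
```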