Thursday, November 24, 2016

How you do things matters (PowerShell Performance)

I've run into a couple of things over the last month or so that were interesting from a PowerShell perspective. With relatively small amounts of data, performance really isn't a concern. If you're only working with a few hundred or few thousand items PowerShell is pretty quick no matter what. However, when you work with larger datasets, how you do things can make a big difference.

Reducing the number of operations

The first thing I want to highlight is that the number of operations affects performance. I was recently building some scripts to create a test environment by copying the AD structure from production. I based all of my work on AD Mirror (https://gallery.technet.microsoft.com/scriptcenter/AD-Mirror-PowerShell-Module-331b1b12).

I ran into a performance issue populating group membership. The code from AD Mirror had the list of group members in a variable and added them by using a foreach loop like this:
foreach ($m in $newMembers) {
    Add-ADGroupMember -Identity $targetgroup -Members $m
}
This worked, but for groups with a large membership it was very slow. A group with 10,000 members took about 10 minutes to complete.

After doing some reading, I realized that the -Members parameter will accept an array. So, I changed it up to look like this:
Add-ADGroupMember -Identity $targetgroup -Members $newMembers
It still wasn't fast, but it took a couple of minutes instead of 10. That's still slow, but multiplied across the 18,000 groups I needed to set up, it added up to a lot of saved time.

As a side note, this is still slow because there is a lot to update in AD when the change is made. When a group member is added, both the group and user accounts need to be updated. Adding the members as a single lump means only a single change to the group, but each user account is still updated. For a group with 10,000 members, this means the loop does 20,000 updates while adding as a single set does 10,001 updates.

However, I did run into a problem with the largest groups. There must be some sort of limit on the amount of data that can be passed, or a timeout, because two groups with 27,000 and 20,000 users didn't work when I passed the full array. Initially I went back to doing those with a loop. However, then I thought that a better way would be to split the array into smaller sets and add them in several batches.

In my case, the sizing parameters are pretty well defined. So, I'm able to break the array up based on defined size limits. I'm sure there is a way to more gracefully break this up using variables instead of hard coded limits, but this met my needs for the moment.

This is my code that breaks up the array into chunks of 10000 accounts:
$newMembersCount = $newMembers.Count
$part1 = $newMembers[0..9999]
Add-ADGroupMember -Identity $TargetGroupDn -Members $part1

switch ($newMembersCount) {
    { $_ -gt 10000 } {
        Write-Host "Second 10K users"
        $part2 = $newMembers[10000..19999]
        Add-ADGroupMember -Identity $TargetGroupDn -Members $part2
    }
    { $_ -gt 20000 } {
        Write-Host "Third 10K users"
        $part3 = $newMembers[20000..29999]
        Add-ADGroupMember -Identity $TargetGroupDn -Members $part3
    }
    { $_ -gt 30000 } {
        Write-Host "Fourth 10K users"
        $part4 = $newMembers[30000..39999]
        Add-ADGroupMember -Identity $TargetGroupDn -Members $part4
    }
} #End switch
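As I said, there's probably a more graceful way to do this with variables instead of hard-coded limits. For what it's worth, something like the sketch below would generalize it to any number of members; the 10,000 batch size is just an assumption based on what worked for me above.

```powershell
# Sketch: add members in batches of $chunkSize instead of hard-coded ranges.
# The 10,000 batch size is an assumption; tune it to what works in your environment.
$chunkSize = 10000
for ($i = 0; $i -lt $newMembers.Count; $i += $chunkSize) {
    $upper = [System.Math]::Min($i + $chunkSize, $newMembers.Count) - 1
    Write-Host "Adding members $i through $upper"
    Add-ADGroupMember -Identity $TargetGroupDn -Members $newMembers[$i..$upper]
}
```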

Lists instead of arrays

When you want an array of objects in PowerShell, you typically start by declaring an empty array like this:
$arrayvar = @()
Then you add items to it like this:
$arrayvar += $newitem
Creating and managing an array this way ends up being very inefficient for large arrays. The array you create is a fixed size, so when you use += to add an item, PowerShell actually creates a brand-new array containing the existing values plus the new value, and copies everything over. In a very large array, that ends up being a lot of data movement with no benefit.

The fix for this is to use a list, which is not a fixed size:
$listvar = New-Object System.Collections.ArrayList
Then add items to the list like this:
$null = $listvar.Add($newitem)
(Assigning to $null suppresses the index that the Add method returns.)
The Add method on a list is many times faster than using += on a fixed-size array, and the gap widens rapidly as the list grows, because += copies the entire array on every addition while Add simply appends. It's hard to quantify exactly because the difference varies with the objects you're working with, but one comparison I found reported these numbers:
  • 10,000 items, array performance: 16 seconds
  • 10,000 items, list performance: 1 second
  • 100,000 items, array performance: 3230 seconds (53 minutes)
  • 100,000 items, list performance: 7 seconds
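If you want to see the difference on your own machine, a rough micro-benchmark like this sketch will show it (the exact numbers depend on your hardware and PowerShell version):

```powershell
# Rough micro-benchmark: += on a fixed-size array vs. ArrayList.Add.
$n = 20000

$arraySeconds = (Measure-Command {
    $arrayvar = @()
    for ($i = 0; $i -lt $n; $i++) { $arrayvar += $i }   # re-creates the array every pass
}).TotalSeconds

$listSeconds = (Measure-Command {
    $listvar = New-Object System.Collections.ArrayList
    for ($i = 0; $i -lt $n; $i++) { $null = $listvar.Add($i) }  # grows in place
}).TotalSeconds

"Array: {0:N2} s  List: {1:N2} s" -f $arraySeconds, $listSeconds
```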
Update Apr 2017
The original version of this post suggested using a generic list instead of an array list. Someone with much more technical knowledge than me suggested that an array list is a better way to go. So, I've changed the syntax above to use an ArrayList instead of a generic list.

Wednesday, November 2, 2016

SCOM AD Monitoring Alerts

I've been working with a larger client for the last several months on Active Directory (AD) issues. One of the ongoing small issues has been AD monitoring alerts generated in System Center Operations Manager (SCOM) when it appears nothing is actually wrong.

The alerts were appearing intermittently on several of the servers, but not all. We were seeing alerts like this:

Failed to ping or bind to the Infrastructure Master FSMO role holder
AD Op Master Response : The script 'AD Op Master Response' could not determine the schema Op Master.The error returned was: 'LDAP://DC.contoso.com/RootDSE' (0x8007203A)

Failed to ping or bind to the Schema Master FSMO role holder
AD Op Master Response : The script 'AD Op Master Response' could not determine the schema Op Master.The error returned was: 'LDAP://DC.contoso.com/RootDSE' (0x8007203A)

Failed to ping or bind to the RID Master FSMO role holder
AD Op Master Response : The script 'AD Op Master Response' could not determine the RID Op Master.The error returned was: 'LDAP://DC.contoso.com/RootDSE' (0x8007203A)

Failed to ping or bind to the PDC FSMO role holder
AD Op Master Response : The script 'AD Op Master Response' could not determine the PDC Op Master.The error returned was: 'LDAP://DC.contoso.com/RootDSE' (0x8007203A)

Failed to ping or bind to the Domain Naming Master FSMO role holder
AD Op Master Response : The script 'AD Op Master Response' could not determine the domain naming Op Master.The error returned was: 'LDAP://DC.contoso.com/RootDSE' (0x8007203A)

Script Based Test Failed to Complete
AD General Response : While running 'AD General Response' the following consecutive errors were encountered: Failed to bind to 'LDAP://DC.contoso.com/rootDSE'. This is an unexpected error. The error returned was 'LDAP://DC.contoso.com/rootDSE' (0x8007203A)
We also implemented AD Client monitoring and it showed the following errors:
Script Based Test Failed to Complete
AD Client Connectivity : The script 'AD Client Connectivity' failed while getting 'LDAP://DC.contoso.com/RootDSE.The error returned was: 'LDAP://DC.contoso.com/RootDSE' (0x8007203A)

Script Based Test Failed to Complete
AD Client Monitoring: AD Connectivity is unavailable, or the response is too slow. The bind to 'LDAP://DC.contoso.com/RootDSE' took 4259 milliseconds, which is longer than the allowed 1000 milliseconds.
After doing a packet capture on one of the DCs, I was able to correlate a specific pattern of network activity with the failure to connect for AD Client Connectivity. When the connection error occurred, the AD client attempted to connect to the DC, but the DC refused the connection. In Network Monitor, this appears as:
  • Client to server: TCP packet with the SYN flag set
  • Server to client: TCP packet with Ack and Reset flags set
This behavior indicates that either no service is listening on the port or the service is refusing the connection. The service was definitely listening, because it worked most of the time, so the service must have been refusing the connection for some reason. It wasn't a timeout of any kind, because the refusal happened in about 1 ms.
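As a side note, you can reproduce the bind-time check that the monitoring script performs with a few lines of PowerShell. This is just a sketch; DC.contoso.com is a placeholder for a real DC name.

```powershell
# Sketch: time a RootDSE bind, roughly what AD Client Monitoring measures.
# 'DC.contoso.com' is a placeholder; substitute one of your own DCs.
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$rootDse = [ADSI]"LDAP://DC.contoso.com/RootDSE"
$null = $rootDse.defaultNamingContext   # property access forces the actual bind
$sw.Stop()
"RootDSE bind took $($sw.ElapsedMilliseconds) ms"
```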

These domain controllers were running Windows Server 2008 R2. I did find some web pages indicating that this type of error can be caused by IPv6 being disabled on Windows Server 2012, and I can confirm that enabling IPv6 also fixed this issue on Windows Server 2008 R2. However, it didn't fix the problem immediately. If we had restarted the DCs, it probably would have taken effect right away, but we enabled IPv6 during production hours and couldn't reboot immediately. About 7 hours after IPv6 was enabled, all alerts related to the AD Client monitoring stopped and it's been all good since.
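If you want to check whether IPv6 has been disabled in the registry (rather than just unchecked in the adapter bindings), you can look at the DisabledComponents value; absent or 0 means IPv6 is fully enabled. A quick sketch:

```powershell
# Sketch: read the IPv6 DisabledComponents registry value.
# Absent or 0 = IPv6 fully enabled; 0xFF = all IPv6 components disabled.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters'
$value = (Get-ItemProperty -Path $key -Name DisabledComponents `
    -ErrorAction SilentlyContinue).DisabledComponents
if ($null -eq $value -or $value -eq 0) {
    "IPv6 is fully enabled"
} else {
    "DisabledComponents is set to 0x{0:X}" -f $value
}
```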

In our scenario, it was a little more difficult to be confident this would work because the only two servers experiencing this issue regularly were in a single location. That led us to think that there might be something odd in the network configuration there. Both of those servers were configured to use WINS for some legacy stuff in that site. I think that might have been an influencing factor somehow as that's the only configuration difference I could find.

Update:
The two servers having these errors had slow LDAP client bind times even after the errors went away. After a lot of monitoring, we added additional virtual processors to those two DCs, increasing them from 2 vCPUs to 4 vCPUs. Once that was done, the slow LDAP client bind times disappeared, so insufficient processing power seemed to play a part in this as well. Those two DCs were servicing a very large site (I think over 5,000 users).