Thursday, November 24, 2016

How you do things matters (PowerShell Performance)

I've run into a couple of things over the last month or so that were interesting from a PowerShell perspective. With relatively small amounts of data, performance really isn't a concern. If you're only working with a few hundred or a few thousand items, PowerShell is pretty quick no matter what. However, when you work with larger datasets, how you do things can make a big difference.

Reducing the number of operations

The first thing I want to highlight is that the number of operations affects performance. I was recently writing some scripts to build a test environment by copying the AD structure from production. I based all of my work on AD Mirror (https://gallery.technet.microsoft.com/scriptcenter/AD-Mirror-PowerShell-Module-331b1b12).

I ran into a performance issue populating group membership. The code from AD Mirror had the list of group members in a variable and added them by using a foreach loop like this:
foreach ($m in $newMembers) {
    Add-ADGroupMember -Identity $targetgroup -Members $m
}
This worked, but for groups with a large membership it was very slow. A group with 10,000 members took about 10 minutes to complete.

After doing some reading, I realized that the -Members parameter will accept an array. So, I changed it up to look like this:
Add-ADGroupMember -Identity $targetgroup -Members $newMembers
It still wasn't fast, but it took a couple of minutes instead of 10. That's still slow, but when you multiply the savings out across the 18,000 groups I needed to set up, it added up to a lot of saved time.

As a side note, this is still slow because there is a lot to update in AD when the change is made. When a group member is added, both the group and user accounts need to be updated. Adding the members as a single lump means only a single change to the group, but each user account is still updated. For a group with 10,000 members, this means the loop does 20,000 updates while adding as a single set does 10,001 updates.
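If you want to compare two approaches like this yourself, Measure-Command makes it easy. Here's a minimal sketch, reusing $newMembers and $targetgroup from the snippets above (run each test against a fresh copy of the group, since the members will already be there after the first test):

$loopTime = Measure-Command {
    foreach ($m in $newMembers) {
        Add-ADGroupMember -Identity $targetgroup -Members $m
    }
}
$bulkTime = Measure-Command {
    Add-ADGroupMember -Identity $targetgroup -Members $newMembers
}
Write-Host "Loop: $($loopTime.TotalSeconds)s  Bulk: $($bulkTime.TotalSeconds)s"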

However, I did run into a problem with the largest groups. There must be some sort of limit on the amount of data that can be passed, or a timeout, because two groups with 27,000 and 20,000 users didn't work with the full array. Initially I went back to doing those with a loop. Then I realized that a better approach would be to split the array into smaller sets and add the members in several batches.

In my case, the sizing parameters are pretty well defined, so I'm able to break the array up based on fixed size limits. I'm sure there is a more graceful way to break this up using variables instead of hard-coded limits (there's a sketch of one after the switch below), but this met my needs for the moment.

This is my code that breaks up the array into chunks of 10,000 accounts:
$newMembersCount = $newMembers.Count

# The first 10,000 members are always added.
$part1 = $newMembers[0..9999]
Add-ADGroupMember -Identity $TargetGroupDn -Members $part1

# Every matching condition in this switch runs, so a count of 25,000
# triggers both the -gt 10000 and the -gt 20000 blocks.
Switch ($newMembersCount) {
    { $_ -gt 10000 } {
        Write-Host "Second 10K users"
        $part2 = $newMembers[10000..19999]
        Add-ADGroupMember -Identity $TargetGroupDn -Members $part2
    }
    { $_ -gt 20000 } {
        Write-Host "Third 10K users"
        $part3 = $newMembers[20000..29999]
        Add-ADGroupMember -Identity $TargetGroupDn -Members $part3
    }
    { $_ -gt 30000 } {
        Write-Host "Fourth 10K users"
        $part4 = $newMembers[30000..39999]
        Add-ADGroupMember -Identity $TargetGroupDn -Members $part4
    }
} #End switch
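A more general version would loop over the array in fixed-size slices rather than spelling out each range. Here's a minimal sketch of that idea; the $chunkSize variable and loop structure are mine, not part of the original script:

$chunkSize = 10000
for ($i = 0; $i -lt $newMembers.Count; $i += $chunkSize) {
    # Slicing past the end of the array is safe; out-of-range indexes are skipped.
    $end = [Math]::Min($i + $chunkSize, $newMembers.Count) - 1
    Write-Host "Adding members $i through $end"
    Add-ADGroupMember -Identity $TargetGroupDn -Members $newMembers[$i..$end]
}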

Lists instead of arrays

When you want an array of objects in PowerShell, you typically start it by declaring an array like this:
$arrayvar = @()
Then you add stuff to it like this:
$arrayvar += $newitem
When you create and manage an array this way, it ends up being very inefficient for large arrays. The array you create is a fixed size, so when you use += to add an item, PowerShell actually creates a brand new array containing all the existing values plus the new one, and discards the old array. In a very large array that ends up being a lot of data copying with no benefit.
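You can verify the fixed-size behavior for yourself; IsFixedSize is a standard property on .NET arrays:

$arrayvar = @()
$arrayvar.IsFixedSize    # True - the array cannot grow in place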

The fix for this is to create the collection as a list, which is not a fixed size:
$listvar = New-Object System.Collections.ArrayList
Then add items to the list like this:
$null = $listvar.Add($newitem)
Note that Add returns the index of the new item, so assigning the result to $null keeps that output from landing in the pipeline.
The Add method on a list is many times faster than using += on a fixed-size array, and the gap grows dramatically as the collection gets bigger, because += copies the entire array on every addition while Add simply appends. It's hard to quantify exactly since performance varies with the objects you're working with, but here is some data from one comparison I found (you can also run a quick test yourself; see the sketch after the list):
  • 10,000 items, array performance: 16 seconds
  • 10,000 items, list performance: 1 second
  • 100,000 items, array performance: 3230 seconds (53 minutes)
  • 100,000 items, list performance: 7 seconds
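If you want to reproduce numbers like those on your own machine, a minimal test looks like this (the item count and variable names are just examples):

$n = 50000

# Fixed-size array with +=
(Measure-Command {
    $arr = @()
    for ($i = 0; $i -lt $n; $i++) { $arr += $i }
}).TotalSeconds

# ArrayList with Add
(Measure-Command {
    $list = New-Object System.Collections.ArrayList
    for ($i = 0; $i -lt $n; $i++) { $null = $list.Add($i) }
}).TotalSeconds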
Update Apr 2017
The original version of this post suggested using a generic list instead of an array list. Someone with much more technical knowledge on this than me suggested that an array list is a better way to go, so I've changed the syntax above to use an array list instead of a generic list.
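For reference, the generic list version looked something like this (the [string] type and variable name here are just examples):

$genericList = New-Object System.Collections.Generic.List[string]
$genericList.Add("Item1")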

1 comment:

  1. As an alternative to predefining the arrayList with New-Object, you can also do: [System.Collections.ArrayList]$mylist = "Item1","Item2"
