Sunday, May 13, 2012

Recovering a Failed RAID 5 Array on a PERC Card

One of our clients had the misfortune to have a RAID 5 array get corrupted last week. In the RAID card's configuration utility, the virtual disk showed one disk as missing, one online, and one offline. The missing disk was actually marked as foreign, which means the controller did not recognize it as part of the virtual disk.

To make it even more fun, the RAID 5 array was also the boot disk.

Step 1 for recovery was to force the offline disk online. In a three-disk RAID 5 array, any two disks hold enough data and parity to reconstruct everything, so two online disks should have been enough. However, it was not to be. The drive was still not accessible.
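
As a rough illustration of why two out of three disks should be enough, here is a minimal Python sketch of RAID 5 style XOR parity for a single stripe. The block contents and the `xor_blocks` helper are made up for the example; a real controller does this in firmware across many stripes.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-disk RAID 5: two data blocks plus their parity.
# The contents are hypothetical sample data.
data_a = b"ACCOUNTS"
data_b = b"PAYROLL!"
parity = xor_blocks(data_a, data_b)

# Lose the disk holding data_b and rebuild it from the two survivors.
rebuilt_b = xor_blocks(data_a, parity)
assert rebuilt_b == data_b
```

The same XOR recovers whichever single block is missing, but only if the surviving blocks all belong to the same point in time.
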
I added a new disk and made a parallel install of Windows so I could see what was going on. The partition that should have had all the data was corrupted in some way, with the partition type showing as RAW instead of NTFS.

I tried importing the foreign disk, but no such luck, which makes sense because the RAID card thinks the disk belongs to another virtual disk.

I did all the data recovery I could with software that scans the RAW data and attempts to recover files, all the while not modifying the corrupted volume.

My last hope was to recreate the virtual disk with the following steps:
  1. Document the configuration of the virtual disk (stripe size, etc.) because it is needed later.
  2. Delete the virtual disk. This does not destroy the data on the disks; it only removes the virtual disk from the PERC card configuration.
  3. Clear the foreign status of the third disk. Again, this does not delete data; it only deletes RAID configuration information.
  4. Recreate the virtual disk with the three drives, using the settings recorded in step 1. DO NOT INITIALIZE the disks; initializing them would delete the data on them.
  5. Restart the server and let chkdsk perform major repairs on the RAID volume (whoo hoo, it sees NTFS now).
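
To show why the recorded settings matter when recreating the virtual disk, here is a small Python sketch that lays data out over three disks and reads it back. The rotating-parity layout, the `write_array` and `read_array` helpers, and the tiny stripe size are assumptions for illustration only; this is not the PERC's actual on-disk format.

```python
N_DISKS = 3


def write_array(data: bytes, stripe: int) -> list[bytes]:
    """Spread data over three disks with one parity block per row (assumed layout)."""
    row_bytes = stripe * (N_DISKS - 1)
    assert len(data) % row_bytes == 0, "pad data to a whole number of rows"
    disks = [bytearray() for _ in range(N_DISKS)]
    for row in range(len(data) // row_bytes):
        base = row * row_bytes
        first = data[base : base + stripe]
        second = data[base + stripe : base + 2 * stripe]
        parity = bytes(a ^ b for a, b in zip(first, second))
        parity_disk = (N_DISKS - 1 - row) % N_DISKS  # parity moves each row
        blocks = iter((first, second))
        for d in range(N_DISKS):
            disks[d] += parity if d == parity_disk else next(blocks)
    return [bytes(d) for d in disks]


def read_array(disks: list[bytes], stripe: int) -> bytes:
    """Reassemble the logical volume, skipping parity, for a given stripe size."""
    out = bytearray()
    for row in range(len(disks[0]) // stripe):
        parity_disk = (N_DISKS - 1 - row) % N_DISKS
        for d in range(N_DISKS):
            if d != parity_disk:
                out += disks[d][row * stripe : (row + 1) * stripe]
    return bytes(out)


original = bytes(range(48))  # stand-in for the volume contents
disks = write_array(original, stripe=4)

assert read_array(disks, stripe=4) == original        # same settings: data is back
assert read_array(disks, stripe=8) != original        # wrong stripe size: scrambled
assert read_array(disks[::-1], stripe=4) != original  # wrong disk order: scrambled
```

Nothing on the disks changes between the three reads; only the controller's idea of the layout does, which is why the original stripe size and disk order need to be written down before the virtual disk is deleted.
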
After the repair was complete, the system would boot, but Active Directory was corrupted. Given that, I couldn't be sure what else was corrupted. So I took a copy of all data that might have changed since their most recent backup (which I had already verified as good). Then I reinitialized the RAID 5 array to wipe the data, reinstalled, and did a full restore from backup.

The data from the repaired disk seems suspect, so I don't think we'll get anything useful from it. For example, the Exchange databases could not be mounted.

My frustration when looking at the RAID configuration was that the documentation was very vague about what my options were. Hopefully this description of recreating the RAID volume gives someone else a little more confidence when going through the same process.

2 comments:

    1. It's very dangerous to force a drive online.
      When a RAID 5 fails, it often appears that 2 drives have failed. Most likely what actually happened is this:
      1) 1 drive fails.
      2) The RAID starts operating in degraded mode, running on 2 out of 3 drives.
      3) Sometime later, a second drive fails.
      The RAID now stops and tech support is brought in.

      Now the situation is that you have 2 drives offline: 1 contains stale data and 1 is current.
      When you force the drives online, you create a volume with stale data. It will usually look pretty good, but recent files will be bad. Small files that live entirely on the 'good' stripe will also be good. Large recent files will be bad. By small files I mean smaller than the stripe size, so 64 KB is typical.

      This is most likely what you are seeing.

      Take Care
      Wayne Horner
      Alandata Data Recovery
      alandata.com
      alandatarecovery.com
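
The stale-drive scenario described in the comment above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: one stripe, three disks, XOR parity, and made-up block contents and variable names.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe while all three disks are healthy (hypothetical contents).
disk0 = b"REPORT-v1"          # data block on disk 0
disk1 = b"INVOICES!"          # data block on disk 1
disk2 = xor(disk0, disk1)     # parity block on disk 2

# Disk 0 drops out. The array keeps running degraded, and a later write to
# the block that lived on disk 0 can only be recorded in the parity.
new_block0 = b"REPORT-v2"
disk2 = xor(new_block0, disk1)  # disk 0 itself still holds the old contents

# Degraded reads are still correct, because they use disk 1 and parity.
assert xor(disk1, disk2) == new_block0

# Forcing the stale disk 0 back online returns the old block, which looks
# perfectly plausible but is out of date.
assert disk0 == b"REPORT-v1"

# Worse, rebuilding disk 1 from stale disk 0 plus current parity gives garbage.
assert xor(disk0, disk2) != disk1
```

Blocks that were never written while the array was degraded still line up, which fits the observation that the volume looks mostly fine while recent files are bad.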
