Dave's Bit Bucket

Dave Walker's jottings - mostly pertaining to security


Monday January 22, 2007

Thoughts on Disk Scrubbing

The range of views on disk scrubbing - "scrubbing" being the writing of various patterns of data to disks in an attempt to remove all traces of existing data from them before disposal or re-purposing - is as wide as the range of our customers. Some folk are happy with the "purge" mode of the format(1M) command (where available), others write sets of 1s followed by sets of 0s (or vice-versa), still others write repeated fixed bit strings, and others write data from a random number generator. Of course, there are others who do varying combinations of all of the above - 7-pass scrub cycles are not unheard of.
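The pass types listed above (all 1s, all 0s, a repeated fixed bit string, random data) can be sketched as follows. This is a minimal illustration, assuming the target is an ordinary file such as a disk image rather than a raw device (for which `os.path.getsize` would not report the real size). The names `pattern_source` and `scrub` are hypothetical, the pass list in the usage line is only an example rather than a recommended scheme, and nothing here reproduces what format(1M)'s purge mode actually does.

```python
import os

CHUNK = 1 << 20  # write in 1 MiB chunks to keep memory use bounded


def pattern_source(kind, size, offset=0):
    """Return `size` bytes of data for one chunk of a scrub pass.

    kind is "ones", "zeros", "random", or a non-empty bytes object that is
    repeated as a fixed bit string. `offset` keeps a fixed pattern in phase
    across chunk boundaries.
    """
    if kind == "ones":
        return b"\xff" * size
    if kind == "zeros":
        return b"\x00" * size
    if kind == "random":
        return os.urandom(size)
    if isinstance(kind, bytes) and kind:
        start = offset % len(kind)
        reps = (start + size) // len(kind) + 1
        return (kind * reps)[start:start + size]
    raise ValueError(f"unknown pass type: {kind!r}")


def scrub(path, passes):
    """Overwrite every byte of an existing file once per pass.

    Each pass is flushed and fsync'd before the next starts, so the passes
    are not merely coalesced in the page cache.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for kind in passes:
            f.seek(0)
            offset = 0
            while offset < size:
                n = min(CHUNK, size - offset)
                f.write(pattern_source(kind, n, offset))
                offset += n
            f.flush()
            os.fsync(f.fileno())
```

A multi-pass run would then look like `scrub("old-disk.img", ["ones", "zeros", b"\x55\xaa", "random"])`, where the file name is made up for the example.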

The significant point to remember is that any software-based scrubbing technique is always "an attempt" to remove all traces of pre-existing data. Remanence issues mean that there will almost always be some old data left at track edges, where the head happens not to reach during the scrubbing cycle. So if a Bad Guy were to get physical access to the drive (e.g. after disposal), he would be able to get something back via judicious use of electron microscopy. If the disks are merely being re-purposed, and their new custodians aren't expected to have physical access to them (such as if the disks stay in the same managed datacentre before and after), then scrubbing is of very little benefit.

It's surprising how intensive a process scrubbing actually is. I gather from the grapevine that if you take a fully stocked 3510 array, it takes over 100 hours to write contiguous "1"s to it from a server on moderately recent 2 Gbit/s Fibre Channel, and several times that to write data sourced from /dev/random. In fact, the real-world customer seeing this is even considering putting an SCA6000 (Sun Crypto Accelerator 6000) in the server to speed up random number generation!
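A back-of-the-envelope estimator shows why those hours mount up: every pass rewrites the whole capacity once, at whichever is slower, the link or the random source. The function name is hypothetical, and the capacity and throughput figures in the usage note are made-up illustrative values, not measurements of a 3510.

```python
def scrub_hours(capacity_bytes, link_bytes_per_s, passes=1, rng_bytes_per_s=None):
    """Rough wall-clock estimate for a multi-pass scrub.

    If rng_bytes_per_s is given, the pass is assumed to be limited by the
    slower of the link and the random-number source.
    """
    rate = link_bytes_per_s
    if rng_bytes_per_s is not None:
        rate = min(link_bytes_per_s, rng_bytes_per_s)
    if rate <= 0:
        raise ValueError("throughput must be positive")
    return passes * capacity_bytes / rate / 3600
```

With illustrative figures of 10 TB at a sustained 100 MB/s, `scrub_hours(10e12, 100e6)` comes to about 27.8 hours per pass and `scrub_hours(10e12, 100e6, passes=7)` to about 194 hours; if the random source can only manage 25 MB/s, `scrub_hours(10e12, 100e6, rng_bytes_per_s=25e6)` rises to about 111 hours for a single pass, which is the effect that makes a hardware RNG attractive.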

The thing is, there's potentially another way of looking at this problem. We have a 3510 array to deal with here, which has a decent RAID controller in it. For this - and for our bigger arrays too - why not put it to work? In fact, in extremis:

  • blow away the current configuration in the array, putting all the disks to be scrubbed in one LUN
  • pick a disk (any disk) within that LUN
  • create a volume which has that disk as the principal target, and has all other to-be-scrubbed disks mirrored to it
  • using your server, scrub this size-of-one-disk volume and let the RAID controller do the rest :-)
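A toy model of the steps above: the server writes one disk's worth of data, and the "controller" fans each write out to every mirror member. `MirroredVolume` is a hypothetical stand-in, with in-memory bytearrays playing the disks; actually building such a volume is done with the array's own management tools, which are not modelled here.

```python
class MirroredVolume:
    """Toy RAID-1 volume: one host write is replicated to every member."""

    def __init__(self, members):
        sizes = {len(m) for m in members}
        if len(sizes) != 1:
            raise ValueError("need at least one member, all the same size")
        self.members = members
        self.size = sizes.pop()
        self.host_bytes_written = 0  # traffic the server actually sends

    def write(self, offset, data):
        end = offset + len(data)
        if offset < 0 or end > self.size:
            raise ValueError("write outside volume")
        # This fan-out is the controller's job, not the server's.
        for member in self.members:
            member[offset:end] = data
        self.host_bytes_written += len(data)
```

For example, with four 1024-byte "disks" built as `[bytearray(b"old data" * 128) for _ in range(4)]`, a single `vol.write(0, b"\x00" * vol.size)` zeroes all four while `vol.host_bytes_written` is only 1024.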
While Your Mileage May Vary depending on the capabilities of your RAID controller (in terms of how many mirrors you can have in a volume - 3500-series kit being happiest doing things pairwise - and the overall performance of the mirroring function within the controller, etc), it seems pretty obvious to me that the time required for a scrubbing process could be cut at least in half by adopting this idea; at worst, you can create a volume of two aggregations (either stripes or concats) mirrored to each other and scrub that. A hardware RAID controller should be significantly better at mirroring than a server, at any rate, leaving the server to get on with the crunching involved in random number generation and shoving the data to the volume.
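The arithmetic behind "at least in half" can be made explicit by counting what the server has to push per pass. This sketch assumes the host link is the bottleneck and that the controller's mirroring keeps pace, which, as the Your Mileage May Vary caveat says, depends on the controller; the function and layout names are made up for illustration.

```python
def host_bytes_per_pass(n_disks, disk_bytes, layout):
    """Bytes the server must send over the link for one full scrub pass."""
    if n_disks < 1:
        raise ValueError("need at least one disk")
    if layout == "direct":
        # The server writes every disk itself.
        return n_disks * disk_bytes
    if layout == "n_way_mirror":
        # One disk-sized volume; every other disk is mirrored to it.
        return disk_bytes
    if layout == "two_way_mirror":
        # Two equal aggregations (stripes or concats) mirrored to each other.
        if n_disks % 2:
            raise ValueError("two equal aggregations need an even number of disks")
        return (n_disks // 2) * disk_bytes
    raise ValueError(f"unknown layout: {layout!r}")
```

For 12 equal disks, the two-way layout halves the host traffic relative to writing each disk directly, and a full 12-way mirror cuts it to one twelfth.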

If our current RAID controllers aren't happy doing massive multi-mirroring, this might be a useful RFE :-).

It's also worth noting that, for RAID controllers which have a "cloning" capability, cloning a scrubbed disk after the fact isn't The Right Thing to do; it would be rather like being handed the result of a SPECint calculation when what you wanted was to calculate SPECint 50,000 times and time it. With scrubbing, the emphasis is on the process, not the result.

(2007-01-22 04:09:43.0)
