Tuesday Oct 16, 2007
The question of replacing disks in ZFS pools comes up every so often. The most common question is whether ZFS will see the extra space when larger disks replace smaller ones. Let's go through an example:
First, we'll create some files to use as pool storage, and create a zpool out of the smaller two.
    bash-3.00# mkfile 64m /var/tmp/a0 /var/tmp/b0
    bash-3.00# mkfile 128m /var/tmp/a1 /var/tmp/b1
    bash-3.00# zpool create tank /var/tmp/a0 /var/tmp/b0
    bash-3.00# zpool list
    NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   119M   111K   119M    0%    ONLINE   -
    bash-3.00# zpool status
      pool: tank
     state: ONLINE
     scrub: none requested
    config:
            NAME           STATE     READ WRITE CKSUM
            tank           ONLINE       0     0     0
              /var/tmp/a0  ONLINE       0     0     0
              /var/tmp/b0  ONLINE       0     0     0
    errors: No known data errors
Here we've striped a pair of 64MB files for our pool. Now we'll replace the two disks in our stripe with their 128MB counterparts:
    bash-3.00# zpool replace tank /var/tmp/a0 /var/tmp/a1
    bash-3.00# zpool replace tank /var/tmp/b0 /var/tmp/b1
We wait a few moments, and then check to see that we're done:
    bash-3.00# zpool status
      pool: tank
     state: ONLINE
     scrub: resilver completed with 0 errors on Mon Oct 15 15:47:58 2007
    config:
            NAME           STATE     READ WRITE CKSUM
            tank           ONLINE       0     0     0
              /var/tmp/a1  ONLINE       0     0     0
              /var/tmp/b1  ONLINE       0     0     0
    errors: No known data errors
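Rather than re-running zpool status by hand, the wait can be scripted. What follows is only a sketch: the function names are made up for illustration, and the match string is taken from the "scrub:" line shown in this post, so other ZFS releases that word that line differently would need a different pattern. A pool that already shows a completed resilver from an earlier replace would also satisfy the check immediately.

```shell
#!/bin/sh
# resilver_completed: read `zpool status` output on stdin and succeed if it
# contains the "resilver completed" text seen in this post's output.
# (Hypothetical helper; the pattern is an assumption tied to this output format.)
resilver_completed() {
    grep 'resilver completed' >/dev/null
}

# wait_for_resilver POOL [INTERVAL_SECONDS]
# Poll `zpool status POOL` until resilver_completed succeeds.
wait_for_resilver() {
    pool=$1
    interval=${2:-5}
    until zpool status "$pool" | resilver_completed; do
        sleep "$interval"
    done
}

# Example use on a live system (not run here):
#   wait_for_resilver tank 5 && zpool list
```

Keeping the text check separate from the polling loop means the pattern can be tried against saved output without touching a real pool.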
Everything seems to have gone well, and the resilvering is complete. Let's take a look at the pool now:
    bash-3.00# zpool list
    NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   247M   231K   247M    0%    ONLINE   -
This shows that it works with stripes. Will it work with raidz? Let's create a few more files and test.
    bash-3.00# mkfile 64m /var/tmp/c0 /var/tmp/d0
    bash-3.00# mkfile 128m /var/tmp/c1 /var/tmp/d1
    bash-3.00# zpool destroy tank
    bash-3.00# zpool create tank raidz /var/tmp/a0 /var/tmp/b0 /var/tmp/c0 /var/tmp/d0
    bash-3.00# zpool list
    NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   238M   177K   238M    0%    ONLINE   -
    bash-3.00# zpool status
      pool: tank
     state: ONLINE
     scrub: none requested
    config:
            NAME             STATE     READ WRITE CKSUM
            tank             ONLINE       0     0     0
              raidz1         ONLINE       0     0     0
                /var/tmp/a0  ONLINE       0     0     0
                /var/tmp/b0  ONLINE       0     0     0
                /var/tmp/c0  ONLINE       0     0     0
                /var/tmp/d0  ONLINE       0     0     0
    errors: No known data errors
And now do the replace:
bash-3.00# for f in a b c d; do zpool replace tank /var/tmp/${f}0 /var/tmp/${f}1; done
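Since zpool replace expects the new device to be at least as large as the one it replaces, a loop like this can be guarded with a quick size comparison first. This is a minimal sketch for file-backed pools such as the ones used here; file_bytes and can_replace are hypothetical helper names, not part of ZFS.

```shell
#!/bin/sh
# file_bytes PATH: print the size of PATH in bytes.
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# can_replace OLD NEW: succeed if NEW is at least as large as OLD.
# (Hypothetical pre-flight check; zpool replace does its own validation too.)
can_replace() {
    [ "$(file_bytes "$2")" -ge "$(file_bytes "$1")" ]
}

# Example use on a live system (not run here):
#   for f in a b c d; do
#       can_replace /var/tmp/${f}0 /var/tmp/${f}1 &&
#           zpool replace tank /var/tmp/${f}0 /var/tmp/${f}1
#   done
```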
We wait a little bit for the resilver to complete, and then check the status and size:
    bash-3.00# zpool status
      pool: tank
     state: ONLINE
     scrub: resilver completed with 0 errors on Tue Oct 16 08:01:00 2007
    config:
            NAME             STATE     READ WRITE CKSUM
            tank             ONLINE       0     0     0
              raidz1         ONLINE       0     0     0
                /var/tmp/a1  ONLINE       0     0     0
                /var/tmp/b1  ONLINE       0     0     0
                /var/tmp/c1  ONLINE       0     0     0
                /var/tmp/d1  ONLINE       0     0     0
    errors: No known data errors
    bash-3.00# zpool list
    NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   238M   408K   238M    0%    ONLINE   -
OK, so that didn't exactly work. The device list is correct, but the size is the same. Let's try export-import to see if that will allow ZFS to see the new size:
    bash-3.00# zpool export tank
    bash-3.00# zpool import -d /var/tmp tank
    bash-3.00# zpool list
    NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   494M   189K   494M    0%    ONLINE   -
    bash-3.00#
And it works! Of course, if you've got filesystems or volumes shared via NFS or iSCSI, exporting and reimporting is a bit trickier: you'd need to wait until your users have gone home for the day, or just reboot the machine (which does an implicit export/import). It'd be nice if this could happen automatically, as in the striping case above. A bug has been filed for this (6606879).
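The export/import step is easy to wrap so that it reports the size before and after. This is only a sketch: pool_size and reimport_and_report are made-up names, the column position assumes the zpool list layout shown in this post, and the export will still fail or disrupt clients if the pool is busy.

```shell
#!/bin/sh
# pool_size POOL: read `zpool list` output on stdin and print the SIZE
# column (second field) for POOL. Assumes the column layout shown above.
pool_size() {
    awk -v pool="$1" '$1 == pool { print $2 }'
}

# reimport_and_report POOL DIR
# Export POOL, import it again searching DIR for its devices, and print
# the size before and after. (Hypothetical wrapper, not run here.)
reimport_and_report() {
    pool=$1
    dir=$2
    before=$(zpool list | pool_size "$pool")
    zpool export "$pool" || return 1
    zpool import -d "$dir" "$pool" || return 1
    after=$(zpool list | pool_size "$pool")
    echo "$pool: $before -> $after"
}

# Example use on a live system:
#   reimport_and_report tank /var/tmp
```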
The final case is mirroring:
    bash-3.00# zpool destroy tank
    bash-3.00# zpool create tank mirror /var/tmp/a0 /var/tmp/b0
    bash-3.00# zpool list
    NAME   SIZE    USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   59.5M   94K    59.4M   0%    ONLINE   -
    bash-3.00# zpool status
      pool: tank
     state: ONLINE
     scrub: none requested
    config:
            NAME             STATE     READ WRITE CKSUM
            tank             ONLINE       0     0     0
              mirror         ONLINE       0     0     0
                /var/tmp/a0  ONLINE       0     0     0
                /var/tmp/b0  ONLINE       0     0     0
    errors: No known data errors
OK, now we'll do the replace:
    bash-3.00# zpool replace tank /var/tmp/a0 /var/tmp/a1
    bash-3.00# zpool replace tank /var/tmp/b0 /var/tmp/b1
    bash-3.00# zpool status
      pool: tank
     state: ONLINE
     scrub: resilver completed with 0 errors on Mon Oct 15 16:09:10 2007
    config:
            NAME             STATE     READ WRITE CKSUM
            tank             ONLINE       0     0     0
              mirror         ONLINE       0     0     0
                /var/tmp/a1  ONLINE       0     0     0
                /var/tmp/b1  ONLINE       0     0     0
    errors: No known data errors
    bash-3.00# zpool list
    NAME   SIZE    USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   59.5M   218K   59.3M   0%    ONLINE   -
The size is still 59.5M. As in the raidz case above, this will take an export/import in order to effect the size change:
    bash-3.00# zpool export tank
    bash-3.00# zpool import -d /var/tmp tank
    bash-3.00# zpool status
      pool: tank
     state: ONLINE
     scrub: none requested
    config:
            NAME             STATE     READ WRITE CKSUM
            tank             ONLINE       0     0     0
              mirror         ONLINE       0     0     0
                /var/tmp/a1  ONLINE       0     0     0
                /var/tmp/b1  ONLINE       0     0     0
    errors: No known data errors
    bash-3.00# zpool list
    NAME   SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
    tank   124M   116K   123M    0%    ONLINE   -
    bash-3.00#
To summarise: for plain stripes, also known as RAID-0, ZFS can automatically grow the pool after a replace. For mirroring (a.k.a. RAID-1) and raidz/raidz2 (an improved RAID-5/6), you need to export and reimport (or reboot) to get the new size until 6606879 is fixed.
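The sizes reported throughout this post fit a simple pattern. Each file contributes about 4.5MB less than its nominal size, stripes and raidz report the sum across all devices (zpool list shows raw raidz capacity, parity included), and a mirror reports a single device's worth. The sketch below just reproduces that arithmetic. The 4.5MB figure is inferred from the numbers above rather than taken from any documentation, and expected_size is a made-up name.

```shell
#!/bin/sh
# expected_size LAYOUT COUNT SIZE_MB
# Print the pool size (in MB) that `zpool list` would be expected to show,
# assuming ~4.5MB of per-device overhead (an inference from this post's
# numbers, not a documented constant).
#   LAYOUT  stripe, raidz or mirror
#   COUNT   number of devices
#   SIZE_MB nominal size of each device in MB
expected_size() {
    LC_ALL=C awk -v layout="$1" -v n="$2" -v s="$3" 'BEGIN {
        per_dev = s - 4.5
        if (layout == "mirror")
            total = per_dev          # a mirror holds one device worth
        else
            total = n * per_dev      # stripe and raidz report the raw sum
        printf "%.1f\n", total
    }'
}

expected_size stripe 2 64     # 119.0  (post shows 119M)
expected_size stripe 2 128    # 247.0  (post shows 247M)
expected_size raidz  4 64     # 238.0  (post shows 238M)
expected_size raidz  4 128    # 494.0  (post shows 494M)
expected_size mirror 2 64     # 59.5   (post shows 59.5M)
expected_size mirror 2 128    # 123.5  (post shows 124M, rounded)
```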
Great post! Can zfs convert a mirror with 2 disks to a raidz1 with 3 disks? I am going to add a disk to my home system and want to increase the space and keep the redundancy.
Posted by Kevin on October 18, 2007 at 03:56 PM BST #
Unfortunately, adding devices to mirrors can only make 'wider' mirrors.
The only way I can think of to convert a mirror to a raidz would be to build the raidz as a separate pool, and then use zfs send to copy the data over to it.
Posted by Mark J Musante on October 22, 2007 at 02:23 PM BST #