As Good A Place As Any
Tim Thomas's Blog
Recipe for a ZFS RAID-Z Storage Pool on Sun Fire X4540
[Update Sept 26th: I have revised this from the initial posting on Sept 25th. The hot spares have been laid out in a tidier way and I have included an improved script which is a little more generalized.]
Almost a year ago I posted a Recipe for Sun Fire X4500 RAID-Z Config with Hot Spares. The new Sun Fire X4540 has different disk controller numbering and more bootable disk slots, so I have revisited this.
Using my Sun Fire X4540 Disk Planner, I first worked out how I wanted it to look....

The server has six controllers, each with 8 disks. In the planner, the first controller is c0, but the controller numbering will not start at c0 in all cases: if you installed Solaris off an ISO image, the controllers will run from c1 to c6; if Solaris was installed with Jumpstart, they will run from c0 to c5. In one case I have seen the first controller as c4. Whatever the first controller is, the others will follow in sequence.
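Since the first controller number varies, it can help to check it before running anything. The following is a minimal sketch, not part of the original script: `first_controller` is a hypothetical helper that reads Solaris-style device names (cXtYdZ, with or without a slice suffix) on standard input and prints the lowest-numbered controller. It assumes the lowest controller really is the first disk controller, which may not hold if another device (a CF card, for instance) has taken the lowest number.

```shell
#!/bin/sh
# Sketch: find the lowest-numbered controller in a list of device names.
# first_controller is a hypothetical helper, not part of the original script.
first_controller() {
    # Keep only names shaped like cXtYdZ..., reduce each to its controller
    # number, sort numerically, and print the smallest with the "c" restored.
    sed -n 's/^c\([0-9][0-9]*\)t[0-9][0-9]*d[0-9][0-9]*.*$/\1/p' |
        sort -n | head -n 1 | sed 's/^/c/'
}

# Example with made-up names; on a live system the input might come from
# "ls /dev/dsk" instead.
printf '%s\n' c3t0d0s0 c1t0d0s0 c2t5d0s2 c6t7d0s0 | first_controller
```

The result should still be checked against what the system actually reports before it is passed to the script as the first controller.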
I assumed that mirrored boot disks are desirable, so I allocated two disks for the OS.
ZFS is happy with stripes of dissimilar lengths in a pool, but I like all the stripes in a pool to be the same length, so I allocated hot spares across the controllers to enable me to build eight 5-disk RAID-Z stripes. There is one hot spare per controller.
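The disk budget behind that layout can be checked with a little shell arithmetic. This is only a sketch of the counting described above; the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: check that the planned layout uses every disk exactly once.
CONTROLLERS=6
DISKS_PER_CONTROLLER=8
BOOT_DISKS=2
HOT_SPARES=6
STRIPE_WIDTH=5      # 4 data + 1 parity per RAID-Z stripe

TOTAL=$((CONTROLLERS * DISKS_PER_CONTROLLER))
POOL_DISKS=$((TOTAL - BOOT_DISKS - HOT_SPARES))
STRIPES=$((POOL_DISKS / STRIPE_WIDTH))
LEFTOVER=$((POOL_DISKS % STRIPE_WIDTH))

echo "total=$TOTAL pool_disks=$POOL_DISKS stripes=$STRIPES leftover=$LEFTOVER"
```

With the numbers above, 48 disks minus 2 boot disks and 6 hot spares leaves 40, which divides evenly into eight stripes of five.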
This script creates the pool as described above. The required arguments are the desired name of the pool and the name of the first controller. It does a basic check to see that you are on a Sun Fire X4540.
#! /bin/sh
#
#set -x
#
#Make ZFS storage pools on a Sun Fire X4540 (Thor).
#This WILL NOT WORK on Sun Fire X4500 (Thumper) as
#the boot disk locations and controller numbering
#are different.
#
#Need two arguments:
#
# 1. name of pool
# 2. name of first controller e.g c0
#
prtdiag -v | grep -w X4540 > /dev/null 2>&1
if [ $? -ne 0 ] ; then
    echo "This script can only be run on a Sun Fire X4540."
    exit 1
fi
#
case $# in
2)  #This is a valid argument count
    ZPOOLNAME=$1
    CFIRST=$2
    ;;
*)  #An invalid argument count
    echo "Usage: `basename ${0}` zfspoolname first_controller_number"
    echo "Example: `basename ${0}` tank c0"
    exit 1;;
esac
#The numbering of the disk controllers will vary,
#but will most likely start at c0 or c1.
case $CFIRST in
c0)
    Cntrl0=c0
    Cntrl1=c1
    Cntrl2=c2
    Cntrl3=c3
    Cntrl4=c4
    Cntrl5=c5
    ;;
c1)
    Cntrl0=c1
    Cntrl1=c2
    Cntrl2=c3
    Cntrl3=c4
    Cntrl4=c5
    Cntrl5=c6
    ;;
*)
    echo "This script cannot work if the first controller is ${CFIRST}."
    echo "If this is the correct controller then edit the script to add"
    echo "settings for first controller = ${CFIRST}."
    exit 1
    ;;
esac
# Create pool with 8 x RAIDZ.4+1 stripes
# 6 Hot spares are staggered across controllers
# We skip ${Cntrl0}t0d0 and ${Cntrl1}t1d0 as they are assumed to be boot disks
zpool create -f ${ZPOOLNAME} \
raidz ${Cntrl1}t0d0 ${Cntrl2}t0d0 ${Cntrl3}t0d0 ${Cntrl4}t0d0 ${Cntrl5}t0d0 \
raidz ${Cntrl0}t1d0 ${Cntrl2}t1d0 ${Cntrl3}t1d0 ${Cntrl4}t1d0 ${Cntrl5}t1d0 \
raidz ${Cntrl0}t2d0 ${Cntrl1}t2d0 ${Cntrl3}t2d0 ${Cntrl4}t2d0 ${Cntrl5}t2d0 \
raidz ${Cntrl0}t3d0 ${Cntrl1}t3d0 ${Cntrl2}t3d0 ${Cntrl4}t3d0 ${Cntrl5}t3d0 \
raidz ${Cntrl0}t4d0 ${Cntrl1}t4d0 ${Cntrl2}t4d0 ${Cntrl3}t4d0 ${Cntrl5}t4d0 \
raidz ${Cntrl0}t5d0 ${Cntrl1}t5d0 ${Cntrl2}t5d0 ${Cntrl3}t5d0 ${Cntrl4}t5d0 \
raidz ${Cntrl1}t6d0 ${Cntrl2}t6d0 ${Cntrl3}t6d0 ${Cntrl4}t6d0 ${Cntrl5}t6d0 \
raidz ${Cntrl0}t7d0 ${Cntrl2}t7d0 ${Cntrl3}t7d0 ${Cntrl4}t7d0 ${Cntrl5}t7d0 \
spare ${Cntrl2}t2d0 ${Cntrl3}t3d0 ${Cntrl4}t4d0 ${Cntrl5}t5d0 ${Cntrl0}t6d0 ${Cntrl1}t7d0
#End of script
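The case statement in the script only knows about c0 and c1 and asks you to edit it for anything else. One possible way to avoid that edit is sketched below: `set_controllers` is a hypothetical replacement that derives all six names from whatever first controller is given. It relies on the consecutive numbering described earlier and on POSIX shell features (`${1#c}` and `$(( ))`), so it assumes a POSIX-compatible shell rather than a legacy Bourne shell.

```shell
#!/bin/sh
# Sketch: derive Cntrl0..Cntrl5 from any first controller name, e.g. c4.
# set_controllers is a hypothetical helper, not part of the original script.
set_controllers() {
    case $1 in
        c[0-9] | c[1-9][0-9]) ;;            # accept c0..c99, no leading zeros
        *)
            echo "Expected a controller name such as c0 or c4, got: $1" >&2
            return 1
            ;;
    esac
    first=${1#c}                            # strip the leading "c"
    Cntrl0=c$first
    Cntrl1=c$((first + 1))
    Cntrl2=c$((first + 2))
    Cntrl3=c$((first + 3))
    Cntrl4=c$((first + 4))
    Cntrl5=c$((first + 5))
}

# Example: a system whose first controller is c4.
set_controllers c4 && echo "$Cntrl0 $Cntrl1 $Cntrl2 $Cntrl3 $Cntrl4 $Cntrl5"
```

The variable names match the ones the zpool create command in the script already uses, so the rest of the script would not need to change.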
I have called the script makex4540raidz-6hs.sh. In the example below I create a storage pool called tank; my first controller is c1.
root@isv-x4500a # makex4540raidz-6hs.sh tank c1
This is how it looks...
root@isv-x4540a # zpool status
root@isv-x4500a # zpool status tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t6d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
        spares
          c3t2d0    AVAIL
          c4t3d0    AVAIL
          c5t4d0    AVAIL
          c6t5d0    AVAIL
          c1t6d0    AVAIL
          c2t7d0    AVAIL

errors: No known data errors
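In the output above, no stripe uses the same controller twice. If you change the layout, that property is easy to lose without noticing. The sketch below shows one way it might be checked: `check_stripes` is a hypothetical helper that reads `zpool status` style text on standard input, reports any raidz group containing two disks on one controller, and returns a non-zero status if it finds one. It assumes device names of the form cXtYdZ, as in this post.

```shell
#!/bin/sh
# Sketch: flag any raidz stripe that has two disks on the same controller.
# check_stripes is a hypothetical helper, not part of the original script.
check_stripes() {
    awk '
        $1 ~ /^raidz/  { stripe++; split("", seen); in_stripe = 1; next }
        $1 == "spares" { in_stripe = 0; next }
        in_stripe && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ {
            ctrl = $1
            sub(/t.*/, "", ctrl)            # c3t1d0 -> c3
            if (ctrl in seen) {
                printf "stripe %d: %s used twice (%s)\n", stripe, ctrl, $1
                bad = 1
            }
            seen[ctrl] = 1
        }
        END { exit bad + 0 }
    '
}

# Example on a short excerpt in the same format as the output above.
check_stripes <<'EOF' && echo "no controller is used twice in a stripe"
raidz1 ONLINE 0 0 0
c2t0d0 ONLINE 0 0 0
c3t0d0 ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c1t1d0 ONLINE 0 0 0
c3t1d0 ONLINE 0 0 0
EOF
```

On a live system the input would come from something like `zpool status tank | check_stripes`.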
I have used this layout on my systems in the labs for over a year now, pounding the heck out of it. The first two controllers are marginally less busy, as they each support both a boot disk and a hot spare, but I have seen very even performance across all the data disks.
So far I have not lost a disk, so I am probably way overcautious with my hot spares (famous last words). If you want to reduce the number of hot spares to four, it is easy to modify the script by taking spares and adding them to the stripes. Since the first two controllers are marginally less loaded than the others, I recommend extending the stripes on rows t6 and t7, as below. You need to make this decision up front, before building the pool, as you cannot change the length of a RAID-Z stripe once the pool is built.
The zpool create command in the script would now look like this. The modified lines are the t6 and t7 stripes and the spare list.
<snip>
.
.
# Create pool with 6 x RAIDZ.4+1 stripes & 2 x RAIDZ.5+1 stripes
# 4 Hot spares are staggered across controllers
# We skip ${Cntrl0}t0d0 and ${Cntrl1}t1d0 as they are assumed to be boot disks
zpool create -f ${ZPOOLNAME} \
raidz ${Cntrl1}t0d0 ${Cntrl2}t0d0 ${Cntrl3}t0d0 ${Cntrl4}t0d0 ${Cntrl5}t0d0 \
raidz ${Cntrl0}t1d0 ${Cntrl2}t1d0 ${Cntrl3}t1d0 ${Cntrl4}t1d0 ${Cntrl5}t1d0 \
raidz ${Cntrl0}t2d0 ${Cntrl1}t2d0 ${Cntrl3}t2d0 ${Cntrl4}t2d0 ${Cntrl5}t2d0 \
raidz ${Cntrl0}t3d0 ${Cntrl1}t3d0 ${Cntrl2}t3d0 ${Cntrl4}t3d0 ${Cntrl5}t3d0 \
raidz ${Cntrl0}t4d0 ${Cntrl1}t4d0 ${Cntrl2}t4d0 ${Cntrl3}t4d0 ${Cntrl5}t4d0 \
raidz ${Cntrl0}t5d0 ${Cntrl1}t5d0 ${Cntrl2}t5d0 ${Cntrl3}t5d0 ${Cntrl4}t5d0 \
raidz ${Cntrl0}t6d0 ${Cntrl1}t6d0 ${Cntrl2}t6d0 ${Cntrl3}t6d0 ${Cntrl4}t6d0 ${Cntrl5}t6d0 \
raidz ${Cntrl0}t7d0 ${Cntrl1}t7d0 ${Cntrl2}t7d0 ${Cntrl3}t7d0 ${Cntrl4}t7d0 ${Cntrl5}t7d0 \
spare ${Cntrl2}t2d0 ${Cntrl3}t3d0 ${Cntrl4}t4d0 ${Cntrl5}t5d0
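When device lists are edited by hand like this, it is easy to leave a disk in both a stripe and the spare list. A minimal sketch of a sanity check follows: `duplicate_disks` is a hypothetical helper that reads the text of a zpool create command on standard input and prints any cXtYdZ name that appears more than once. It works on expanded names, so one option is to temporarily change `zpool create` to `echo zpool create` in the script and pipe the result through it.

```shell
#!/bin/sh
# Sketch: list disks that appear more than once in a zpool create command.
# duplicate_disks is a hypothetical helper, not part of the original script.
duplicate_disks() {
    # Count every whitespace-separated token shaped like cXtYdZ and print
    # the ones seen more than once; other tokens (raidz, spare, \) are ignored.
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^c[0-9]+t[0-9]+d[0-9]+$/)
                    count[$i]++
        }
        END {
            for (disk in count)
                if (count[disk] > 1)
                    print disk
        }
    ' | sort
}

# Example: c0t6d0 was moved into a stripe but left in the spare list.
duplicate_disks <<'EOF'
zpool create -f tank \
raidz c0t6d0 c1t6d0 c2t6d0 c3t6d0 c4t6d0 c5t6d0 \
spare c2t2d0 c3t3d0 c0t6d0
EOF
```

No output means every disk is named only once.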
Posted at 12:30PM Sep 25, 2008 in Storage | Comments[8]
Hi Tim,
I understand the X4540 has the ability to boot from an inbuilt flash card. Have you had a chance to play with this yet?
Posted by Malcolm Gibbs on September 25, 2008 at 09:57 PM BST #
Hi Malcolm. Yes, there is a bootable flash card, but I have not experimented with it myself. Rgds, Tim
Posted by Tim Thomas on September 26, 2008 at 10:20 AM BST #
I'm a noob to the Sun systems (it's been over 15 years since I've played with one). I've noticed that my x4540 that just arrived has a different physical drive layout than your drive mapping chart. What I have is like this:
left rear                 right rear
3    7    11   ...   43   47
2    6    10   ...   42   46
1    5    9    ...   41   45
0    4    8    ...   40   44
left front                right front
Do you know if this is a different drive to controller mapping than what you've played with?
Thanks (and nice "how to" & script)
Jim
Posted by Jim Bucks on October 14, 2008 at 03:27 PM BST #
Oh, and the top cover says the boot drives can be 0, 1, 8, or 9.
Jim
Posted by Jim Bucks on October 14, 2008 at 03:28 PM BST #
Jim, your slot numbers are correct. Physically the disks are laid out 12 across and 4 down, as shown in the service manual (Sun Document 819-4359-14) on page 44. My disk planner shows a logical (rather than physical) view of how Solaris "enumerates" the devices. Rgds, Tim
Posted by Tim Thomas on October 15, 2008 at 06:50 PM BST #
Do you have any "rough / approximate" timings for the zfs create (?? format ??) to actually create usable space on the tank?
I've had my x4540 created for a couple of days now, but when I try to copy about 20 TB of files onto it, I keep getting error messages to the effect of "not enough free space".
I have been able to copy about 6.5 GB onto it, and I'm hoping this is just because the formatting process is still running on all the drives.
PS - I used your basic setup script, have 2 boot disks, 2 hot spares, and using raidz for the remaining 42 drives.
df -h looks like this --
hdtank             32T    39K    32T    1%    /hdtank
hdtank/programs    32T   6.5G    32T    1%    /export/programs
and, zpool list looks like this -
NAME      SIZE    USED   AVAIL   CAP  HEALTH  ALTROOT
hdtank   39.9T   7.90G   39.9T    0%  ONLINE  -
Thanks,
Jim
Posted by Jim Bucks on October 21, 2008 at 09:33 PM BST #
Hi Jim. This is peculiar behavior. There is no "format" taking place, once zpool create/zfs create return they are done. Try running "zpool status" to see if everything is healthy and "zfs list" to double check the capacity of your file system. Rgds, Tim
Posted by Tim Thomas on October 22, 2008 at 06:05 AM BST #
Hi Tim,
Thanks for this example - here's an alternate layout, for use with CF boot (so all 48 disks are available). It's more or less a variation of your layout, with a spare on each controller, and the remaining disks laid out as 7 sets of 5+1, with no stripes containing more than 1 disk on the same controller.
Note the CF card consumes c0, so the spinning disks start at c1, in this case.
Apologies for the formatting - not sure how to get tabs to survive.
raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0
raidz c2t1d0 c3t1d0 c4t1d0 c5t1d0 c6t1d0 c1t2d0
raidz c3t2d0 c4t2d0 c5t2d0 c6t2d0 c1t3d0 c2t3d0
raidz c4t3d0 c5t3d0 c6t3d0 c1t4d0 c2t4d0 c3t4d0
raidz c5t4d0 c6t4d0 c1t5d0 c2t5d0 c3t5d0 c4t5d0
raidz c6t5d0 c1t6d0 c2t6d0 c3t6d0 c4t6d0 c5t6d0
raidz c1t7d0 c2t7d0 c3t7d0 c4t7d0 c5t7d0 c6t7d0
spare c1t1d0 c2t2d0 c3t3d0 c4t4d0 c5t5d0 c6t6d0
An example script to create the corresponding zpool command:
#!/usr/bin/perl
print "zpool create logpool ";
$vdevcount = 0;
foreach $t (0 .. 7)
{
    foreach $c (1 .. 6)
    {
        if ($vdevcount == 0)
        {
            print "raidz ";
            printf("c%dt%dd0 ", $c, $t);
            $vdevcount++;
        }
        elsif ($vdevcount == 6)
        {
            push(@spares, sprintf("c%dt%dd0", $c, $t));
            $vdevcount = 0;
        }
        else
        {
            printf("c%dt%dd0 ", $c, $t);
            $vdevcount++;
        }
    }
}
print "spare ";
foreach $spare (@spares)
{
    print $spare, " ";
}
print "\n";
Posted by Ben on March 17, 2009 at 04:14 AM GMT #