Mark Koch's Weblog

Thursday Sep 07, 2006

Background

A couple of years ago, I built a 1TB file server from standard computer components. I wanted a place to put my photos, music and movies, and to be able to access them via Samba or NFS from other computers in my house... oh, and I'm a cheapskate, so spending too much money was not an option. I used an old chassis and motherboard along with some other components that I already had. For the drives, I hunted around and found that Costco had a pretty good deal on 250GB drives. I bought two to start with. I also bought a cheap IDE-133 card since the on-board IDE controller was only IDE-66.

Weekends with Linux, Feel the Burn

I put my file server together and downloaded the latest Fedora Core 4. Installation of the OS went smoothly enough for the root drive, but things soon took a turn for the worse. After a lot of research and some very experimental drivers and software RAID utilities, I had my system working. Since I only had two drives, RAID-5 was out of the question. I decided to concatenate the drives. I wasn't worried about backups since all my data was backed up on CDs, and my music and movies were safely on their original discs, boxed up in storage somewhere. Worst case, I would have to spend a few days feeding CDs back into the system if things failed.

Things worked happily for about a week. I was still in the process of adding my movies and songs to the system when it started acting weird. The system went into read-only mode. Looking at the log files, I saw lots of errors accessing the disks. So I did what any sysadmin would have done. I rebooted it. Upon reboot, I saw that my RAID was gone. From my notes, I retyped the commands to create the RAID again and the RAID was working OK again... for a few days. I went through this cycle of use, reboot and re-create my RAID so many times that I ended up with a script that did it all for me.

Another few months passed and finally, one weekend, the Linux file server, which had been unnaturally resurrecting itself, started to have performance problems. I had added two more disks, with the intent to add another two later. Since Linux RAID wouldn't let me add new disks to an existing RAID, I came up with a strategy of moving files around between RAID sets to balance out which files went where. I started to notice that 1 or 2GB files were taking overnight to copy from one drive to the other. Something was wrong here and no amount of Googling would turn up a good answer. I decided to just deal with it and let the computer do its thing. At least my files were still there.

Solaris is not Linux

A few months later, I attended a presentation at Sun when Solaris Nevada was only a few builds old. The presenter used her 64-bit AMD laptop to walk through many Solaris Nevada demos, including DTrace and ZFS. She made both DTrace and ZFS look so easy that it seemed too good to be true.

I decided to bite the bullet and build a new file server with Solaris and ZFS. This time I had a few extra dollars to spend, so I beat my inner cheapskate into submission and headed to NewEgg.com. I started with a CoolerMaster Stacker 810 series chassis. It's one of the few chassis that has twelve 5.25" drive bays out the front and a relocatable power button panel. Using the included 3.5" drive adapter (and two more that I purchased), I was able to put twelve hard drives into the space of nine 5.25" bays, and each 4-disk adapter has a silent fan on it. I was able to borrow twelve 500GB drives for this project. With nine 5.25" bays filled up, I filled the other three with the power button panel at the bottom, a DVD-ROM drive at the top and an 80GB root drive below that.

I pulled the ASUS motherboard out of my Myth-box-slash-failed-experiment-since-it-also-ran-linux box and repurposed it for this project. That motherboard had 1GB of RAM and an AMD64 3400+. I researched the Sun hardware compatibility list and found a cheap quad SATA controller that should do the job. It turns out that the chip on that controller was used in a Sun product somewhere, so I had a reasonable expectation that it would work pretty well. The SATA controllers came configured for RAID. I needed JBOD mode for this thing to work correctly, so I re-flashed the controllers with a BIOS downloaded from the chipmaker's site.



Solaris is NOT for insomniacs

I installed Solaris without much trouble. OK, I admit that I've installed Solaris many, many times in the past 17 years, so things were pretty easy for me. However, I was new to ZFS and was still skeptical about how simple it might be to set up and whether my data would still be there a few days later. I decided that I wanted a mirrored "data" pool for my most precious stuff like Quicken backups, family photos and laptop backups. I also wanted a RAID-Z "media" pool for movies and music that I knew I had a hard copy of and wouldn't cry over if all was lost.



Using examples I got from a stack of ZFS presentation slides, I created my pools using only a few commands... and that was it. But wait! I said to myself. What about formatting, partitioning and newfs? Hmmm... not needed. I was kind of bummed at this point. I had expected to spend the evening and possibly the whole weekend setting this thing up and futzing with it. No dice. My ZFS setup literally took about 5 minutes and has been running flawlessly for four months now. I've even upgraded from Solaris build 34 to build 42 to build 47 without any problems. I've had to come up with a few new hobbies because Solaris has become too easy to use.
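The post does not show the exact commands, so the following is only a sketch of what "a few commands" could look like. It assumes the device names that appear in the `zpool status` output posted in the comments, and by default it prints the `zpool create` commands rather than running them. The `ZPOOL` variable and the `create_pools` function are names made up for this example.

```shell
#!/bin/sh
# Dry-run sketch: prints the zpool commands instead of executing them.
# Set ZPOOL=zpool (as root, on a system with ZFS) to run them for real.
# Device names are taken from the 'zpool status' listing in the comments;
# the author's actual command lines are not shown, so this is an assumption.
ZPOOL="${ZPOOL:-echo zpool}"

create_pools() {
    # Two-way mirror for the irreplaceable data.
    $ZPOOL create data mirror c5d0 c3d0
    # Single-parity RAID-Z across the remaining ten disks.
    $ZPOOL create media raidz c5d1 c6d0 c6d1 c7d0 c7d1 c8d0 c8d1 c3d1 c4d0 c4d1
}

create_pools
```

There is no partitioning or newfs step in the sketch because `zpool create` takes whole disks and creates and mounts a file system named after the pool.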



The Future

With 750GB and 1TB drives coming out soon, I look forward to being able to swap out the 500GB drives one by one. ZFS doesn't care whether the drives match in size, although a mirror or RAID-Z group only uses as much of each drive as its smallest member offers, so the extra space shows up once every drive in the group has been replaced. And unlike the Linux RAID system, I can even grow my pools by simply adding new drives.
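As a hedged sketch of what that could look like: `zpool replace` swaps a new disk in for an old one, and `zpool add` grows a pool with an additional group of disks. The pool names and `c5d1` come from the `zpool status` output in the comments; `c9d0` and `c9d1` are hypothetical device names, and the function names and the `ZPOOL` variable are invented for this example. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands; set ZPOOL=zpool to run them for real.
ZPOOL="${ZPOOL:-echo zpool}"

# Replace one member of the RAID-Z pool with a new disk in the same slot.
# Repeat per disk, letting each resilver finish (check 'zpool status')
# before pulling the next drive.
replace_in_place() {
    $ZPOOL replace media "$1"
}

# Grow the mirrored pool by adding a second mirror pair.
# The device names passed in below are hypothetical, not from the post.
add_mirror_pair() {
    $ZPOOL add data mirror "$1" "$2"
}

replace_in_place c5d1
add_mirror_pair c9d0 c9d1
```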

In summary, although I have used SunOS/Solaris for about 17 years, ZFS and RAID are pretty new to me. The Linux path was akin to throwing me to the wolves. It was difficult and frustrating for this experienced user who only wanted to put his files somewhere on the network. In contrast, setting up ZFS was easy to understand and easy to use. And so far, it just works.

Jonathan and Scott talk a lot about cost of ownership. I'm just a hobbyist when it comes to using Solaris and ZFS at home, but my free time is still worth something to me. I could be doing something else, like watching my movies and listening to my music, instead of Googling for "linux+raid+failed+wtf+how+to+fix+my+computer".



Comments:

Nice! I'm about to try something like this, but am holding off for zfs root (your 80GB root drive would die on me in about 2 months - hardware hates my guts). How are you going to upgrade the disks? Last time I checked, ZFS mirror pairs would only use space on each half == the size of the smallest disk. Has that changed, or is raidz different?

Posted by Dick Davies on September 08, 2006 at 08:10 AM PDT #

I haven't tried a ZFS root yet. My guess is that it's pretty far along.

In my config, I have one pool that is two 500GB drives as a mirror. The total capacity there is 500GB. My second pool is a RAID-Z of ten 500GB disks. The total capacity is 4.53TB.


Here's my 'zpool list' output:
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
data                    464G    165G    299G    35%  ONLINE     -
media                  4.53T   2.22T   2.31T    49%  ONLINE     -

-bash-3.00# zpool status
  pool: data
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0

errors: No known data errors

  pool: media
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        media       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5d1    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c6d1    ONLINE       0     0     0
            c7d0    ONLINE       0     0     0
            c7d1    ONLINE       0     0     0
            c8d0    ONLINE       0     0     0
            c8d1    ONLINE       0     0     0
            c3d1    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0
            c4d1    ONLINE       0     0     0

errors: No known data errors
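
A rough sanity check on those sizes, offered as a sketch rather than anything authoritative: `zpool list` reports sizes in binary units, and for a RAID-Z pool the SIZE column counts raw space with parity included. The helper names below are made up for the example, and the small gaps from the listed 464G and 4.53T are assumed to be on-disk overhead.

```shell
#!/bin/sh
# Convert a drive's marketed size (decimal GB) to binary GiB.
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1000^3 / 1024^3 }'
}

# Raw size in TiB of N drives of a given decimal-GB size.
raw_tib() {
    awk -v n="$1" -v gb="$2" 'BEGIN { printf "%.2f\n", n * gb * 1000^3 / 1024^4 }'
}

gb_to_gib 500     # one 500GB drive, also the size of a two-way mirror: 465.7
raw_tib 10 500    # ten drives, parity included: 4.55
raw_tib 9 500     # nine drives' worth, i.e. less one drive of parity: 4.09
```

By this estimate, the space available for files in the media pool is closer to 4.1T than to 4.53T.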

Posted by Mark Koch on September 08, 2006 at 11:22 AM PDT #

Will the SYBA SD-SATA-4P PCI SATA Controller Card work with Solaris 10 6/06 ? I don't want to play with OpenSolaris but I do want to play with ZFS across a large number of drives.

Posted by Jon Strabala on September 08, 2006 at 03:14 PM PDT #

I think the SYBA card will work with Solaris 10 6/06. I'm not 100% sure but here's my circumstantial evidence.

I know I found something indicating that the Silicon Image SIL3114 based SATA card would work in Solaris Nevada but for the life of me I can't find the references.

Addonics makes a quad SATA card (for more money) that uses the same Silicon Image SIL3114 chip that Syba uses. They claim driver support for Solaris 10.

http://www.addonics.com/support/faqs/faq-sunsupport.asp


More links:

  • Syba's Quad SATA Card
  • Silicon Image SIL3114 Chip

Hope that helps a little.

Posted by Mark Koch on September 08, 2006 at 08:40 PM PDT #

I noticed that Dell is selling a 15 drive 3.5" SAS JBOD (claims 30% faster drives than the 2.5" form factor that Sun sells). It would be interesting if this could be used with some TBD Sun or non-Sun HBA for running ZFS.

For example a PowerVault MD1000 External Storage Array, SAS and SATA support - with eight (8) 146GB, 3Gbps, SAS, 3.5 inch, 15K RPM Hard Drive [with a 25% small business discount] for under $6,391.50 USD.

IMHO coupling this type of SAS JBOD box with an X2100 M2, or an X4100, seems like it might make a reasonable cost ZFS platform. I think something like the LSI Logic LSISAS3442E-R ($367 List) would work as an HBA via SFF 8470 connectors.

Posted by Jon Strabala on September 14, 2006 at 08:57 PM PDT #

I'm jealous. What kind of transfer rates are you getting? Have you noticed any bottleneck with the max PCI throughput? Do multiple SYBA SATA cards work well together? I want to try 4 or 5 cards!

Posted by Br on September 20, 2006 at 09:21 PM PDT #

I have three cards working fine together. I don't see why a couple more would hurt. As for data rates, there's the rub right now. When I copy large files (4-5GB) over gigabit Ethernet, I get an initial write rate of about 200Mbps. Then it oscillates between something like 150Mbps and 70Mbps, for an average of about 125Mbps. I think the transfer runs as fast as the system RAM will take it until the RAM is used up (I only have 1GB; real Sun hardware ranges from 4GB to 16GB). Since ZFS is doing something with the data in memory before it commits it to disk, there is a bottleneck here. I think the average of 125Mbps (approx. 15MB/s) is the limit of writing to the RAID. These are just my guesses from observations. One of these days, I would like to dig into DTrace and see if that reveals any clues. I also want to upgrade the memory to 2GB to see how that affects things. Overall though, it's still faster than the IDE drives were in the Linux system.

For reading files, I watch movies over a Samba mount using VLC player. There are occasional pauses (about one per movie, lasting only about a second). Most movies play perfectly. It's flawless otherwise. So overall, I don't have a benchmark for reading at high data rates.
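
For anyone checking the arithmetic, here is a small sketch of the bits-to-bytes conversion behind the "approx 15MB/s" figure. The function name is made up for the example, and it simply divides by eight, ignoring any protocol overhead.

```shell
#!/bin/sh
# Convert a network rate in megabits per second to megabytes per second.
mbps_to_mbytes() {
    awk -v mbps="$1" 'BEGIN { printf "%g\n", mbps / 8 }'
}

mbps_to_mbytes 200     # initial burst: 25
mbps_to_mbytes 125     # observed average: 15.625
mbps_to_mbytes 1000    # gigabit line rate: 125
```

So the observed average is roughly an eighth of what the network link could carry in theory.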

Posted by Mark on September 20, 2006 at 10:09 PM PDT #

Mark, what power supplies did you use to power all those disks? Can you tell me the brands, model #'s, watts and efficiency? Thanks in advance. BTW why didn't you divide the 12 disks into two RAID-Z of 6 disks each?

Posted by Richthofen on September 21, 2006 at 11:07 PM PDT #

For the power supply, I used a Thermaltake 420W supply (Model: TT-420-AD). The system draws about 320W during drive spin-up and settles to 240W during use. As you can see in one of my previous comment postings, the drives are divided into a 500GB mirror of two 500GB drives. The remaining ten drives are a single RAID-Z pool.

Posted by Mark Koch on September 22, 2006 at 09:24 AM PDT #

Oops! I misread your last comment on "why" I didn't divide the RAID-Z into two six-disk pools. In theory, I could swap out one of the mirror drives and store that in a safe place instead of backing up to disk.
My reply is, I have some data that I didn't want to RAID. I wanted an exact copy of it.
The rest of the space is for media that I already have a backup of in the form of their original DVDs and CDs.

Posted by Mark Koch on September 22, 2006 at 09:35 AM PDT #

What SATA cables did you use? How much did they cost? Also, I'm looking for some SATA activity lights that I can connect to the cable (so when a drive goes out I can tell which one). Have you found anything cheap like that?

Posted by Richtohfen on September 29, 2006 at 09:31 PM PDT #

For SATA cables, I used mostly what I had on hand. The last couple of motherboards I bought came with a couple and so did the SATA cards. I think I ended up buying four or five more at the local computer store. They were a no-name brand for about three dollars each. I think they were 20 inch cables.

Posted by Mark Koch on September 30, 2006 at 12:07 AM PDT #

I bought the Syba card you are using, but it's not clear which BIOS I should use to flash the card. Did you use the IDE BIOS? ZFS is awesome by the way. I've introduced it to several people and they will probably be using it in future projects.

Posted by Sam on October 05, 2006 at 08:23 PM PDT #

I believe you want the non-SATARAID BIOS, i.e. SiI3114 IDE BIOS

http://www.siliconimage.com/docs/BIO-003114-100-5304.zip

Posted by Mark Koch on October 05, 2006 at 08:57 PM PDT #
