Ramblings from Richard's Ranch

RAS and the X4100/X4200 Servers

Monday Sep 12, 2005

Here's a peek inside the reliability, availability, and serviceability (RAS) features of the new Sun Fire X4100 and Sun Fire X4200 servers. In this post, I'll talk about heat flow and power distribution, two important constraints in computer system design. You will see why the new X4100 and X4200 servers are not just more run-of-the-mill x64 servers.

Heat costs money and kills

By now, you've probably seen the evidence and the talk about power consumption and heat. It has long been known that cranking up the number of components and their operating speed generates more heat. You can do the math and figure out how the heat affects your pocketbook. From a RAS perspective, I'll add that heat kills. The ambient temperature has a direct effect on the reliability of electronics. Cool servers don't break as often as hot servers.
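To put rough numbers on "do the math", here is a minimal Python sketch, not a Sun tool. The 300 W load and $0.10/kWh electricity price are hypothetical placeholders; the only fixed facts are that one watt of continuous dissipation is about 3.412 BTU/hr of heat and that a year of continuous operation is 8,760 hours.

```python
# Rough annual energy, cost, and heat load for a server's continuous power draw.
# ASSUMPTION: the wattage and electricity price used below are hypothetical
# placeholders, not Sun figures; substitute your own measurements and rates.

WATTS_TO_BTU_PER_HR = 3.412  # 1 W of continuous dissipation is about 3.412 BTU/hr
HOURS_PER_YEAR = 24 * 365    # 8760 h, ignoring leap years


def annual_energy_kwh(watts: float) -> float:
    """Energy consumed by a constant load running all year, in kWh."""
    return watts * HOURS_PER_YEAR / 1000.0


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Electricity cost of the load itself (excludes the cost of cooling it)."""
    return annual_energy_kwh(watts) * price_per_kwh


def heat_btu_per_hr(watts: float) -> float:
    """Heat the room's cooling must remove; essentially all input power becomes heat."""
    return watts * WATTS_TO_BTU_PER_HR


if __name__ == "__main__":
    load_w = 300.0  # hypothetical server power draw
    price = 0.10    # hypothetical price in $/kWh
    print(f"{annual_energy_kwh(load_w):.0f} kWh per year")
    print(f"${annual_cost(load_w, price):.2f} per year before cooling")
    print(f"{heat_btu_per_hr(load_w):.0f} BTU/hr of heat to remove")
```

Remember that every watt shows up twice on the bill: once to run the server and again to remove its heat from the room.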

Mechanical systems are the most seriously affected, and in modern computer systems that usually means disk drives. The X4100/X4200 designs use new 2.5" serial attached SCSI (SAS) disks, which draw about 40% less power than equivalent 3.5" disk drives. The performance gurus will also note that average seek times drop by about 15% for the smaller drives. For power estimation purposes, plan on about 8W for a reasonably busy 2.5" SAS disk. The reduced form factor and power consumption mean that we can offer four disks in the space and power budget formerly needed for two 3.5" disks. OK, so most thin servers don't need more than two disks, and I fully expect that people will deploy many more X4100/X4200 servers with zero, one, or two disks than with four.
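With the 8W rule of thumb, disk power budgeting is simple multiplication. This is a minimal sketch that assumes every disk draws the same steady-state planning figure and ignores spin-up surges; the function name is hypothetical.

```python
# Disk power budget from the "about 8 W per busy 2.5-inch SAS disk" rule of thumb.
# ASSUMPTION: all disks draw the same steady-state power; spin-up surges are ignored.

BUSY_SAS_2_5_WATTS = 8.0  # planning figure quoted in the text


def disk_budget_watts(n_disks: int, watts_per_disk: float = BUSY_SAS_2_5_WATTS) -> float:
    """Total planning power for n identical disks."""
    if n_disks < 0:
        raise ValueError("disk count cannot be negative")
    return n_disks * watts_per_disk


if __name__ == "__main__":
    # The configurations discussed above: zero, one, two, or four disks.
    for n in (0, 1, 2, 4):
        print(f"{n} disk(s): {disk_budget_watts(n):.0f} W")
```

A fully loaded four-disk configuration comes to about 32W of disk power under this estimate.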

But the reduced form factor also allows us to improve overall system RAS, because we regain a bunch of space in the front bezel area. In older thin servers such as the wildly popular Netra t1 series, the disks basically consume all of the front bezel area. Consider what this means for the airflow in the system: some heat generators sit in front of the rest of the electronics. Airflow is front-to-back, so the air that passes over the motherboard is already hotter than the ambient air. By using 2.5" disk drives, we were able to move the disks out of the way and fully isolate the airflow for the disks from the motherboard. I'll use the X4100 to demonstrate. The disks and DVD are located on the right side of the front of the server. Behind the drives are the hot-swappable power supplies. A wall between the drives/power supplies and the motherboard keeps the two airflows separate. The air flowing over the motherboard comes directly from the exterior and flows out the back. The CPUs and memory are oriented so that they all get clean (cool) airflow directly. Two rows of hot-pluggable, redundant fans push air over the motherboard. The bezel in front of the fans is not blocked by bulky disk drives, further ensuring good, cool airflow into the server.
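Why does pre-warmed air matter so much? A steady-state energy balance gives the temperature rise of air passing over a heat source: ΔT = P / (ρ · c_p · V̇), where V̇ is the volumetric airflow. The sketch below assumes sea-level air (ρ ≈ 1.2 kg/m³, c_p ≈ 1005 J/(kg·K)) and a hypothetical 10 CFM of airflow through a disk bay. It illustrates the physics only and is not a thermal model of the X4100.

```python
# Steady-state temperature rise of air passing over a heat source:
#   delta_T = P / (rho * c_p * volumetric_flow)
# ASSUMPTIONS: sea-level air near 20 C, fully mixed flow, hypothetical airflow values.

AIR_DENSITY_KG_M3 = 1.2         # approximate density of air at sea level, ~20 C
AIR_CP_J_PER_KG_K = 1005.0      # specific heat of air at constant pressure
M3_PER_S_PER_CFM = 0.000471947  # 1 cubic foot per minute expressed in m^3/s


def air_temp_rise_c(heat_w: float, airflow_cfm: float) -> float:
    """Temperature rise (degrees C) of air absorbing heat_w watts at airflow_cfm."""
    if airflow_cfm <= 0:
        raise ValueError("airflow must be positive")
    mass_flow_kg_s = AIR_DENSITY_KG_M3 * airflow_cfm * M3_PER_S_PER_CFM
    return heat_w / (mass_flow_kg_s * AIR_CP_J_PER_KG_K)


if __name__ == "__main__":
    disks_w = 4 * 8.0  # four busy 2.5-inch disks at the 8 W planning figure
    bay_cfm = 10.0     # hypothetical airflow through the disk bay
    rise = air_temp_rise_c(disks_w, bay_cfm)
    print(f"Air leaving the disk bay is about {rise:.1f} C above ambient")
```

Under these assumptions, air that has crossed four busy disks arrives roughly 5.6 °C warmer, which is what a motherboard sitting behind front-mounted disks would breathe. This agrees with the common rule of thumb ΔT(°F) ≈ 3.16 × W / CFM.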

By contrast, the Sun Fire V60x and Sun Fire V65x chassis designs were done by "another company", and Sun took those designs and re-badged them. The problem is that the other company wasn't used to designing data center class systems. The V60x has a series of holes in the side of the chassis. When you put a bunch of them into a rack, hot air circulates through the rack and back into the chassis. The result is that the systems run very hot. In the front, the disk drives further block the airflow, so the motherboard sees inconsistent, pre-warmed air. These systems also use Xeon processors, which tend to run hot. The net result is that the environmental requirements are de-rated to account for the additional cooling requirements of the system. The elegant and clean design of the X4100 and X4200 is vastly superior to the older design in this respect.

Power conversion

Any discussion about power would not be complete without a discussion of power conversion. The Sun Fire X4100 and Sun Fire X4200 servers have RAS improvements there as well. The AC/DC power supplies are a new, remarkably simple design. Only two DC voltage levels are provided: 3.3 and 12 VDC. The 3.3V level is used for control logic and the iLOM controller. Most of the power is distributed internally as 12V, which is converted to the various logic levels elsewhere in the system by reliable DC-DC converters. This design decision allowed Sun to simplify the power supply, and such simplifications improve reliability.
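Distributing power at 12V rather than at a low logic voltage keeps bus currents small, and resistive (I²R) losses in the distribution path fall with the square of the voltage. Here is a minimal sketch of that arithmetic. The 300 W load, the 1.2V/20A point-of-load rail, and the 85% converter efficiency are hypothetical values chosen for illustration, not X4100 specifications.

```python
# Why distribute at 12 V: bus current, conduction loss, and converter input current.
# ASSUMPTION: all wattages, rail values, and efficiencies used below are hypothetical.


def bus_current_a(power_w: float, volts: float) -> float:
    """Current needed to deliver power_w at the given bus voltage (I = P / V)."""
    return power_w / volts


def conduction_loss_ratio(v_low: float, v_high: float) -> float:
    """Factor by which I^2*R loss drops when the same power, over the same
    conductor resistance, is carried at v_high instead of v_low."""
    return (v_high / v_low) ** 2


def converter_input_current_a(v_in: float, v_out: float, i_out: float,
                              efficiency: float) -> float:
    """Input current of a DC-DC converter: P_in = P_out / efficiency, I_in = P_in / V_in."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (v_out * i_out) / (efficiency * v_in)


if __name__ == "__main__":
    load_w = 300.0  # hypothetical total system load
    print(f"At 12 V:  {bus_current_a(load_w, 12.0):.1f} A")
    print(f"At 3.3 V: {bus_current_a(load_w, 3.3):.1f} A")
    print(f"I^2*R loss is {conduction_loss_ratio(3.3, 12.0):.1f}x lower at 12 V")
    # Hypothetical point-of-load rail: 1.2 V at 20 A from the 12 V bus, 85% efficient.
    i_in = converter_input_current_a(12.0, 1.2, 20.0, 0.85)
    print(f"12 V draw for the 1.2 V rail: {i_in:.2f} A")
```

The point is that a 20A logic rail only costs a couple of amps on the 12V bus, so the heavy currents stay on short traces right next to the load.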

Power supplies operate in a hostile environment. We added a metal-oxide varistor (MOV, aka surge suppressor) to the power supply to help protect the system from unwanted power surges. This protection, combined with the simplicity, low power consumption, and low parts count, will result in a highly reliable power supply subsystem. This is systems engineering at its finest.
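The parts-count argument has a standard back-of-the-envelope form. If every component is effectively in series (any one failure takes down the supply) and failure rates are constant, the rates simply add, and MTBF is the reciprocal of the total. Rates are often quoted in FIT, failures per 10⁹ device-hours. The per-part rates and part counts below are hypothetical, chosen only to show how parts count drives the result.

```python
# Series reliability model: constant failure rates add, and MTBF = 1 / total rate.
# Rates are in FIT (failures per 1e9 device-hours).
# ASSUMPTION: the part counts and per-part FIT values below are hypothetical.
from typing import Iterable


def series_mtbf_hours(fit_rates: Iterable[float]) -> float:
    """MTBF of a system that fails when any one of its parts fails."""
    total_fit = sum(fit_rates)
    if total_fit <= 0:
        raise ValueError("need at least one part with a positive failure rate")
    return 1e9 / total_fit


if __name__ == "__main__":
    simple_supply = [100.0] * 50    # hypothetical: 50 parts at 100 FIT each
    complex_supply = [100.0] * 120  # hypothetical: 120 parts at 100 FIT each
    print(f"simple:  {series_mtbf_hours(simple_supply):,.0f} h")
    print(f"complex: {series_mtbf_hours(complex_supply):,.0f} h")
```

With identical parts, MTBF scales inversely with parts count: 200,000 hours for the hypothetical 50-part design versus about 83,000 hours for the 120-part one.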

By contrast, the ATX-style power supplies used in systems such as the new Sun Fire X2100 server provide +3.3, +5, -5, +12, and -12 V. This is an old-school design and is significantly more complex than the new, simpler design. Simple designs tend to use fewer parts and thus have higher reliability. While browsing the power supplies at a local computer store last week, I noticed that most ATX power supplies were rated at around 80,000 hours MTBF. The power supplies in the X4100 and X4200 are projected to have more than twice that MTBF.
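MTBF figures are easier to compare when converted to a failure probability over a service period. Assuming the usual exponential (constant failure rate) model, the chance that a unit fails within t hours is 1 − exp(−t/MTBF). In this minimal sketch, 160,000 hours is simply twice the 80,000-hour ATX figure, used as a lower bound for the projection above; the fleet size of 100 is hypothetical.

```python
# Convert MTBF into a one-year failure probability and a fleet failure estimate.
# ASSUMPTION: constant failure rate (exponential model), continuous operation.
import math

HOURS_PER_YEAR = 8760.0  # continuous operation, ignoring leap years


def failure_probability(mtbf_hours: float, hours: float = HOURS_PER_YEAR) -> float:
    """Chance that one unit fails within `hours`, assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def expected_failures(mtbf_hours: float, units: int, hours: float = HOURS_PER_YEAR) -> float:
    """Expected failures across a fleet over `hours`, assuming failed units are
    replaced promptly so each position keeps accumulating operating hours."""
    return units * hours / mtbf_hours


if __name__ == "__main__":
    for label, mtbf in (("typical ATX", 80_000.0), ("2x ATX", 160_000.0)):
        print(f"{label}: {failure_probability(mtbf):.1%} chance of failure in a year, "
              f"{expected_failures(mtbf, 100):.1f} expected failures per 100 units")
```

Doubling MTBF from 80,000 to 160,000 hours cuts the one-year failure probability of a single supply from about 10.4% to about 5.3%, and halves the expected number of service calls across a fleet.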

Conclusion

The new Sun Fire X4100 and Sun Fire X4200 servers are definitely not just another repackaging of some old-school ATX design. The entire design was approached from a data center perspective with the intent of providing a highly reliable, high performance, small form factor server. The good RAS design should translate directly into long life, fewer service calls, and happy customers. Do you want to be happy?



Comments:

Thanks a lot, very useful blog! Dmitri Java2D Team

Posted by Dmitri Trembovetski on September 12, 2005 at 11:04 PM PDT #

We ordered several X4100s weeks ago and they still haven't shipped. What's the deal?

Posted by sid wilroy on November 07, 2005 at 08:04 PM PST #

Thanks - that's a brilliant explanation!

I'm coming to the conclusion, though, that I must be unusual. I want small servers with more than 2 disks! I want to be able to store 100G or so of data reliably, without going to the expense of an external array. I want the machine to reboot cleanly even with one disk dead without having to worry about metadb quorum or UFS log-rolling panic issues. I would like to have the capability to do live upgrade onto a second pair of disks.

I liked the V60x and V65x, because they did let you have decent amounts of internal storage. OK, so they might not have had the best design, but they had the features I was after.

Posted by Peter Tribble on November 18, 2005 at 08:24 AM PST #

OK, so surely you like the X4100 and X4200, which support four disks and use onboard RAID-1. This should satisfy your needs.

Posted by Richard Elling on November 18, 2005 at 09:34 AM PST #
