
Thursday December 23, 2004

Big Sun Clusters!! Computers

The Center for Computing and Communication (CCC) at RWTH Aachen University has recently published details about two interesting clusters they operate using Sun technology. RWTH Aachen is the largest university of technology in Germany and one of the most renowned technical universities in Europe, with around 28,000 students, more than half of whom are in engineering (according to their website).

Check this out!

First, there is a huge Opteron Linux cluster that consists of 64 of Sun's V40z servers, each with four Opteron CPUs. The 256 processors total 1.1 TFlop/s (peak) and share a pool of 512 GB of RAM. Each node runs a 64-bit version of Linux. Hybrid programs use a combination of MPI and OpenMP, where each MPI process is multi-threaded. The hybrid parallelization approach combines coarse-grained parallelism with MPI and underlying fine-grained parallelism with OpenMP in order to use as many processors as efficiently as possible. For shared memory programming, OpenMP is becoming the de facto standard.
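To make the two levels concrete, here is a minimal Python sketch of how work might be carved up in a hybrid scheme. It is not MPI or OpenMP code and is not taken from the RWTH primer; it only shows the index arithmetic. The function names, the one-process-per-node layout, and the one-million-item workload are assumptions made for illustration.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, near-equal slices."""
    base, extra = divmod(stop - start, parts)
    slices = []
    lo = start
    for i in range(parts):
        # The first `extra` slices take one additional item each.
        hi = lo + base + (1 if i < extra else 0)
        slices.append((lo, hi))
        lo = hi
    return slices


def hybrid_partition(n_items, n_processes, threads_per_process):
    """Return {(process, thread): (lo, hi)} for a two-level decomposition."""
    plan = {}
    # Coarse level: one contiguous block per process (the MPI-like layer).
    for p, (p_lo, p_hi) in enumerate(split_range(0, n_items, n_processes)):
        # Fine level: the block is shared among that process's threads
        # (the OpenMP-like layer).
        for t, chunk in enumerate(split_range(p_lo, p_hi, threads_per_process)):
            plan[(p, t)] = chunk
    return plan


# 64 nodes with four CPUs each are the post's figures. One process per
# node and the workload size are assumptions for this sketch.
plan = hybrid_partition(n_items=1_000_000, n_processes=64, threads_per_process=4)
print(len(plan))  # 256 workers, one per processor
```

In a real hybrid program the outer split would be done by MPI ranks exchanging messages and the inner split by an OpenMP parallel loop sharing memory within the node.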

See: http://www.rz.rwth-aachen.de/computing/info/linux/primer/opteron_primer_V1.1.pdf

Another cluster is based on 768 UltraSPARC IV processors, with an accumulated peak performance of 3.5 TFlop/s and a total main memory capacity of 3 TB. The operating system sees each of the two cores of an UltraSPARC IV processor as a separate processor. Therefore, from the user's perspective, the Sun Fire E25Ks have 144 “processors”, the Sun Fire E6900s have 48 “processors” and the Sun Fire E2900s have 24 “processors” each. All compute nodes also have direct access to all work files via a fast storage area network (SAN) using the QFS file system. High I/O bandwidth is achieved by striping across multiple RAID systems.
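The dual-core bookkeeping is easy to get wrong, so here is a quick Python check of the numbers above. It uses only the figures quoted in this post; the variable names are mine, and treating 1 TB as 1024 GB is an assumption.

```python
CORES_PER_CHIP = 2  # the OS sees each UltraSPARC IV core as a processor

# "Processors" visible to the user on each system type (from the post).
visible_processors = {
    "Sun Fire E25K": 144,
    "Sun Fire E6900": 48,
    "Sun Fire E2900": 24,
}

# Physical UltraSPARC IV chips per system.
chips_per_system = {
    name: count // CORES_PER_CHIP for name, count in visible_processors.items()
}

total_chips = 768    # from the post
peak_tflops = 3.5    # from the post
memory_tb = 3        # from the post

logical_processors = total_chips * CORES_PER_CHIP
peak_gflops_per_chip = peak_tflops * 1000 / total_chips
memory_gb_per_chip = memory_tb * 1024 / total_chips  # assumes 1 TB = 1024 GB

print(chips_per_system)
print(logical_processors)
print(round(peak_gflops_per_chip, 2), memory_gb_per_chip)
```

So each E25K holds 72 chips, the whole cluster presents 1536 "processors" to the OS, and the averages work out to roughly 4.6 GFlop/s peak and 4 GB of memory per chip.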

See: http://www.rz.rwth-aachen.de/computing/info/sun/primer/primer_V4.0.pdf


December 23, 2004 06:34 PM EST Permalink

Big -vs- Small Servers? Computers

Big Iron -vs- Blades. Mainframe -vs- Micro. Hmmm. We're talking Aircraft Carriers -vs- Jet Skis, right?

Sun designs and sells servers that cost from ~$1000 to ~$10 million. Each! We continue to pour billions into R&D and constantly raise the bar on the quality, performance, reliability and feature set that we deliver in our servers. No wonder we lead in too many categories to mention. Okay, I'll mention some :-)



While the bar keeps rising on our "Enterprise Class", the Commodity/Volume Class is never too far behind. In fact, I think it may be inappropriate to continue to refer to our high-end as our Enterprise-class Servers, because that could imply that our "Volume" Servers are only for workgroups or non-mission-critical services. That is hardly the case. Both are important and play a role in even the most critical service platforms.

Let's look at the next generation Opterons... which are only months away. And how modern S/W Architectures are fueling the adoption of these types of servers...

Today's AMD CPUs, with on-board HyperTransport pathways, can handle up to 8 CPUs per server! And in mid-2005, AMD will ship dual-core Opterons. That means a server could plausibly have, by mid-2005 or so, 16 Opteron cores (8 dual-core sockets) in just a few rack units of space!! If you compare SPECrate values, such a server would have the raw compute performance capability of a full-up $850K E6800. Wow!

AMD CPU Roadmap: http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_608,00.html
AMD 8-socket Support: http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~72268,00.html
SPECint_rate: http://www.spec.org/cpu2000/results/rint2000.html
E6800 Price: http://tinyurl.com/3xbq2

Clearly, there are many reasons why our customers buy, and will continue to buy, our large SMP servers. They offer Mainframe-class on-line maintenance, redundancy, and upgradability. They even exceed the ability of a Mainframe in terms of raw I/O capability, compute density, on-the-fly expansion, etc.

But H/W RAS continues to improve in the Opteron line as well. One feature I hope to see soon is on-the-fly PFA-orchestrated CPU off-lining. If this is delivered, it'll be on Solaris x86 rather than Linux. Predictive Fault Analysis would detect when one of those 16 cores or 32 DIMMs starts to experience soft errors, in time to fence off that component before the server and all its services crash. The blacklisted component could be serviced at the next scheduled maintenance event. We can already do that on our Big Iron. But with that much power, and that many stacked services in a 16-way Opteron box, it would be nice not to take a node panic and extended node outage.
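The basic idea behind that kind of off-lining can be sketched in a few lines of Python. This is a toy model, not how Solaris or any shipping product does it: the class name, the threshold of three errors, and the one-hour window are all hypothetical.

```python
from collections import defaultdict, deque


class SoftErrorMonitor:
    """Fence off a component once it logs too many correctable errors
    within a sliding time window. Names and thresholds are hypothetical."""

    def __init__(self, max_errors=3, window_seconds=3600.0):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # component -> error timestamps
        self.offlined = set()

    def record_soft_error(self, component, timestamp):
        """Log one correctable error. Return True if this error causes
        the component to be newly fenced off."""
        if component in self.offlined:
            return False
        events = self._events[component]
        events.append(timestamp)
        # Drop errors that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.max_errors:
            self.offlined.add(component)
            return True
        return False

    def maintenance_list(self):
        """Components to replace at the next scheduled maintenance event."""
        return sorted(self.offlined)


monitor = SoftErrorMonitor(max_errors=3, window_seconds=3600.0)
monitor.record_soft_error("core-3", 100.0)    # a single error: no action
monitor.record_soft_error("dimm-17", 0.0)
monitor.record_soft_error("dimm-17", 5000.0)  # the first error has aged out
monitor.record_soft_error("dimm-17", 5100.0)
tripped = monitor.record_soft_error("dimm-17", 5200.0)
print(tripped, monitor.maintenance_list())
```

The point of the window is that an occasional soft error is normal; it's a burst from the same core or DIMM that predicts a hard failure and justifies pulling the component out of service while the node keeps running.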

On the other hand, 80% of the service layers we deploy already follow, or are moving to, the horizontal model. And modern S/W architectures are increasingly designed to provide continuity of service level even in the presence of various fault scenarios. Look at Oracle RAC, replicated-state App Servers with Web-Server plug-ins to seamlessly transfer user connections, Load Balanced web services, TP monitors, Object Brokers, Grid Engines and Task Dispatchers, and SOA designs in which an alternate for a failed dependency is rebound on-the-fly.
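Here's a stripped-down Python sketch of that last pattern, rebinding to an alternate when a dependency fails. It isn't modeled on any particular product; the class, the exception, and the two stand-in "nodes" are hypothetical names for illustration.

```python
class ServiceUnavailable(Exception):
    """Raised by a replica that cannot serve the request."""


class RebindingClient:
    """Client that stays bound to one replica and rebinds on failure."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = list(replicas)  # callables: request -> response
        self._current = 0                # index of the bound replica

    def call(self, request):
        # Try each replica at most once, starting with the bound one.
        for attempt in range(len(self._replicas)):
            index = (self._current + attempt) % len(self._replicas)
            try:
                response = self._replicas[index](request)
            except ServiceUnavailable:
                continue
            self._current = index  # rebind to the replica that answered
            return response
        raise ServiceUnavailable("all replicas failed")


def failed_node(request):
    raise ServiceUnavailable("node down")


def healthy_node(request):
    return f"handled {request}"


client = RebindingClient([failed_node, healthy_node])
print(client.call("order-42"))  # handled order-42
```

The caller never sees the failed node. After the first call the client is bound to the healthy replica, so later requests don't pay the retry cost again.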

These kinds of things, and many others, are used to build resilient services that are much more immune to component or node failures. In that regard, node level RAS is less critical to achieving a service level objective. Recovery Oriented Computing admits that H/W fails [http://roc.cs.berkeley.edu/papers/ROC_TR02-1175.pdf]. We do need to reduce the failure rate at the node/component level... but as Solution Architects, we need to design services such that node/component failure can occur, if possible, without a service interruption or degradation of "significance".

In the brave new world (or, the retro MF mindset) we'll stack services in partitions across a grid of servers. Solaris 10 gives us breakthrough new Container technology that will provide this option. Those servers might be huge million dollar SMP behemoths, or $2K Opteron blades... doesn't matter from the architectural perspective. We could have dozens of services running on each server... however, most individual services will be distributed across partitions (Containers) on multiple servers, such that a partition panic or node failure has minimal impact. This is "service consolidation" which includes server consolidation as a side effect. Not into one massive server, but across a limited set of networked servers that balance performance, adaptability, service reliability, etc.
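A small Python sketch shows why spreading each service's partitions across servers limits the damage from a node failure. The placement rule (plain round-robin), the function names, and the example services are assumptions for illustration; this has nothing to do with how Solaris Containers are actually configured.

```python
def place_partitions(services, servers):
    """Spread each service's partitions across distinct servers.

    services: {service name: number of partitions}
    servers:  list of server names
    Returns {server: [(service, partition index), ...]}.
    """
    placement = {server: [] for server in servers}
    slot = 0
    for service, count in services.items():
        if count > len(servers):
            raise ValueError(f"{service}: more partitions than servers")
        for i in range(count):
            # Consecutive slots land on consecutive servers, so no two
            # partitions of one service share a server.
            placement[servers[slot % len(servers)]].append((service, i))
            slot += 1
    return placement


def surviving_fraction(placement, service, failed_server):
    """Fraction of a service's partitions still up after one server fails."""
    total = sum(1 for parts in placement.values() for svc, _ in parts if svc == service)
    if total == 0:
        raise ValueError(f"unknown service: {service}")
    lost = sum(1 for svc, _ in placement[failed_server] if svc == service)
    return (total - lost) / total


placement = place_partitions({"web": 4, "app": 3, "db": 2}, ["s1", "s2", "s3", "s4"])
print(placement["s1"])
print(surviving_fraction(placement, "web", "s1"))  # 0.75
```

Server s1 ends up hosting a slice of three different services, yet losing it takes out only one partition of each. That is the "service consolidation" trade: many services per box, but no service confined to one box.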

Server RAS matters. Competitive pressure will drive continuous improvement in quality and feature sets in increasingly powerful and inexpensive servers. At the same time, new patterns in S/W architecture will make "grids" of these servers work together to deliver increasingly reliable services. Interconnect breakthroughs will only accelerate this trend.

The good news for those of us who love the big iron is that there will always be a need for aircraft carriers even in an age of powerful jet skis.


December 23, 2004 04:57 PM EST Permalink



This is a personal weblog, I do not speak for my employer.