http://blogs.sun.com/9900/date/20090415 Wednesday April 15, 2009

ST 9900 Blog: 1st Competitive Response - EMC V-Max Storage System

(This was updated April 16 with commentary on Tiered Storage Manager and some URLs on EMC V-MAX and the RapidIO interconnect; see the bottom of this blog.)

On April 14, 2009, EMC officially announced the EMC V-MAX Storage System, also referred to as the EMC Virtual Matrix Architecture.

This is a significant announcement, so this response will occur in a multi-phased manner:

I. "First Take" - This Blog

II. At the end of this Blog will be several URLs where you can see source product data

III. "Second Take" will be provided after we have had more time to review this product release from EMC.

      I'll talk more about software later in this piece.

I. "First Take" - This Blog - ST 9990V vs EMC V-MAX storage system.

I am going to tell you an old server war story, because what happened back then may very well apply to what we are about to encounter with the ST 9990V vs. the EMC V-MAX Storage System.

This reminds me of when IBM clustered a bunch of RS 6000s together around a moderately performant switch/network, called it an RS 6000 SP2, and placed Oracle Parallel Database on it as the backend database for applications such as SAP R3. Sun created a competitive program to compete with the IBM SP2 using the earliest ancestor of the Sun Enterprise 9000 server, the Sun Enterprise 10000, which went GA in 1997. This was Sun's first supercomputer, based on the designs acquired when Sun purchased the Business Systems Division of Cray Research (the first and foremost supercomputer company of that era).

 You may notice that the Sun Enterprise 9000 is still around and you do not hear much about IBM SP2s. Sun won. I will spare you more details.

So fast forward to 2009, and EMC has hooked together a bunch of storage systems with a moderately performant switch/network and given it a bunch of global data cache. It is also now talking about virtualized storage pools, scaling to hundreds of thousands of TBs and tens of millions of IOPS, supporting hundreds of thousands of VMware machines in a single federated scale-out architecture, and Fully Automated Storage Tiering (FAST) across virtualized storage. It looks like Cisco is supporting this.

Ok, let's dig under the covers a bit.

EMC V-MAX

* Up to 128 Intel Xeon processor cores
* Up to 1 TB (terabyte) of global memory
* Fibre Channel/FICON/Gigabit Ethernet/iSCSI connectivity
* Flash/Fibre Channel/SATA drive support
* Scale to 2,400 drives
* Maximum usable, protected capacity of 2 PBs (petabytes)

Their moderately performant switch/network is as follows:

V-Max uses an interconnect new to most of us - RapidIO. Burke states: "the first generation of the Symmetrix V-Max uses two active-active, non-blocking, serial RapidIO v1.3-compliant private networks as the inter-node Virtual Matrix Interconnect, which supports up to 2.5GB/sec full-duplex data transfer per connection – each 'director' has 2, and thus each 'engine' has 4 connections in the first-gen V-Max."

RapidIO resides on the following resource, which is used to connect multiple V-MAXs together:

The basic V-Max building block is a pair of quad-core Xeon 2.3GHz (5400) processors, 16 host and 16 disk enclosure ports (8 each per Xeon quad-core), 128GB of global memory, an EMC ASIC to handle the global memory access, and the RapidIO interconnect endpoints.

From a software standpoint, EMC is emphasizing Fully Automated Storage Tiering (FAST) across virtualized storage for grouping like data together and moving it to a destination device, and they are also emphasizing SSD (Solid State Disk) as a place to put rapidly accessed data to improve performance.

OK, let's start comparing. Sorry for going to speeds and feeds; it really isn't the way to sell the ST 9990V, but this is just to give you context.

Interconnect

The EMC V-MAX first generation can have up to 4 interconnects at 2.5 GB/sec, for a total of 10 GB/sec aggregate system bandwidth.

The ST 9990V has 106 GB/sec aggregate system bandwidth, of which 68 GB/sec is dedicated to data and 38 GB/sec is dedicated to control. Hopefully, I am interpreting the RapidIO aggregate bandwidth correctly. But there is more to this than raw speeds and feeds. The dedication/separation of bandwidth and data paths for specific functions is a signature design philosophy that you will see recurring in subsequent discussions. Not only does the ST 9990V have more raw horsepower, the separation of data paths between data and control minimizes contention and maximizes performance.
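The arithmetic behind these figures can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the numbers quoted above; the function and variable names are made up for illustration, and it assumes bandwidth simply adds up across links and engines. A reader comment on this post states that the 10 GB/sec figure applies per engine, with up to 8 engines in a system, so the sketch takes the engine count as a parameter.

```python
# Back-of-the-envelope bandwidth arithmetic from the figures quoted in this post.
# Names are illustrative only; linear scaling across links and engines is an assumption.

def vmax_aggregate_gb_per_sec(links=4, gb_per_sec_per_link=2.5, engines=1):
    """Aggregate RapidIO bandwidth, assuming it adds up linearly."""
    return links * gb_per_sec_per_link * engines

ST9990V_DATA_GB_PER_SEC = 68      # dedicated data bandwidth
ST9990V_CONTROL_GB_PER_SEC = 38   # dedicated control bandwidth
st9990v_total = ST9990V_DATA_GB_PER_SEC + ST9990V_CONTROL_GB_PER_SEC

print(vmax_aggregate_gb_per_sec())           # 10.0, the reading used in this post
print(vmax_aggregate_gb_per_sec(engines=8))  # 80.0, if the figure is per engine
print(st9990v_total)                         # 106
```

Either way, the raw totals say nothing about contention, which is why the split between data and control paths matters more than the sum.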

 Data Cache (EMC calls it Global Memory)

EMC is claiming a maximum of 1 TB of data cache, whereas the ST 9990V has 512 GB. We achieved this in mid-2008, and EMC is now claiming 1 TB a year later. So we are playing leapfrog, and both vendors are dependent on external data cache suppliers.

The key difference between the ST 9990V and the EMC V-MAX is that the ST 9990V uses Shared Memory for storing meta data. More on meta data later. But the ST 9990V is clearly a more sophisticated design, with separate/specialized shared memory for meta data AND a dedicated 38 GB/sec of control bandwidth to move control/meta data along without contention.

OK, a little about meta data. Meta data is what is used to keep track of where things are. This is important for sophisticated software such as Copy on Write and Dynamic Provisioning, where virtual volumes are carved out of physical pools.
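To make "keeping track of where things are" concrete, here is a toy Python model of the kind of mapping a dynamically provisioned volume needs. It is a sketch only, not how the ST 9990V or any other array implements it: the class name, the page size, and the free-list pool are all assumptions chosen for illustration.

```python
# Toy model of dynamic-provisioning meta data: a map from virtual pages to pool pages.
# PAGE_MB, ThinVolume, and the free-list pool are hypothetical, for illustration only.

PAGE_MB = 32  # arbitrary page size for this sketch


class ThinVolume:
    def __init__(self, virtual_size_mb, pool_free_pages):
        self.virtual_pages = -(-virtual_size_mb // PAGE_MB)  # ceiling division
        self.pool_free_pages = pool_free_pages  # shared list of free physical page ids
        self.page_map = {}  # the meta data: virtual page -> physical pool page

    def write(self, offset_mb):
        """Return the pool page backing this offset, allocating one on first write."""
        vpage = offset_mb // PAGE_MB
        if not 0 <= vpage < self.virtual_pages:
            raise ValueError("offset outside the virtual volume")
        if vpage not in self.page_map:
            if not self.pool_free_pages:
                raise RuntimeError("pool exhausted")
            self.page_map[vpage] = self.pool_free_pages.pop(0)
        return self.page_map[vpage]


pool = list(range(4))                  # a tiny physical pool of 4 pages
vol = ThinVolume(virtual_size_mb=1024, pool_free_pages=pool)
first = vol.write(0)                   # first write allocates a pool page
again = vol.write(10)                  # same virtual page, no new allocation
far = vol.write(1000)                  # different virtual page, new allocation
print(first, again, far, len(pool))    # 0 0 1 2
```

Every read and write has to consult a map like `page_map`, which is why where that meta data lives and how fast it can be reached matters so much.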

So there are a couple of ways to look at this. One is pure specsmanship, which is really not the way to sell the ST 9990V; the ST 9900 software stack is really where to compete. But for the sake of context, we'll continue with the speeds and feeds perspective.

CPUs

EMC V-MAX claims 128 CPUs. I really don't want to go here, but I guess I have to. On the ST 9990V, the Front End Directors (FEDs) and Back End Directors (BEDs) have two boards each with 4 CPUs, for a total of 8. There are 16 of these boards. Microcode, which manages the machine, as well as specific programs are kept in flash memory on the FEDs and BEDs.

While we seldom talk about the operating system, it is a supercomputer-class clustered operating system with a shared-nothing architecture and message passing. If you buy me a drink we can talk about this offline. The reason we use such a sophisticated OS on the ST 9990V is that making all the code residing on separate boards work together is pretty hard. So I am always skeptical when I hear of large Intel CPU counts and performance. Unless it is a classical HPC workload, where you divide up your batches of seismic data and just submit them to specific nodes, making a lot of CPUs, memory boards, and programs work together is pretty hard.

That is why I told the story of the Sun Enterprise 10000 versus the IBM SP2 in 1997: Sun employed cache coherency engineers to make all the discrete memory boards look like a single image, and created algorithms to keep track of meta data on the various physical boards with CPUs, memory, and I/O. I am not advocating that you compete on speeds and feeds, but I am guessing that a certain percentage of our field personnel will encounter customers who fixate on speeds and feeds, and hopefully this will allow you to address their issues.

ST 9990V Crossbar Switch versus RapidIO

The ST 9990V crossbar switch is the real deal, with 68 GB/sec of data bandwidth, point to point and contention-free. Sure, RapidIO looks like a switch and claims to avoid contention, but it is not the same thing. That is why I told the old story of the Sun Enterprise 10000 vs. the IBM SP2. Same concept. Net/net: we have excellent SPC-1 numbers. EMC can make all sorts of claims and talk all kinds of theory about how they would extend the architecture. The same things were said about the IBM SP2.

Here are some SPC-1 numbers; I just happened to have gone through an exercise on how to reach 1 million IOPS.

1. Presentation which states our performance at 200,245 IOPS
   for a 256 GB cache and 8 back end directors. This is an SPC-1
   number.

2. Therefore it would take 5 of these systems to achieve roughly One
   Million IOPS, assuming the customer transaction profile is close
   to what a SPC-1 transaction looks like. If it is not close to a
   SPC-1 transaction then the results are expected to vary.

3. Precise configurations behind the SPC-1 benchmark are here: http://www.storageperformance.org/results/benchmark_results_all/#hds_spc1

4. Disclaimer: The data contained in this e-mail is to start/engage in dialog
   regarding customer requirements. Until we understand what the characteristics
   of the transactions are, any proposal to provide configuration guidance for
   One Million IOPS lacks credibility.
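The sizing in item 2 is simple division rounded up. A minimal Python sketch, assuming throughput scales linearly across independent systems and that the workload looks like SPC-1 (neither of which is guaranteed, per the disclaimer above); the names are illustrative:

```python
import math

# Figures from items 1 and 2 above; linear scaling across systems is an assumption.
SPC1_IOPS_PER_SYSTEM = 200_245
TARGET_IOPS = 1_000_000


def systems_needed(target_iops, iops_per_system):
    """Smallest whole number of systems whose combined IOPS meets the target."""
    return math.ceil(target_iops / iops_per_system)


n = systems_needed(TARGET_IOPS, SPC1_IOPS_PER_SYSTEM)
print(n, n * SPC1_IOPS_PER_SYSTEM)  # 5 1001225
```

Five systems land just over the one million mark, which is why the estimate above says "roughly".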

Back to software. As noted earlier, EMC is emphasizing Fully Automated Storage Tiering (FAST) across virtualized storage, along with SSD (Solid State Disk) as a place to put rapidly accessed data.

In my opinion, EMC is reacting to ST 9900 Tiered Storage Manager, which was released by Sun in Fall 2004. That's 4+ years ago, and EMC is just now catching up. At this time, there are several ways to initiate the movement of data across tiers. Obviously the first one is manually. The second is via scripting. So while EMC is claiming full automation, it may be wise to be circumspect about this and see what and *when* EMC will actually deliver.
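To illustrate what "via scripting" means in practice, here is a minimal Python sketch of the kind of rule an operator might run on a schedule to decide which volumes should change tiers. Everything in it is hypothetical: the function, field names, tier names, and thresholds are invented for illustration and do not correspond to any Tiered Storage Manager or EMC interface.

```python
# Hypothetical sketch of a scripted tiering rule. The names, fields, and thresholds
# are invented for illustration and do not match any vendor's interface.

def plan_moves(volumes, hot_threshold, cold_threshold):
    """Return (volume name, target tier) pairs for volumes sitting in the wrong tier."""
    moves = []
    for vol in volumes:
        if vol["ios_per_day"] >= hot_threshold and vol["tier"] != "fast":
            moves.append((vol["name"], "fast"))
        elif vol["ios_per_day"] <= cold_threshold and vol["tier"] != "capacity":
            moves.append((vol["name"], "capacity"))
    return moves


volumes = [
    {"name": "db_logs", "tier": "capacity", "ios_per_day": 900_000},
    {"name": "archive", "tier": "fast", "ios_per_day": 120},
    {"name": "home_dirs", "tier": "capacity", "ios_per_day": 5_000},
]
moves = plan_moves(volumes, hot_threshold=100_000, cold_threshold=1_000)
print(moves)  # [('db_logs', 'fast'), ('archive', 'capacity')]
```

With manual or scripted movement, a person still chooses the thresholds and decides when to run the rule. A "fully automated" claim implies the array makes those decisions on its own, and that is the part to watch for.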

Update on April 16

-------------------

With respect to TSM, we do not have fully automated movement of data across tiers. A requirements document is under development and may be headed to Hitachi Ltd. Japan engineering in a few weeks. This is not a commitment, but the discussion is occurring.

If I step back a bit, there may be less meat and more rhetoric around the EMC announcement. If we think historically about the Sun Enterprise 10000 vs. the IBM RS 6000 SP2, one can think of the EMC V-MAX as a defensive move using cheaper, inferior interconnect technology. I bet they used RapidIO because it takes too much time, skill, and money to build a real crossbar switch like the one in both the ST 9990V and the ancestor of the M9000, the Sun Enterprise 10000. It's almost like EMC is admitting they can't keep up. They may have us beat on TSM-style automated movement of data, but we are ahead in many other areas.

See the Storage Academy India Presentation here and scroll down to the competitive presentations: http://wikihome.sfbay.sun.com/Systems/Wiki.jsp?page=Presentations

URL on EMC V-MAX

http://www.rapidio.org/home




I am going to finish this blog later because it is getting late.

 Ken Ow-Wing, Product Line Manager, ST 9900 Program









Comments:

I like how you describe the problem of storage arrays as being akin to single memory image supercomputing. Back before clusters "won the supercomputing fight," I described the architectural difference between clusters and "true" supercomputers by describing the sort of job they would be good at. The cluster is like building a pyramid. Lots of people working behind the same goal, each doing their own little bit, eventually, a giant stone mountain gets built. A single image supercomputer is like trying to pick up and move the pyramid.

Posted by Charles Soto on April 15, 2009 at 06:45 AM PDT #

Sadly, you've started off with a misunderstanding of both the Virtual Matrix Architecture AND in the initial V-Max implementation.

Although the inaccuracies will make it inherently easier for EMC to compete with your misinformed audience, I'll give you a few hints on where you're misleading your readers:

1) EACH V-Max ENGINE has 10GB/s of available RapidIO bandwidth to communicate with other engines, and there can be up to 8 ENGINES in a V-Max SYSTEM. Thus your aggregate bandwidth is incorrectly stated.

2) A certain percentage of global memory references can be served locally, without having to traverse the RapidIO fabric - these references occur at internal memory bandwidth data rates. As a result, you really cannot use "speeds and feeds" to predict performance, as I said in my blog posts.

3) Symmetrix has been built around a multi-processor architecture for more than 18 years - long before Hitachi started making external storage arrays. It's sorta NUMA-like, but not exactly. You might want to grab an architecture guide off of EMC.com so that you know more about what you're competing with...

4) V-Max Virtual LUNs is closer to TSM than TSM is to FAST, although V-Max Virtual LUN relocation is at least 2.5x faster and supports 128 times as many concurrent relocations as TSM - and does so without disrupting or deferring replication sessions to boot (which TSM cannot do). Relocating LUNs on command isn't FAST - FAST automates the decisions about what to move and when...dynamically, and without operator intervention. TSM doesn't do that.

Like I said - the misinformation makes it easier for V-Max to compete against Hitachi. But you seem to be one who prefers the facts.

Oh, and it's RapidIO, not RadioIO...

Posted by Barry Burke on April 16, 2009 at 08:34 AM PDT #

How can technology product managers mis-spell or pretend to mis-spell a technology being used in the product which a lot of readers are interested to read reviews, comparisons and thus judge if the product is superior or not? I am talking about mentioning Radio IO for Rapid IO. Ken mentioned some where in the blog that he would share some info if somebody buys him a drink. Looks like he was drunk while writing the blog.

Posted by Sundararajan on April 16, 2009 at 08:31 PM PDT #

Ken, great post. Note that I have a matching post at http://blogs.hds.com/michael/2009/07/vmax-emc-usp-uspv-copy.html. Note that I've even created a new category called "Evil Machine Copies" to continue my thesis that EMC's R&D is largely in M&A, marketing and sales and their real technical innovation engine comes from copying Hitachi at least for storage. If you want to please reach out to me and we can collaborate on future posts in this area.

Posted by Michael Hay on July 09, 2009 at 10:24 PM PDT #
