Wednesday Mar 04, 2009

Since the Sun Storage 7000 was launched in Asia South (Jan/09), it has created a lot of interest, as well as many questions, around this disruptive product despite the economic recession. Thanks to its innovations (hybrid storage, leveraging open source, Analytics, commodity hardware), it totally changes storage design and brings the cost down significantly. The innovation in the S7000 came at the right time, helping us continue the business at lower cost while data keeps growing.

To make it simple for everyone, this section of my blog is a one-stop shop for S7000 information: FAQs such as success stories, performance numbers and how to set up the simulator, plus other knowledge such as white papers, videos, tricks & tips and blogs.

This blog is two-way communication and will be updated regularly. Feel free to comment or ask any questions; I will respond to your queries via this blog.

Success Story

1. Smugmug (internet photo sharing). They chose the Sun Storage 7410 for their Web 2.0 operation and presented how their new storage performs at the Open Storage Summit 2008 event. See slide. In short: "SSD is real (new way to improve performance rather than just add drive), Crazy fast (microsecond response time), Analytic is dream, Good Price tag, God-like power, etc."

2. iWeb (web hosting) recently deployed a 7410 in their environment to run email and hosting applications. See more detail on their blog here.

More to come as more customers allow us to share their success.


S7000 Review

1. InfoWorld (Testing Center) gave the Sun Storage 7210 a review rating of 9.2 out of 10 (excellent). Find out the details here. Below is their quote:

"Sun's investment in ZFS continues to pay dividends, enabling the creation of compelling products such as the 7210 unified storage system. A thoroughly heavyweight filer in a small form factor, the 7210 packs as much as 44TB of surprisingly fast and affordable storage in a single 4U chassis. The Web-based GUI is fantastic, and performance is stellar. It's just about everything you could want in a storage server."

2. PC Pro (UK) rated the Sun Storage 7110 with 6 stars! To find out why, please visit here. Below is their quote:

"7110 is the storage host with the most since it offers NAS and IP SAN support - all as standard" and "The 7110 delivers a complete network storage solution with no hidden catches. Both NAS and IP SAN are supported, performance is very good, and Sun won't be beaten on value."

3. Computer World (Australia). See the published review here. Their quote is:

"The Sun Storage 7210 Unified Storage System isn't cheap, but the cost is actually fairly low considering the capabilities. Make no mistake, none of this capability would be possible without ZFS; no other file system available today can make use of the resources present in the 7210 like ZFS, and the addition of the solid-state logging drive just increases the potency of this file system."

4. ILM - Informatique (France) reviewed the Sun Storage 7110. Below is their quote:

"The Sun Storage 7110 Unified Storage System makes it easier than ever to simplify your storage for less. The easy-to-use appliance is ideal for enterprise workgroups, remote offices or SMBs, and offers enterprise storage at entry-level prices". See full review at here.

5. Open SRS (blog) tested our performance vs. NetApp. See details. Their quote is:

"I’ll close with a reiteration of my opening:  I’m really impressed with this platform.  Sun’s done an amazing job, and the Open Storage platform is going to shake the (largely ridiculously overpriced) enterprise storage market to its core. Keep up the great work!"


Performance, Performance and Performance

Many people are asking about this. Performance depends on various factors such as workload type (random, sequential, read/write ratio, % cache hit, etc.), test configuration (how many HDDs, SSDs and CPUs, which RAID level), where the data is (DRAM, SSD, HDD) and network environment (Gigabit, Fast Ethernet, etc.). It is challenging to test every combination.

The table below contains some test results from the Sun engineering (Fishworks) team as well as from some field engineers. Please read the details in the blogs to understand the assumptions, environment and configuration we tested, since those will differ from the environment you want to compare against.


Workload | Protocol/Platform | Model | Performance | Detail
Random Read | NFS/Unix | 7410 | 281,000 IOPS from DRAM | Detail
Sequential Read | NFS/Unix | 7410 | 1.90GB/s from DRAM | Detail
Sequential Read | NFS/Unix | 7410 | 1.04GB/s from HDD | Detail
Sequential Write | NFS/Unix | 7410 | 563MB/s to HDD | Detail
Sequential Read | CIFS/Windows | 7410 | 1.03GB/s from DRAM | Detail
Sequential Read | CIFS/Windows | 7410 | 849MB/s from HDD | Detail
Sequential Write | CIFS/Windows | 7410 | 620MB/s to HDD | Detail
Random Read | CIFS/Windows | 7410 | 203,000 IOPS from DRAM | Detail
Video Streaming | NFS/Unix | 7210 | 752MB/s from HDD | Detail
Sequential Read | N/A | 7110 | 248MB/s from HDD (RAID1) | Detail
Sequential Write | N/A | 7110 | 196MB/s to HDD (RAID1) | Detail






More test results will be posted here when we have new data.

Knowledge

Technical white paper on the S7000: check here

Learn how to use Analytics on the S7000 in detail here

Get the S7000 Simulator and run it on your notebook: click here. For instructions on how to set it up, click here

Quick Interactive Demo on how to manage S7000: Here

Lots of good S7000 videos (install, provisioning, dashboard, Analytics, replication, sharing data): Here

Sunday Sep 14, 2008

This blog entry demonstrates how to analyze application I/O on the Windows platform. The more you know about an application's I/O behavior, the more you can optimize your storage design and architecture to support it. This is a real case from one of my past customer engagements.

A few years ago, one of my clients had the idea of migrating an MS Exchange 2003 application deployed on enterprise storage to mid-range storage in order to reduce cost. There were about 4,000 MS Exchange users on 4 clustered servers (mailboxes separated by employee level). This was one of the company's mission-critical applications; if something went wrong, the impact would be huge. Imagine if the new infrastructure had a worse SLA than the previous one: many users would complain to the IT department about the service level. The IT team's challenge was how to maintain the SLA (mainly response time) while lowering cost, and how to design and size the storage architecture on the new mid-range storage.

This customer also had storage-to-storage remote replication for Disaster Recovery protection, which made the design and sizing more difficult.


Generally, enterprise-class storage arrays use cache to buffer everything (called cache-centric). This cache helps in scenarios such as application-to-storage service and storage-to-storage replication. Mid-range storage is the opposite: it has much less cache, and its back end is slower with less bandwidth. So sizing it to match the SLA is not straightforward.

I was involved in this exercise, using Sun storage performance tools to study the I/O behavior of MS Exchange on the Windows platform. The goal was to help the customer move forward effectively and with minimum risk. Unlike on the Solaris platform, SWAT on Windows needs a small agent installed temporarily to capture the workload on the server (considered a risk). Therefore, we decided to go the conservative way and capture only one server, which the customer confirmed had the heaviest workload among the four Exchange servers. Whenever we ran a simulation, we ran 4 sessions using the same I/O pattern from this server. That was good enough to represent the worst-case workload in the test environment.

Before I started this exercise, the client's IT team told me that these MS Exchange servers had a lot of load, expecting 10,000 IOPS combined from the 4 servers and high aggregate MB/s. Therefore, they expected to need a lot of HDD spindles in the new mid-range storage to support this workload. Well, let's see how the results turned out.

Once data capture was completed (we did it at peak time, 09:00, when users usually check their mail before doing anything else), we could see the details of I/O in this MS Exchange environment (IOPS, MB/s, read/write ratio, pending queue, block size). We could also generate a replay file to bring back for workload simulation at our Solution Center, where we had mid-range storage to try it out.

This customer's MS Exchange had the following I/O characteristics:

- Response time is about 25ms (OK)

- Read/write ratio is about 50/50 (a lot of writes)

- About 250 IOPS (on one server)

- The majority of application I/O block size is about 4K

- Average data rate is about 5MB/s
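These figures can be cross-checked with simple arithmetic. The sketch below is my own illustration, not part of SWAT; it assumes 1 MB = 1024 KB and, like the simulation approach above, assumes the other three servers carry the same load as the heaviest one. It derives the average transfer size from throughput and IOPS and compares the combined load with the 10,000 IOPS the IT team expected.

```python
# Cross-check of the captured MS Exchange figures.
# Assumptions: 1 MB = 1024 KB; all 4 servers behave like the heaviest one.

SERVERS = 4
IOPS_PER_SERVER = 250          # measured on one server
MB_PER_SEC_PER_SERVER = 5.0    # measured average data rate
EXPECTED_TOTAL_IOPS = 10_000   # what the IT team anticipated


def avg_transfer_kb(mb_per_sec: float, iops: float) -> float:
    """Average I/O size implied by throughput divided by IOPS."""
    return mb_per_sec * 1024 / iops


combined_iops = SERVERS * IOPS_PER_SERVER
combined_mb_s = SERVERS * MB_PER_SEC_PER_SERVER

print(f"average transfer size: {avg_transfer_kb(MB_PER_SEC_PER_SERVER, IOPS_PER_SERVER):.2f} KB")
print(f"combined load: {combined_iops} IOPS, {combined_mb_s:.0f} MB/s")
print(f"fraction of expected IOPS: {combined_iops / EXPECTED_TOTAL_IOPS:.0%}")
```

The combined worst case is about 1,000 IOPS, roughly a tenth of the expectation. An average transfer of about 20 KB while most I/Os are 4K suggests that a minority of larger transfers pulls the mean up.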

The analysis of the captured data turned out to be quite interesting: it was really different from what they thought, with fewer IOPS and less bandwidth than anticipated. Based on the analysis data and my experience, we could derive the following storage design guidelines:

  • The new storage should be configured as RAID1, since the write ratio is high

  • Two FC channels from host to storage are sufficient (neither MB/s nor IOPS is high), which saves HBA and switch port cost

  • The new storage block size should be set to 4K to match the application I/O size for best performance
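Why RAID1 suits a write-heavy workload can be illustrated with the common back-end IOPS rule of thumb: each host read costs one disk I/O, while each small random write costs a "write penalty" (usually quoted as 2 for RAID1, 4 for RAID5, 6 for RAID6). This is a generic sizing sketch, not the method used in the engagement. The ~150 IOPS per spindle figure is an assumed placeholder for a 15K rpm FC drive, and the host load reuses the 4 x 250 IOPS, 50% write worst case.

```python
import math

# Commonly quoted write penalties for small random writes (rule of thumb).
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(host_iops: float, write_ratio: float, raid: str) -> float:
    """Disk-level IOPS = reads + write_penalty * writes."""
    reads = host_iops * (1 - write_ratio)
    writes = host_iops * write_ratio
    return reads + WRITE_PENALTY[raid] * writes


def spindles_needed(host_iops: float, write_ratio: float, raid: str,
                    iops_per_disk: float) -> int:
    """Minimum spindle count to absorb the back-end load."""
    return math.ceil(backend_iops(host_iops, write_ratio, raid) / iops_per_disk)


# Assumptions: 1,000 host IOPS, 50% writes, ~150 IOPS per spindle (placeholder).
for raid in ("RAID1", "RAID5"):
    print(raid, backend_iops(1000, 0.5, raid), spindles_needed(1000, 0.5, raid, 150))
```

With half the I/Os being writes, RAID5 needs roughly two thirds more back-end IOPS than RAID1 for the same host load, which is what the guideline above points at.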

Now it's time to find the best mid-range storage configuration through workload simulation. To give the customer useful information for decision making, we ran the simulation as scenarios.

We could think of the following test cases. However, due to limited time and resources, we tested only 3 of them.

  1. Change cache

  2. Change HDD spindle

  3. Change replication mode

  4. Change blocksize (no time to test)

  5. Change RAID (no time to test)

Although we had equipment in our lab (Sun Solution Center), it did not exactly match the customer environment. Fortunately, the customer workload was not very high, so we could consolidate it to run on one test server, as described below.

After the simulation exercise, we gathered information to guide the storage design and architecture by comparing response time vs. the factor changed. Below are the findings for this customer case.

  • A bigger cache tends to give more stable response times (less swing), because it reduces accesses to the disk spindles (no replication enabled yet)

  • Cache size tends not to affect application performance in synchronous mode; network bandwidth and latency matter more

  • Cache size can affect application performance in asynchronous mode. This follows from the nature of asynchronous mode, where the application does not need to wait for remote replication to complete, so the cache of the primary box matters for application performance

  • Asynchronous and synchronous replication results showed a huge difference (many fold). This is expected, since mid-range storage has less cache, and it also reflects the design of the replication algorithm (journal on disk vs. in cache)

  • Asynchronous replication with a separate disk queue/bitmap/controller tends to be the best approach to maintain response time when migrating from high-end storage. This must be taken into design consideration. Of course, it depends on whether the storage is flexible enough to set it up
Obviously, in this customer case, understanding the application I/O made it easier to design a proper, optimized storage architecture and sizing, and potentially to lower the investment further. SWAT and VDBENCH are a good combination, from the analysis phase to the simulation phase. Of course, if you want to do the full loop of the exercise (more accurate), you need time and resources, especially for the simulation. Many of my customers needed only SWAT, which was sufficient to give them recommendations on architecture design and sizing.
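The cache findings above follow from where the host waits for its write acknowledgement. The first-order model below is my own simplification, not how any particular array implements replication, and all latency numbers are hypothetical. In synchronous mode the link round trip sits inside every write, so cache barely helps; in asynchronous mode the host only waits for the local (cached) write, unless the journal fills and must drain first.

```python
def sync_write_ms(local_ms: float, rtt_ms: float, remote_ms: float) -> float:
    """Synchronous mode: the host waits for the remote array's acknowledgement."""
    return local_ms + rtt_ms + remote_ms


def async_write_ms(local_ms: float, journal_full: bool = False,
                   drain_wait_ms: float = 0.0) -> float:
    """Asynchronous mode: the host is acknowledged after the local write,
    unless the replication journal/cache is full and must drain first."""
    return local_ms + (drain_wait_ms if journal_full else 0.0)


# Hypothetical numbers: 1 ms local cache write, 10 ms link round trip,
# 1 ms write on the remote array.
print("sync :", sync_write_ms(1.0, 10.0, 1.0), "ms")
print("async:", async_write_ms(1.0), "ms")
print("async, journal full:", async_write_ms(1.0, True, 10.0), "ms")
```

In this toy model the synchronous write is dominated by the link, while the asynchronous write depends on the primary array's local write time and on whether its journal keeps up, which matches the pattern observed in the simulation.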

Monday Aug 11, 2008

Some of you have asked me about an alternative to running SWAT on the target system, leveraging data from OS commands such as iostat. This is because they may not be confident enough to install SWAT on a production system. That concern is common (better to play safe until you are sure).

On the Solaris platform, the iostat command is part of the OS, so it's harmless to run it (e.g. in a cron job), and you can control how often data is gathered, which manages the overhead on the system. For example, capture data every 30 or 60 seconds. A larger interval means less raw data generated and less overhead.

SWAT accommodates this by allowing you to import iostat data. The iostat data to be imported must be created with at least the following options: iostat -xdn. For example,

 # iostat -xdn 60 1440 > /tmp/iostat.txt

 The above command uses iostat to capture I/O data every 60 seconds, 1440 times (1 day). The output is redirected to iostat.txt in the /tmp directory.
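The count argument in the command is simply the capture duration divided by the interval. A tiny helper (hypothetical, for illustration only) makes the trade-off explicit: halving the interval doubles the number of reports, and therefore the raw data and overhead.

```python
def iostat_count(duration_s: int, interval_s: int) -> int:
    """Number of iostat reports needed to cover duration_s at interval_s."""
    return duration_s // interval_s


DAY = 24 * 60 * 60
for interval in (30, 60):
    print(f"iostat -xdn {interval} {iostat_count(DAY, interval)} > /tmp/iostat.txt")
```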

The 'c', 'z', 'p' and 'X' options of iostat may be used, but their output will be ignored. The 'M' option is honored, and values are interpreted as MB per second instead of KB. Also, any device name starting with 'm' will be assumed to be an SVM volume and ignored. Devices with fewer than 100 total I/Os will also be ignored. Each reporting interval will be treated as a 10-second interval starting January 1st of the current year; to get accurate timestamps, the complete SWAT tool must be used. If iostat (using the '-n' parameter) is unable to translate a kstat instance like ssdxx,yy to a proper /dev/rdsk/ device name, something is wrong with Solaris and we recommend you file a bug against Solaris. Performance data for these devices will be reported under logical controller 'undefined', so always check how much device activity you are missing.
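Before importing, it can help to know which devices the rules above will drop. The sketch below is not SWAT code; it is a hypothetical pre-check that assumes the standard Solaris extended layout (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device), estimates each device's total I/Os as (r/s + w/s) x interval, and then applies the two filtering rules described: skip names starting with 'm' and devices under 100 total I/Os. The sample lines are made up.

```python
from collections import defaultdict


def device_totals(text: str, interval_s: float) -> dict:
    """Estimate total I/Os per device from `iostat -xn`-style output."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # banners ("extended device statistics") and blank lines
        try:
            r_s, w_s = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # column header row ("r/s w/s ...")
        totals[fields[10]] += (r_s + w_s) * interval_s
    return dict(totals)


def importable(totals: dict, min_ios: float = 100) -> dict:
    """Mimic the import rules: drop 'm*' devices (treated as SVM volumes)
    and devices with fewer than min_ios total I/Os."""
    return {dev: n for dev, n in totals.items()
            if not dev.startswith("m") and n >= min_ios}


SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   10.0    5.0   80.0   40.0  0.0  0.1    0.0    6.0   0   9 c1t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t1d0
    2.0    1.0   16.0    8.0  0.0  0.0    0.0    4.0   0   1 md10
"""

print(importable(device_totals(SAMPLE, 60)))
```

Here only c1t0d0 survives: md10 is treated as an SVM volume and c1t1d0 saw no I/O.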

After you have completed data capture using iostat, you can import it into SWAT. See the GUI steps below.

(1) Launch SWAT and choose SPM option

(2) Select option to import iostat data

(3) Select the import file (i.e., iostat.txt in the /tmp directory for this case)


(4) Tell SWAT the period of your captured iostat data

(5) You got it!


EASY?

Please note that this method can only import and show data in graphical format (of course better than the iostat CLI itself!). You cannot generate a replay file to perform simulation with VDBENCH. Think of it as the starting point. If you are satisfied with the result and want to go further (more in depth), then capturing with SWAT is the way to go. There are things we can do to make sure it's safe enough. For example, you might try it first on a development server with almost the same OS, patches, etc. I usually start with the iostat import approach; if the customer is OK and wants to go further, I run SWAT on a test system and finally on the production system.

For those of you who want to import iostat data from other UNIX platforms, it seems we can do it today from Linux and AIX. I'm sure AIX works, since I was a beta tester of this feature and tried it in my customer's AIX environment, and it worked great!

I have yet to try it on Linux. Can you?

Lastly, I need to give credit to Henk (thanks!), who listened to my feedback about the non-Sun platform requirement and came up with beta code to test in my AIX environment.




Wednesday Aug 06, 2008

Back in 2004, I was involved in answering an interesting question from one of my clients. They were using a Sun storage array to serve a single mission-critical application (Billing). Since this storage could be scaled up in terms of resources, the customer wanted to move another application onto the same storage (a consolidation approach). Before doing that, they had some classic questions about minimum investment vs. maintaining the SLA. Here is how the conversation went:

  • If we add more applications to the same storage, will there be any performance degradation of my existing application?
  • If there is, can you advise how much the impact will be?
  • To avoid the impact (maintain the SLA of the existing application), how many resources do I need to add (HDD, cache, controller), or do I need to set up anything?

These are very sensible questions from a CIO who wants to lower risk as much as possible without jacking up the investment unnecessarily. They cannot be answered confidently without credible supporting information. That was my starting point to explore what it takes to address this concern. We should first analyze the application I/O characteristics, then simulate on a test system and check the response time. If it is not satisfactory, change settings or adjust resources until the response time is satisfactory (SLA met). The SWAT/VDBENCH tools helped me carry out this exercise.
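Why a consolidation can hurt more than intuition suggests is visible even in the simplest queuing model. The sketch below uses the textbook M/M/1 formula R = S / (1 - U), a swapped-in simplification: a real array has many queues, caches and controllers, which is exactly why measuring and simulating with SWAT/VDBENCH is more trustworthy. The service time and utilizations are hypothetical.

```python
def mm1_response_ms(service_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1 - utilization)


# Hypothetical: 5 ms service time; the existing application keeps the shared
# resource 40% busy, and the new application adds another 40%.
before = mm1_response_ms(5.0, 0.40)
after = mm1_response_ms(5.0, 0.80)
print(f"before: {before:.1f} ms, after: {after:.1f} ms")
```

Doubling the load in this model triples the response time, because queuing delay grows sharply as utilization approaches 100%.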

We started with data capture (by SWAT) on the Billing server (both nodes) as well as on the Rating server. The next step was analysis to understand the application I/O characteristics (IOPS, MB/s, read/write ratio, block size). The most important information we must know is the current response time (milliseconds); at this point, we have the baseline performance. Next, we extracted the replay file to be used later by VDBENCH for I/O simulation. It serves as metadata to simulate the exact I/O pattern during the testing phase.

Using VDBENCH to simulate the exact I/O workload against the test storage configuration is the last part of the exercise. To make the whole story more meaningful for the CIO (scenario based), I created several test and simulation cases, as in the example below.


After running the simulation, there were four results. They were really informative for the CIO, since each case showed the impact on response time. The CIO could do his own analysis to judge which way was best for investment vs. SLA. See below for the captured response time of each case.


If you are the CIO, you can make a decision with more confidence and accuracy, since this is very close to your environment. It reduces risk when you go live after adding the Rating application (or another application) to the same storage. Although the customer should have a backup plan in case go-live is not successful, there is a cost associated with it (effort to migrate in/out). So the best way is to predict what will happen when you move to the new environment, i.e., in this case, adding a new application.

This is one example of what I have done with SWAT/VDBENCH, on the Solaris platform. Quite interesting, isn't it? We turned "guessing" into something more tangible and predictable for storage performance.

In the next blog I will share more about how to use it on the Windows platform with a different scenario.

Tuesday Jul 29, 2008

This is finally good news for those of you who want to give SWAT and VDBENCH a try: Sun has now made them available to the public. Thanks to the Storage Performance Benchmarking Group (SPBG), especially to Henk, who listened to feedback from inside Sun and from the public. Awesome!!

Please note that  “Swat and Vdbench are tools delivered and supported by the Sun Microsystems, Inc. Strategic Application Engineering (SAE) – Storage Performance Benchmarking Group (SPBG). It is the responsibility of SPBG to maintain, support, and enhance these tools, not the official Sun Service department. Additionally, the tools are supported for internal Sun use and Sun partners only – not the end users.”

This is the official support statement, and its purpose is to make clear to end users that Sun does not support these tools in its typical product fashion. However, if the tools are used in cooperation with Sun and/or one of its partners, for example in a sales situation, or when the tools are used to resolve a customer performance issue, then the tools will be supported by SPBG via the Sun field representative.

How to Start?

To make it easy for you to start, my upcoming blog entries will share how we can use these tools to solve storage performance problems. I have done several exercises with them in the past. Stay tuned for the next blog.

URL

Swat 3.00:

https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=SWAT-3.00-OTH-G-F@CDS-CDS_SMI


Vdbench 4.07:

https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=VDB-4.07-OTH-G-F@CDS-CDS_SMI

Friday Jul 25, 2008

SWAT (Sun StorageTek Workload Analysis Tool) is a powerful data analysis package designed to facilitate the analysis and subsequent resolution of storage hardware performance problems. SWAT is a host-based (Windows and Solaris supported), storage-centric performance tool. It analyzes I/O from the host perspective (where the application runs and the SLA is measured). SWAT was written by Sun engineer Henk Vandenbergh.

SWAT has a superior graphical user interface (GUI) that lets users easily understand what's going on with the I/O in their system from various aspects, for example response time, IOPS, MB/s, read/write ratio, queue depth, I/O distribution, block size and so on. This information can help users determine the bottleneck and potentially how to solve it. For example, if the read/write ratio shows that the majority is writes and the user is suffering on response time, changing the RAID configuration from RAID5 to RAID1 could help. Another example: if the workload is not evenly distributed across volumes/LUNs and those volumes/LUNs are not on the same layout (disk group), re-laying them out (creating one big disk group and putting these LUNs on top) can use resources better and lead to better response time (Share Nothing vs. Share Everything concept).

The SWAT agent currently runs only on Solaris and Windows. However, the latest version, 3.0, allows us to import OS I/O performance data from other platforms (i.e., AIX, Linux) for high-level analysis (no agent running).

SWAT uses two methods to capture data regarding storage performance:
• The Swat Performance Monitor (SPM) generates a long-term, high-level view of performance in both the Solaris and Microsoft Windows operating environments.
• The Swat Trace Facility (STF) generates a short-term, detailed view of storage activity in both the Solaris and Microsoft Windows operating environments.

Swat is a storage performance-monitoring tool; thus it does not collect information about server processes, context switching, file systems, or network connections. Nor does Swat profile an application, but it does capture the impact of an application's demand for I/O on a storage system.

Which method should you use? It really depends on how much information you want. STF is low-level analysis but generates a lot of raw data, while SPM is high-level analysis that cannot give in-depth data but generates much less raw data. Usually, for users who don't know the right period for data capture, it's recommended to run SWAT in SPM mode first to find the period within a day they are interested in, and then run it again in STF mode to zoom into the detail (a 1-hour period).
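Picking the STF window from SPM data is essentially a "busiest contiguous window" search. The helper below is hypothetical and assumes you already have SPM-style samples as a plain list of per-minute IOPS values (SWAT's own export format is not assumed here). It uses a sliding sum, so one pass over a day of samples finds the busiest hour. The synthetic day has a flat 100 IOPS with a 400 IOPS burst starting at minute 540.

```python
def busiest_window(samples, window):
    """Return (start_index, total) of the contiguous `window` samples
    with the highest total, using a sliding sum."""
    if not 0 < window <= len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    current = sum(samples[:window])
    best_start, best_total = 0, current
    for i in range(window, len(samples)):
        # Slide by one: add the new sample, drop the oldest one.
        current += samples[i] - samples[i - window]
        if current > best_total:
            best_start, best_total = i - window + 1, current
    return best_start, best_total


# Synthetic per-minute IOPS for one day (illustrative only).
samples = [100] * 1440
samples[540:600] = [400] * 60

start, total = busiest_window(samples, 60)
print(f"run STF from {start // 60:02d}:{start % 60:02d} for one hour")
```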

SWAT not only does I/O analysis but also integrates very well with other tools such as VDBENCH, an I/O workload simulation tool. SWAT can generate input (a replay file) for VDBENCH for real workload simulation. By running SWAT in STF mode (or via a UNIX script), you can generate a replay file that represents the exact I/O activity of the production system. This gives users great flexibility to test storage configurations based on real I/O behavior data without running the real application.
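The replay idea can be sketched in a few lines: re-issue each recorded I/O at its original time offset, location, size and direction, and measure the response time. This is only an illustration of the concept. The trace record layout here is hypothetical and is not Vdbench's replay file format, and a small scratch file stands in for a LUN, so the measured latencies mostly reflect the OS page cache rather than real storage.

```python
import os
import tempfile
import time

# Hypothetical trace records: (seconds after start, byte offset, size, op)
TRACE = [
    (0.000, 0, 4096, "W"),
    (0.010, 4096, 4096, "W"),
    (0.020, 0, 4096, "R"),
    (0.030, 8192, 8192, "R"),
]


def replay(path, trace):
    """Re-issue each I/O at its recorded time offset; return latencies in ms."""
    latencies = []
    fd = os.open(path, os.O_RDWR)
    try:
        start = time.perf_counter()
        for t_offset, offset, size, op in trace:
            delay = start + t_offset - time.perf_counter()
            if delay > 0:
                time.sleep(delay)  # preserve the original inter-arrival timing
            t0 = time.perf_counter()
            if op == "W":
                os.pwrite(fd, b"\0" * size, offset)
            else:
                os.pread(fd, size, offset)
            latencies.append((time.perf_counter() - t0) * 1000)
    finally:
        os.close(fd)
    return latencies


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 65536)  # small scratch file standing in for a LUN
    scratch = f.name
results = replay(scratch, TRACE)
os.remove(scratch)
print([round(ms, 3) for ms in results])
```

Keeping the original timing, rather than issuing I/Os as fast as possible, is what makes a replay representative of the production workload.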

Wednesday Jul 23, 2008

We are hearing about Solid State Drives (SSDs) more and more. They are flash memory based on NAND technology (developed by Toshiba since 1989). Most implementations today are in consumer electronics (thumb drives, MP3 players, digital camera cards, etc.); we can call these "Consumer Flash SSD". Since the cost of NAND flash has dropped very quickly due to economies of scale, we started seeing "Enterprise Flash SSD" in data storage systems last year.


In the computer market, SSD undoubtedly has the potential to reduce the gap between the speed of CPU/memory and HDD. HDD latency is in milliseconds vs. DRAM in nanoseconds. That's a really big gap, which is why storage has cache as a buffer to relieve it. However, cache memory in storage is not cheap to acquire, so it's very exciting to see what role SSD can play in this area.
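The effect of a cache buffer on average latency is the usual weighted average: latency = hit_ratio x cache_latency + (1 - hit_ratio) x disk_latency. The latency values below are illustrative assumptions, not measurements. They show that even a small miss ratio lets the millisecond-class device dominate, which is the gap a faster tier such as SSD could narrow.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Average latency when a fraction `hit_ratio` of I/Os is served from cache."""
    return hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms


# Illustrative assumptions: 0.1 ms for a cache hit, 8 ms for a disk access.
high_hit = effective_latency_ms(0.9, 0.1, 8.0)
low_hit = effective_latency_ms(0.5, 0.1, 8.0)
print(f"90% hits: {high_hit:.2f} ms, 50% hits: {low_hit:.2f} ms")
```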




Although the cost of SSD has dropped heavily in the last few years, it still cannot match HDD in price ($/GB) or capacity (32GB to a few hundred GB). For these reasons, it's not really possible to replace HDD with SSD today. That leads to the tiered storage concept with SSD in the picture: we place data on different tiers based on requirements such as latency, operating expense (OPEX), capital expense (CAPEX), retention time and so on.


SSD not only solves storage performance with excellent response time but is also eco-friendly. Without any mechanical movement, it reduces power consumption substantially, and it weighs less than HDD, hence less cost to build raised floors that can support heavyweight giants like disk arrays.


Implementation of SSD


Because SSD is faster than FC disk (usually classified as tier 1), the storage industry seems (by default) to define SSD as tier 0. With solid-state storage at a higher cost than high-end disk drives, users are deploying it selectively to increase application performance. As a general rule of thumb, solid-state storage is used in environments where disk drives are the limiting factor. Mission-critical applications such as transaction processing and database systems linked directly to a firm's success (SLA) are prime examples of where solid-state storage is used today. Database log file areas, or whole databases, are other candidates for solid-state storage.

There are a number of ways to implement SSD in the computing environment, for example:

(1) Replace HDD with SSD in an existing array. This may seem the simplest way; however, SSD is so fast that the current back end of the disk array may not be able to work with it effectively (backplane too slow). Users should add only a few SSDs, place on them only the volumes that really must leverage SSD, and leave the other slots for HDD (tiered storage within the same array).

(2) Put SSD in a separate array. You may call it a Flash Array or whatever (it looks like a huge iPod); the whole array is all SSD. This is a very clean way of adding Tier 0, as it eliminates the need to tamper with disk-based arrays.

(3) Use SSD as a server internal drive. This is another candidate, since the server backplane tends to be very fast and some servers now have a lot of drives (48 HDDs inside the Sun X4540 server). Adding SSDs can improve performance without being held back by backplane performance.

It's possible that in the next two to three years, Tier 0 solid-state storage will become a standard array option at a somewhat premium price, deployed selectively by users. In the longer term, as NAND flash pricing decreases, SSD will begin to replace high-end disk drives thanks to its power consumption, footprint, cost, performance and reliability.

Sunday Jul 13, 2008

I have been working on storage performance for many years, starting in 1999 when I had the chance to conduct a benchmark for an application (ArcInfo) on Solaris. This ISV would choose the Sun platform if performance on Sun (server/storage) was up to the mark. While we had quite a number of tools to measure/analyze server performance, I found we were still lacking tools for storage. That was the starting point of my interest in how to measure/analyze storage performance, which led to how we can improve storage performance and, eventually, the overall performance of customer applications.

Today, solving server performance problems tends to be easier than storage, since CPU power keeps increasing and memory gets cheaper and cheaper. Storage has also gotten cheaper (cost/GB), but it has physical performance limits (mechanical constraints such as rotations per minute, still millisecond response times). So the gap keeps widening (CPUs get super fast, HDDs improve only a little).

To improve storage performance, we need to look at many aspects. For example, sizing proper storage resources depends on at least 3 factors (cost, performance, availability). The obvious example is RAID (0, 1, 5, 6): we get only 2 out of 3. For instance, RAID0 provides the best performance and cost (no overhead, i.e., parity), but no availability. On the server side, availability can be separated from the sizing equation, hence less complication. In addition, we need to look at the access pattern from the application perspective as well as which storage technologies can help. Today we have many storage technologies to choose from, such as tiered storage (ILM), disk technology, controller technology, new storage protocols, etc. All of these make storage performance analysis much more difficult.
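The cost and availability sides of the RAID trade-off can be made concrete with a small calculation of usable capacity and guaranteed failure tolerance per level. This is a generic sketch under the usual definitions (RAID1 treated as 2-way mirrored pairs, RAID5 with one parity disk's worth of capacity, RAID6 with two); it ignores hot spares and vendor-specific layouts, and the 8-disk group is only an example.

```python
def raid_profile(level: int, disks: int):
    """Return (usable fraction of raw capacity, disk failures guaranteed
    to be tolerated) for RAID 0, 1 (2-way mirroring), 5 and 6."""
    if level == 0:
        return 1.0, 0
    if level == 1:
        if disks < 2 or disks % 2:
            raise ValueError("mirroring needs an even number of disks")
        return 0.5, 1  # more failures survivable only if in different pairs
    if level == 5:
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (disks - 1) / disks, 1
    if level == 6:
        if disks < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return (disks - 2) / disks, 2
    raise ValueError(f"unsupported RAID level {level}")


for level in (0, 1, 5, 6):
    usable, failures = raid_profile(level, 8)
    print(f"RAID{level}: {usable:.1%} usable, tolerates {failures} failure(s)")
```

RAID0 wins on cost but tolerates no failure, while RAID1 gives up half the capacity for availability; the performance side (write penalty) is the third corner of the triangle.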

When I started studying this area, I came to know a few people who inspired me. Some of them are still at Sun, some are not: Brian Wong, Dave Fisk, Henk Vandenbergh, Allen Yen and others. I met them at conferences and, of course, on the email alias where we had unlimited discussion. Several years back, Dave Fisk was the pioneer in this area, explaining that storage performance can be described by queuing theory instead of being treated as an art. He showed us how "iostat" in Solaris works and how it relates to the theory. It was the first time I saw storage performance explained by mathematics rather than treated as an art. After he left Sun, the Storage Performance Group (Steven Johnson, Amanda Hudson, Henk Vandenbergh and others) continued the idea/work until today. I have seen the way we deal with storage performance get better: from command-line data collection/analysis that was very hard for others to understand, we now have a fantastic GUI to work with, which makes dealing with storage performance much easier.
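One example of iostat relating to queuing theory is Little's Law, N = X x R: the average number of I/Os in service equals throughput times the time each spends in service. Applied to Solaris iostat -x columns, (r/s + w/s) x asvc_t should roughly agree with actv (and likewise wait with the wsvc_t column). The helper and the sample line values below are hypothetical illustrations; small differences against real output are expected from rounding.

```python
def littles_law_actv(r_s: float, w_s: float, asvc_t_ms: float) -> float:
    """Average number of I/Os being serviced: N = X * R (Little's Law).
    X = (r/s + w/s) in I/Os per second, R = asvc_t converted to seconds."""
    return (r_s + w_s) * asvc_t_ms / 1000.0


# Hypothetical iostat line: r/s=10.0 w/s=5.0 ... actv=0.1 ... asvc_t=6.0
print(f"predicted actv: {littles_law_actv(10.0, 5.0, 6.0):.2f}")
```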



This blog copyright 2009 by Paisit