This blog entry demonstrates how to analyze application I/O on the Windows platform. The more you know about an application's I/O behavior, the better you can optimize the storage design and architecture that supports it. This is a real case from one of my past customer engagements.

A few years ago, one of my clients wanted to migrate an MS Exchange 2003 deployment from enterprise storage to mid-range storage in order to reduce cost. There were about 4,000 MS Exchange users on 4 clustered servers (mailboxes were separated by employee level). This was one of the mission-critical applications in the company, so the impact would be huge if something went wrong. Imagine the new infrastructure delivering a worse SLA than the previous one: many users would complain to the IT department about the service level. The challenge for the IT team was how to maintain the SLA (mainly response time) while lowering the cost, and how to design and size the storage architecture on the new mid-range storage.

This customer also had storage-to-storage remote replication for disaster recovery protection, which made the design and sizing more difficult.


Generally, enterprise-class storage arrays use cache to buffer everything (they are called cache-centric). The cache helps in scenarios such as application-to-storage service and storage-to-storage replication. Mid-range storage is the opposite: it has much less cache, a slower back end, and less bandwidth. Sizing it to match the SLA is therefore not straightforward.

I was involved in this exercise, using the Sun storage performance tool (SWAT) to study the I/O behavior of MS Exchange on the Windows platform. The aim was to help the customer decide how to move forward effectively and with minimum risk. Unlike on the Solaris platform, SWAT on Windows needs a small agent installed temporarily on the server to capture the workload (which we considered a risk). We therefore took the conservative approach and captured only one server, which the customer confirmed had the heaviest workload of the four Exchange servers. Whenever we ran a simulation, we ran 4 sessions using the same I/O pattern as this server. That is good enough to represent the worst-case workload in a test environment.

Before I started this exercise, the client's IT team told me that these MS Exchange servers carried a lot of load: they expected 10,000 IOPS combined from the 4 servers and a high aggregate MB/s. They therefore expected to need many HDD spindles in the new mid-range storage to support this workload. Well, let's see how the results turned out.

Once data capture was complete (we ran it at peak time, 09:00, when users usually check their mail before doing anything else), we could see the details of the I/O in this MS Exchange environment (IOPS, MB/s, read/write ratio, pending queue, block size). We could also generate a replay file to bring back to our Solution Center, where we had mid-range storage on which to simulate the workload.

The customer's MS Exchange had the following I/O characteristics:

- Response time is about 25ms (ok)

- Read/write ratio is about 50% reads to 50% writes (a lot of writes)

- IOPS is about 250 (on one server)

- Majority of application I/O blocksize is about 4K

- Average data rate is about 5MB/s
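
As a quick sanity check on the figures above, here is a minimal Python sketch that derives three numbers from them: the average number of outstanding I/Os (by Little's Law), the mean I/O size implied by throughput divided by IOPS, and the worst-case aggregate IOPS for four servers. It assumes that all the averages cover the same capture interval, that 1 MB is 1024 KB, and that all four servers are as busy as the heaviest one. The variable names are mine, not SWAT output.

```python
# Figures measured on the single captured server (from the list above).
iops = 250                # I/O operations per second
response_time_s = 0.025   # 25 ms average response time
data_rate_mb_s = 5        # average throughput in MB/s
servers = 4               # ASSUMPTION: every server is as busy as the heaviest one

# Little's Law: average outstanding I/Os = arrival rate x time in system.
outstanding_ios = iops * response_time_s

# Mean I/O size implied by throughput / IOPS (ASSUMPTION: 1 MB = 1024 KB).
mean_io_kb = data_rate_mb_s * 1024 / iops

# Worst-case aggregate, compared with the 10,000 IOPS the IT team expected.
aggregate_iops = iops * servers
expected_iops = 10_000

print(f"outstanding I/Os per server: {outstanding_ios:.2f}")
print(f"implied mean I/O size: {mean_io_kb:.1f} KB")
print(f"aggregate IOPS: {aggregate_iops} "
      f"({aggregate_iops / expected_iops:.0%} of the expected figure)")
```

Under these assumptions the server holds about 6 I/Os in flight on average, and the worst-case aggregate is 1,000 IOPS, a tenth of what was expected. The implied mean I/O size of roughly 20 KB is larger than the 4K majority size, which suggests a minority of larger I/Os in the mix. That is an inference from the averages, not something I read off the capture.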

The analysis of the captured data turned out to be quite interesting: it was really different from what they thought, with fewer IOPS and less bandwidth than anticipated. From the analysis data, and based on my experience, we can sketch the storage design with the following guidelines:

  • The new storage should be configured as RAID1, since the write ratio is high

  • The new storage can have 2 x FC channels from host to storage (neither MB/s nor IOPS is high). That will suffice (and saves HBA and switch port cost)

  • The new storage should use a 4K block size, in line with the application I/O size, for best performance
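
To illustrate why a high write ratio points toward RAID1, here is a rough sizing sketch in Python. It uses the common rule-of-thumb write penalties (2 back-end I/Os per write for RAID1, 4 for RAID5) and ignores cache entirely. The figure of 150 IOPS per spindle is my assumption for illustration, not a number from the capture, and the function names are hypothetical.

```python
import math

def backend_iops(frontend_iops, write_fraction, write_penalty):
    """Disk-level IOPS: a read costs 1 back-end I/O, a write costs `write_penalty`."""
    reads = frontend_iops * (1 - write_fraction)
    writes = frontend_iops * write_fraction
    return reads + writes * write_penalty

def spindles_needed(frontend_iops, write_fraction, write_penalty, iops_per_spindle):
    """Minimum whole number of spindles to absorb the back-end load (no cache)."""
    load = backend_iops(frontend_iops, write_fraction, write_penalty)
    return math.ceil(load / iops_per_spindle)

FRONTEND_IOPS = 4 * 250   # worst case: four servers like the captured one
WRITE_FRACTION = 0.5      # about 50% writes, from the capture
IOPS_PER_SPINDLE = 150    # ASSUMPTION: illustrative per-drive figure only

# Rule-of-thumb write penalties, not measured values.
for raid, penalty in (("RAID1", 2), ("RAID5", 4)):
    load = backend_iops(FRONTEND_IOPS, WRITE_FRACTION, penalty)
    count = spindles_needed(FRONTEND_IOPS, WRITE_FRACTION, penalty, IOPS_PER_SPINDLE)
    print(f"{raid}: {load:.0f} back-end IOPS, at least {count} spindles")
```

With these inputs RAID1 generates 1,500 back-end IOPS against 2,500 for RAID5, so the same workload needs noticeably fewer spindles on RAID1. A real array's write cache will change the absolute numbers, so treat this as a first estimate to be checked by simulation.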

Now it's time to find the best mid-range storage configuration by studying workload simulations. To give the customer useful information for the decision, we ran the simulations as scenarios.

The test cases we could think of are listed below. However, due to limited time and resources, we could test only 3 of them.

  1. Change cache

  2. Change HDD spindle

  3. Change replication mode

  4. Change blocksize (no time to test)

  5. Change RAID (no time to test)

Although we had equipment in the lab (Sun Solution Center), it did not exactly match the customer environment. Fortunately, the customer workload was not very high, so we could consolidate it to run on one test server.

After the simulation exercise, we had information to guide the storage design and architecture, obtained by comparing response time against each factor changed. Below are example findings for this customer case.

  • A bigger cache tends to give a more stable response time, with less swing, because it reduces accesses to the disk spindles (replication not yet enabled)

  • Cache size tends not to affect application performance in synchronous mode. Network bandwidth and latency matter more

  • Cache size can affect application performance in asynchronous mode. This follows from the nature of asynchronous mode, where the application does not need to wait for remote replication to complete, so the cache of the primary box matters for application performance

  • Asynchronous and synchronous replication showed a huge performance difference (many fold). This is expected, as mid-range storage has less cache and a different replication algorithm design (journal on disk vs. in cache)

  • Asynchronous replication with a separate disk queue/bitmap/controller tends to be the best approach for maintaining response time when migrating from high-end storage. This must be taken into account in the design. Of course, it depends on whether the storage is flexible enough to be set up this way
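
The cache and replication findings above can be pictured with a deliberately simplified response-time model, sketched here in Python. It assumes a synchronous write is acknowledged only after the local write, one link round trip, and the remote write, while an asynchronous write is acknowledged after the local write alone. The function name and all millisecond values are hypothetical, chosen only for illustration. They are not results from our tests.

```python
def write_response_ms(local_ms, mode, link_rtt_ms=0.0, remote_ms=0.0):
    """Simplified write response time seen by the application (illustrative model)."""
    if mode in ("none", "async"):
        # Host is acknowledged once the primary array holds the write.
        return local_ms
    if mode == "sync":
        # Host waits for the link round trip and the remote array as well.
        return local_ms + link_rtt_ms + remote_ms
    raise ValueError(f"unknown replication mode: {mode}")

# HYPOTHETICAL values: a write absorbed by cache vs. one that goes to disk.
CACHE_HIT_MS, CACHE_MISS_MS = 1.0, 8.0
LINK_RTT_MS, REMOTE_MS = 20.0, 8.0

for mode in ("async", "sync"):
    hit = write_response_ms(CACHE_HIT_MS, mode, LINK_RTT_MS, REMOTE_MS)
    miss = write_response_ms(CACHE_MISS_MS, mode, LINK_RTT_MS, REMOTE_MS)
    print(f"{mode}: cache hit {hit} ms, cache miss {miss} ms, ratio {miss / hit:.1f}x")
```

In this model the primary cache decides the whole response time in asynchronous mode, while in synchronous mode the link and remote terms dominate and the cache effect shrinks in relative terms. The model leaves out journal placement, queueing, and link bandwidth limits, all of which mattered in the real tests.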

Clearly, in this customer case, understanding the application's I/O made it easier to design and size a proper, optimized storage architecture, and potentially to lower the investment further. SWAT and Vdbench are a good combination from the analysis phase through to the simulation phase. Of course, if you want to do the full loop of the exercise (which is more accurate), you need time and resources, especially for the simulation. Many of my customers have used only SWAT, which is sufficient to give them recommendations on some data points for architecture design and sizing.

Comments:

Hi, Paisit

Thanks for your wonderful sharing.
I am now doing an I/O simulation test with SWAT 3.0 and vdbench 4.07. My source server is a physical server with drives C:, D: and E:. My target server is a virtual server with a C: drive. I created the replay parameter file on the source server:
---
rg=rg1,device=(0) * d_0 ios: 2138 max lba: 132GB cum size: 132GB
sd=sd1,lun=d:\test.txt,size=5g,replay=rg1
wd=wd1,sd=sd1
rd=run1,wd=wd1,elapsed=9999,interval=10,replay=C:\temp\vdbench407\flatfile.bin\flatfile.bin
---

I modified the replay file (replayfile.txt) on the target server:
--
rg=rg1,devices=(0)
sd=sd1,lun=d:\test.txt,size=5g,replay=rg1
wd=wd1,sd=sd1
rd=run1,wd=wd1,elapsed=9999,interval=10,replay=C:\temp\vdbench407\flatfile.bin.gz
--

When I run vdbench on the target server, it gives the error message "No replay file specified", although I have copied the replay file to "C:\temp\vdbench407".
-----
C:\temp\vdbench407>vdbench -f replayfile.txt
vdbench distribution: vdbench407
For documentation, see 'vdbench.pdf'.
For revision updates (Sun internal website only):
http://webhome.sfbay/nwsspe/speweb/vdbench/index.html

11:26:32.524 input argument scanned: '-freplayfile.txt'
11:26:32.540
11:26:32.619 Setting shared library to: C:\temp\vdbench407\windows\vdbench.dll
11:26:32.729 Inserted 'rd=New_file_format_for_sd=sd1' to initialize new file for
sd=sd1,lun=d:\test.txt,size=5368709120
11:26:33.076 group.total_group_bytes: rg1 5.000g
11:26:33.186
11:26:33.312 Tool will expire on: Sun Apr 26 05:27:38 GMT+08:00 2009
11:26:33.344
11:26:33.833
java.lang.RuntimeException: No replay file specified (...)
-----

I have no idea what to try next. Could you please tell me what I can do?

Thanks
xdg

Posted by catchtime on April 07, 2009 at 03:19 PM SGT


This blog copyright 2009 by Paisit