Back in 2004, I was asked to answer an interesting question from one of my clients. They were using a Sun storage array to serve a single mission-critical application (Billing). Since the array could be scaled up in terms of resources, the customer wanted to move another application onto the same storage (a consolidation approach). Before doing that, there were some classic questions about minimum investment vs. maintaining the SLA. The conversation went like this:

  • If we add more applications to the same storage, will the performance of my existing application degrade?
  • If so, can you advise how large the impact will be?
  • To avoid the impact (maintain the SLA of the existing application), how much resource do I need to throw in (HDD, cache, controller), or do I need to set anything up?

Well, these are very sensible questions from a CIO who wants to lower the risk as much as possible but doesn't want to jack up the investment if it's not necessary. They cannot be answered confidently without credible supporting data. That was my starting point for exploring what it would take to address this concern. Yes, we should first do an analysis to understand the application's I/O characteristics, then run a simulation against the test system and see what the response time looks like. If it is not satisfactory, we change settings or adjust resources until we get a satisfactory response time (SLA met). The SWAT/VDBENCH tools helped me carry out this exercise.

I started by capturing data (with SWAT) on the Billing server (both nodes) as well as on the Rating server. The next step was to analyze the captured data to understand what the application's I/O characteristics look like (IOPS, MB/s, read/write ratio, block size). The most important figure we must know is the current response time (in milliseconds). At this point we have a performance baseline. Next, we extract the replay file that VDBENCH will use later for I/O simulation. It serves as the metadata for reproducing the exact I/O pattern during the testing phase.
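To make the baseline figures concrete, here is a minimal Python sketch of how IOPS, MB/s, read percentage, average block size and average response time can be derived from a list of captured I/O records. The `IoRecord` type, its field names and the `summarize` function are assumptions made for illustration only; this is not SWAT's trace format or its API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IoRecord:
    """One captured I/O (hypothetical layout, not SWAT's real format)."""
    is_read: bool        # True for a read, False for a write
    size_bytes: int      # transfer size of this I/O
    response_ms: float   # observed response time in milliseconds


def summarize(records: List[IoRecord], window_s: float) -> Dict[str, float]:
    """Summarize the I/O captured during a window of `window_s` seconds."""
    if not records:
        raise ValueError("no I/O records to summarize")
    if window_s <= 0:
        raise ValueError("window_s must be positive")

    count = len(records)
    total_bytes = sum(r.size_bytes for r in records)
    reads = sum(1 for r in records if r.is_read)

    return {
        "iops": count / window_s,
        # MB is taken as 10**6 bytes here (an assumption)
        "mb_per_s": total_bytes / window_s / 1_000_000,
        "read_pct": 100.0 * reads / count,
        "avg_block_kb": total_bytes / count / 1024,
        "avg_response_ms": sum(r.response_ms for r in records) / count,
    }


if __name__ == "__main__":
    # Made-up sample records, only to show the call shape
    sample = [
        IoRecord(True, 8192, 2.0),
        IoRecord(True, 8192, 4.0),
        IoRecord(False, 8192, 6.0),
        IoRecord(True, 8192, 8.0),
    ]
    for name, value in summarize(sample, window_s=2.0).items():
        print(f"{name}: {value:.3f}")
```

The average response time from such a summary is the baseline that every later simulation result gets compared against.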

Using VDBENCH to replay the exact I/O workload against each storage configuration is the last part of the exercise. To make the whole story more meaningful for the CIO (scenario based), I created several test and simulation cases, as in the example below.


After running the simulations, we of course had four sets of results. They are really informative for the CIO, since each case shows the customer its impact on response time. The CIO can do his own analysis to judge which way is best in terms of investment vs. SLA. See below for the captured response time of each case.


If you are the CIO, you can then make the decision with more confidence and accuracy, since the simulation is very close to your own environment. It reduces the risk when you go live after adding the Rating application (or any other application) to the same storage. The customer should still have a backup plan in case the go-live is not successful, but there is a cost associated with that (the effort to migrate in and out). So the best way is to predict what is going to happen when you move to the new environment, which in this case means adding a new application.
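The comparison behind that decision can be sketched in a few lines of Python: take the baseline response time, an SLA threshold, and the simulated response time of each case, then report the change against the baseline and whether the SLA is still met. The function name `compare_cases`, the case labels and every number below are hypothetical placeholders, not the measured results from this exercise.

```python
from typing import Dict, List, Tuple


def compare_cases(
    baseline_ms: float,
    sla_ms: float,
    cases: Dict[str, float],
) -> List[Tuple[str, float, float, bool]]:
    """Return (case name, response ms, % change vs. baseline, SLA met?)."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    rows = []
    for name, response_ms in cases.items():
        change_pct = (response_ms - baseline_ms) / baseline_ms * 100.0
        rows.append((name, response_ms, change_pct, response_ms <= sla_ms))
    return rows


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT real measurements
    hypothetical_cases = {
        "case A": 10.0,
        "case B": 6.0,
    }
    for name, ms, change, ok in compare_cases(5.0, 8.0, hypothetical_cases):
        verdict = "meets SLA" if ok else "misses SLA"
        print(f"{name}: {ms:.1f} ms ({change:+.0f}% vs. baseline), {verdict}")
```

A table like this, one row per simulated case, is what lets the cost of each option be weighed against its effect on response time.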

This is one example of what I have done with SWAT/VDBENCH, in this case on the Solaris platform. Quite interesting, isn't it? We turned "guessing" about storage performance into something more tangible and predictable.

In the next post I will share more about how to use it on the Windows platform with a different scenario.


This blog copyright 2009 by Paisit