fintanr's weblog


    20050426 Tuesday April 26, 2005

    Sun's Performance Lifestyle: Automating Ourselves Into A Job
    I had originally planned to base my second posting [1] on Sun's Performance Lifestyle around the concept of testing software versus hardware, i.e. the dreaded I/O-bound benchmark. The article is partly written, but unfortunately I haven't managed to schedule time on an available rig to put in some practical examples (real performance work obviously gets priority). So instead I decided to write a bit about our automation, and how we actually run the benchmarks.

    We view automation as very much key to our job. It allows us to remove the mundane tasks and focus on the higher value-add, and much more interesting, work: finding and root-causing performance issues.

    High Level Overview

    From a high-level view the process of doing a benchmark run can be broken down into the following steps:
    • Install Rig
    • Install and configure required software
    • Run the benchmark
    • Collect the benchmark results
    Looks nice and straightforward, doesn't it? I attended an amusing presentation a few years ago given by a colleague in Ireland entitled "A Simple Matter of Programming", which featured that "magic happens here" box that we have all encountered. You know the one: the architect has handed over a beautiful design document that specs out everything bar the actual programming and implementation, and the "magic happens here" box sits in the high-level system diagram. Let's call this "A Simple Matter of Programming and Implementation".

    Installation

    The key to all of this is the automation of the installs. For Solaris we maintain a local jumpstart server on our lab network (with mirrors as needed on remote networks), and we also maintain images and install scripts for the various Linux clones of jumpstart (i.e. kickstart etc. [2]) to use when needed. For Windows it's a gratuitous abuse of the dd command, and some nifty automation that Nicky, Sean and a non-blogging member of the team have put together ;).

    I won't go into a jumpstart 101 tutorial here (although please ask if you would like to see something like this), but suffice it to say that we have all of the various steps in setting up a rig scripted.

    Once a system is installed it copies over all the relevant benchmarks and software that it needs, reboots, and disconnects itself from our lab naming service. Where applicable we use the hosts file as our only naming service, as we don't want any external factors affecting our runs. All benchmarks which involve any form of network traffic (the vast majority of the benchmarks) are run on private subnets.

    Execution

    To actually run the benchmarks we have a custom, home-grown harness that has evolved over the years. This has upsides and downsides. The upside is that the idea behind the core of the harness is very straightforward; the downside is that its implementation is relatively complex, and somewhat hardwired into our environment. To run a benchmark we go through the following steps:
    • Validate the config (i.e. make sure everything that we are expecting, such as network interfaces, relevant software, relevant disks and so on, is in place)
      • Install a custom kernel if applicable
      • Reboot
    • Do any initial configuration that's needed, such as building volumes
    • Apply the relevant system tunings. As mentioned before we aim to keep our tunings as close to out of the box Solaris as possible, so for most benchmarks this is a pretty small set of tunings, things such as ndd values, file system tunings where applicable, shared memory settings on images prior to Solaris 10 etc.
    • Apply any relevant software tunings, say increasing the threads for a webserver or upping cache sizes for directory servers.
    • Reboot the machine to ensure everything is clean (obviously things such as ndd tunings will be reset on reboot)
    • Start the actual benchmark run
      • newfs(1M) any filesystems that are going to be used by the benchmarks
      • Execute the benchmark
        • During execution gather standard performance data, i.e. vmstat(1M), mpstat(1M) etc.
        • Gather custom data if required. Hooks exist for calling tools such as lockstat(1M), custom dtrace(1M) scripts, or other custom scripts when requested
      • Collect the results and put them in a standard reporting format
      • Copy the results back to our main server
      • Reboot
    • Restore the system to its initial blank configuration
    • Lather, rinse and repeat as many times as is feasibly possible for the benchmark (the more results the better).
    The lather, rinse, repeat stage is quite important: we restore the system to a completely blank state in terms of tunings and then start all over again. There is one big reason for this: all benchmark runs have to be completely repeatable.
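
    As an illustration only (this is not the actual harness, and every function and step name in it is made up for the example), the ordering of the steps above can be sketched in Python. The point it tries to show is that every iteration walks the identical sequence and ends by restoring the blank configuration, so the next run starts from the same state.

```python
from typing import Callable, List


def steps_for_run(custom_kernel: bool = False) -> List[str]:
    """Return the ordered (hypothetical) step names for one benchmark run."""
    steps = ["validate_config"]
    if custom_kernel:
        # Only applies when a custom kernel is being tested.
        steps += ["install_custom_kernel", "reboot"]
    steps += [
        "initial_configuration",      # e.g. building volumes
        "apply_system_tunings",
        "apply_software_tunings",
        "reboot",                     # make sure everything is clean
        "newfs_filesystems",
        "execute_benchmark",          # performance data is gathered during this step
        "collect_results",
        "copy_results_to_server",
        "reboot",
        "restore_blank_configuration",
    ]
    return steps


def run_benchmark(iterations: int,
                  perform: Callable[[str], None],
                  custom_kernel: bool = False) -> None:
    """Run the full step sequence `iterations` times, always in the same order."""
    for _ in range(iterations):
        for step in steps_for_run(custom_kernel):
            perform(step)


if __name__ == "__main__":
    log: List[str] = []
    run_benchmark(2, log.append)   # a real `perform` would do the work; here we just record
    print("\n".join(log))
```

    Passing the action in as `perform` keeps the ordering separate from what each step does, which is one way to keep a simple core even when the implementation behind each step is complex.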

    Why So Hung Up On Being Repeatable?

    It's a question that we get every so often: why does everything have to be so repeatable? (Our process is fine-grained enough that, barring an application crashing, we can repeat each run on an OS instance with exactly the same pids for each process.) Put simply, it is to aid in debugging any problems we encounter. We have a couple of criteria before we log a bug:
    • The obvious one is that of "has performance degraded?"
    • What is the variance on the results?
    • Is the variance less than 0.5%, and less than the degradation?
    If results are noisy we do some statistical analysis on them to ensure that it is a valid degradation. At that point we log a bug.
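
    A minimal sketch of the clean (non-noisy) case of these criteria might look like the following. It makes several assumptions that are not stated above: "variance" is read as the sample standard deviation expressed as a percentage of the mean, a higher metric value is taken to be better, and the function names are invented for the example. The further statistical analysis applied to noisy results is not modelled.

```python
from statistics import mean, stdev
from typing import Sequence


def percent_spread(samples: Sequence[float]) -> float:
    """Sample standard deviation as a percentage of the mean (assumed meaning of 'variance')."""
    return 100.0 * stdev(samples) / mean(samples)


def degradation_percent(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Percentage drop of the candidate mean relative to the baseline mean.

    Positive means the candidate is slower (assuming higher metric values are better).
    """
    return 100.0 * (mean(baseline) - mean(candidate)) / mean(baseline)


def should_log_bug(baseline: Sequence[float],
                   candidate: Sequence[float],
                   max_spread: float = 0.5) -> bool:
    """Apply the three criteria: degraded, spread below 0.5%, spread below the degradation."""
    drop = degradation_percent(baseline, candidate)
    spread = max(percent_spread(baseline), percent_spread(candidate))
    return drop > 0 and spread < max_spread and spread < drop


if __name__ == "__main__":
    old_build = [1000.0, 1001.0, 999.0, 1000.0]   # made-up results
    new_build = [990.0, 991.0, 989.0, 990.0]      # made-up results
    print(should_log_bug(old_build, new_build))
```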

    If we allowed the runs to vary a large amount, say by using a naming service that might go down during a run, or by doing multiple runs on the same box without rebooting, we would run a very high risk of introducing variance. That leads to too much noise in our results, and then we can't confidently log a bug. As you can imagine everyone is busy, so the last thing we want to do is log spurious bugs about performance problems, and either waste our own time tracking them down or pass them on to one of our colleagues in development and have them waste their time chasing a phantom problem.

    Let's give a simple example. Say I have a bunch of results from benchmark X, and we are just interested in metric Y from this benchmark. We see a performance drop-off of 0.7% in metric Y, but the variance in our results is 1.2%. The drop-off is within the margin of error for the run, so we can't log a bug. If everything is completely repeatable we can first look at eliminating the cause of the variance, and then gather results to confirm whether we do indeed have a problem. If we can't repeat our experiments exactly then we end up in a situation where it's not possible to eliminate the variance, and hence we can't log a bug, and a potential drop-off in performance could reach you, the customer.

    From the opposite angle, that of the developer, if the problem is repeatable, and consistent, it makes his/her life an awful lot easier in trying to narrow it down (in most cases this is actually us, so we are making our own lives easier first), or alternatively it makes it a lot easier to put a fix through the exact same scenario.

    Pushing the Software To The Limit

    We aim at all times to push the system to the absolute limit without any I/O bottlenecks, paging etc. We can't stress this enough. In practice this gives mpstat output that has as close to 0 as possible in the idle column, and definitely 0 in the wait column, but with the columns still lined up. (Bryan has a great comment on this. I'm paraphrasing here, but it's along the lines of "the tool was designed to report data with columns matching the titles; if the columns aren't matching the titles that's a pretty good indication that you have a problem".) So an mpstat from a sample rig may look like the following during an actual benchmark run.

    (I had to use a screenshot here, as it's possible that some browsers may throw off the formatting, and someone would say, "but those columns aren't lined up". The mpstat here is from the tail end of a ramp-up on a benchmark.)
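
    The same check can be expressed as a small script. The sketch below is only an illustration: the sample numbers are invented (they are not the data from the screenshot), the function names are hypothetical, and the idle threshold of 2 is an assumed stand-in for "as close to 0 as possible". It reads the wt and idl columns by name from the mpstat header and treats a row whose field count does not match the header as the "columns aren't lined up" case.

```python
from typing import Dict, List

# Invented sample in the layout of Solaris mpstat output (not real measurements).
SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  120   410  210  950  300   40   60    0  5200   82  18   0   0
  1    0   0  115   130   10  940  310   42   58    0  5100   81  18   0   1
"""


def parse_mpstat(text: str) -> List[Dict[str, int]]:
    """Parse mpstat-style text into one dict per CPU row, keyed by column title."""
    header: List[str] = []
    rows: List[Dict[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            header = fields          # each report interval repeats the header
            continue
        if not header or len(fields) != len(header):
            # Values have run together or gone missing: columns no longer match titles.
            raise ValueError(f"columns do not line up with header: {line!r}")
        rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def is_saturated(rows: List[Dict[str, int]], max_idle: int = 2) -> bool:
    """True if every CPU shows zero wait and (almost) zero idle time."""
    return bool(rows) and all(r["wt"] == 0 and r["idl"] <= max_idle for r in rows)


if __name__ == "__main__":
    print(is_saturated(parse_mpstat(SAMPLE)))
```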

    Custom Kernels and Standing On the Shoulders of Giants

    We mentioned the PerformancePIT and Performance Self Test processes before. For both of these processes we install what in Sun parlance is known as a bfu (you will hear a lot more about bfu's when OpenSolaris comes out).

    Bill Sommerfeld has posted a bit more about bfu's, or more accurately about a tool called acr, used for resolving conflicts, that was recently integrated directly into the Solaris Express gate. Put simply, tools like this eliminate the need for us to have any manual interaction with custom kernels. They just work, which again allows us to focus on the higher value-add areas. (Ask anyone in Sun engineering if they have ever had to resolve bfu conflicts. Grab a coffee before you ask though, or maybe a beer if you're at a BOF.)

    And Wrapped Around All Of This

    As you might guess, we don't go around looking for idle machines and installing them with benchmarks. Behind the scenes on our server we have a scheduler running which puts new builds onto machines, makes sure idle machines are running benchmarks, allows us to reserve machines and so on.

    You have also probably guessed that we don't look at every result that comes in. Again, we have automated all of this process as well, and we only look at results which are of interest to us: either big performance gains (is it a real gain, were we expecting it, and if not, what caused it?) or small performance drop-offs. If a drop-off is greater than 0.3% we start analyzing it, and if a gain is over 5% we look for what has caused the jump. Invariably we have a heads up on any performance wins that are going to happen thanks to the PerformancePIT process; it is very rare that we have to analyze a big jump that hasn't gone through all of the proper development processes.
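
    The filtering rule in the previous paragraph is simple enough to sketch. The function name, its return strings and the assumption that a higher metric value is better are all invented for this example; only the 0.3% and 5% thresholds come from the text.

```python
def triage(baseline: float,
           result: float,
           drop_threshold: float = 0.3,
           gain_threshold: float = 5.0) -> str:
    """Classify a result against its baseline using percentage change.

    Assumes a higher metric value means better performance.
    """
    change = 100.0 * (result - baseline) / baseline
    if change < -drop_threshold:
        return "analyze drop-off"
    if change > gain_threshold:
        return "investigate gain"
    return "no action"


if __name__ == "__main__":
    for value in (996.0, 998.0, 1030.0, 1060.0):   # made-up results against a baseline of 1000
        print(value, triage(1000.0, value))
```

    Note the asymmetry: a small drop-off is enough to trigger analysis, while a gain has to be large before anyone looks at it.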

    Automating Ourselves Into A Job

    So why this title? What I have written about here is something that we don't even think about; it just happens. It may need the occasional nudge (but that's the scheduler more than anything else), but in general this just goes on in the background. If we had to do this work manually we would all become very bored, very quickly, so we automate it. We use the same approach with everything that we encounter: if it can be done by a machine, get a machine to do it. There is always more work out there and new tech to play with, and in a place like Sun there is always something interesting to work on.


      [1] I was rather chuffed to see Sean and me mentioned on osnews;
           got to admit it was a very, very pleasant surprise.
      [2] Before I get a mail saying kickstart was around before Jumpstart: it wasn't.
           Jumpstart has been in existence since at least Solaris 2.4 (that's the earliest
           version I have encountered); kickstart first appeared around Red Hat 5.0 I believe,
           which would be around 1997 (please correct me if I'm wrong on this).

    (2005-04-25 21:46:28.0) Permalink Comments [0]

    Putting Trolls to Bed
    I just responded to this thread over on osnews following the announcement of Solaris Express 4/05. As a rule I generally don't feed trolls, but I am genuinely so bored with seeing "opensolaris doesn't exist" posts that I felt I had to respond. The post is reproduced in its entirety below.

    <Start OSNews Post>

    Sigh,

    This has been hashed over repeatedly. It takes some time to get an OS ready to be opensourced; last time I checked no one had ever tried to opensource something the size or complexity of Solaris.

    There is code available already: DTrace was released and is downloadable from http://www.opensolaris.org, and it is one of the most advanced, if not the most advanced, systems for diagnosing performance problems on live systems ever developed [1].

    People are working feverishly on OpenSolaris at the moment, trying to make sure everything is right - that means a lot of reviews to ensure we release unencumbered code. We have no intention of throwing a bunch of code over a wall with no due diligence and forgetting about it. OpenSolaris is about further expanding an already large community, and letting everyone see what is in Solaris. You can choose to participate if you wish [2].

    It has been repeatedly stated that we are aiming towards Q2CY05, please look at the roadmap http://www.opensolaris.org/roadmap/

    If you wish to choose not to believe that we are going to opensource Solaris please feel free, it's your choice - we look forward to proving you wrong (and believe me, knowing that we are going to prove you, and all of the other naysayers, wrong is a nice feeling).

    On the other hand we are very grateful for the patience that most people are showing while we make sure we do this right.

    [1] Please review all of the rebuttals about what on Linux can replace DTrace at http://blogs.sun.com/roller/page/fintanr/20050306 before telling us about LTT, KProbes and OProfile.
    [2] http://blogs.sun.com/roller/page/jonathan/20050417

    <End OSNews Post>

    Now back to some real work. On code that is going to be opensourced very soon ;).

    (2005-04-25 16:03:38.0) Permalink Comments [1]