fintanr's weblog

    Friday April 01, 2005

    Enabling Sun's Performance Lifestyle
    The recent article about Linux 2.6 being slower than 2.4, and Linus Torvalds calling for ongoing performance testing, gives our group a timely reason to explain in a lot more detail what we do. One of my colleagues, Sean McGrath, posted a little teaser yesterday, so I'll add to that today before we start into a more detailed, and technical, set of posts.

    So what exactly is it that we do? Our group provides the infrastructure to help the wider Sun community enable "Sun's Performance Lifestyle". We run a very large set of benchmarks on every build of every active train of Solaris, using components of Sun's middleware stack (Java Enterprise System) or ISV apps (Oracle, Tibco, Reuters etc) where applicable. We also benchmark applications bundled in Solaris (e.g. Samba, Apache, Xorg), new Java builds, JES on Linux and more, all in a totally automated manner. We provide the same facilities to developers for work prior to integration, so that they can make informed decisions regarding performance the whole way through the development cycle, rather than as an afterthought.

    The matrix we are currently running looks something like this:

    Each entry lists the OS, followed by the architectures where they are given.

    Solaris, Released Internal Builds
        Solaris Express: Sparc, amd64, x86
        Solaris 10 Update Trains (currently s10u1)
        Solaris 9 Update Trains
        Solaris Patch Trains

    Java Enterprise System, Released Internal Builds
        Solaris Express: Sparc, amd64, x86
        Solaris 10 Update Train
        Solaris 10 FCS
        Solaris 9 Update Trains

    Solaris, Development Builds
        Solaris Express: Sparc, amd64, x86

    Java
        Solaris Express: Sparc, amd64, x86
        Solaris 10 Update Train
        Solaris 10 FCS
        Solaris 9 Update Trains
        Windows
        Linux (Redhat, Suse)

    Userland Products (JES etc), Development Builds
        Solaris
            Solaris Express: Sparc, amd64, x86
            Requested Solaris Builds
        JES for Linux
            Linux (Suse, Redhat): amd64, x86

    And coming soon - OpenSolaris (we are hyped about this).

    As for numbers, last month we ran over 50,000 benchmarks. Some are tiny, taking only seconds to run, and some take several days; it all depends on what you're benchmarking. Of course, as mentioned, this is all automated.

    A Brief Mention of Our Lab

    Our lab consists of about 800-odd machines, ranging from 1 cpu sparc, amd64 and intel boxes all the way up to 72 cpu E25k's, and covering just about everything in between. I use lab as a virtual concept here: we have a large lab in Ireland, and then machines in Boston, Austin, Menlo Park etc. Alongside this we share time on machines with other groups dispersed all over the world. Let's just say a fully loaded 25k costs a lot, and while we could use it continuously, it makes more sense to use it sensibly and share it. And of course some of the boxes currently under development are only available in very small numbers, so we have no option other than to share.

    As an aside, my personal favourite rig at the moment is a fully loaded E6800 with several terabytes of disk space (6120 Fibre Channel Arrays) attached, which we use predominantly for I/O benchmarks - the kind of I/O that enterprise customers are doing. And a second aside: any version of Solaris that we install on this machine can also be installed on a single cpu Sun Blade 2500 - no recompiles, special patches or kernel hacks needed, it just works.

    Upstream Performance Work

    PerfPIT

    We provide a performance pre-integration test environment (PerfPIT), which every major project going into Solaris has to go through. Most groups use it at multiple stages during their project. To put this in perspective, every major project that you hear about in Solaris (and a huge number that people are only starting to write about) has to come through PerfPIT. Projects like DTrace, Zones, Least Privilege and FireEngine all did one or more PerfPIT runs before integrating into Solaris. And what does PerfPIT involve? Basically, we run the exact same set of benchmarks as we run in our more downstream testing on two kernels - one with the changes, and nothing but the changes, and one without.
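    To make the two-kernel comparison concrete, here is a minimal sketch of the idea, not our actual tooling. The function name compare_kernels, the result format (benchmark name mapped to a list of scores), the "higher is better" convention and the 2% tolerance are all assumptions made for illustration.

```python
from statistics import mean


def compare_kernels(baseline, candidate, tolerance=0.02):
    """Flag benchmarks where the candidate kernel is slower than the baseline.

    baseline / candidate: dict mapping benchmark name -> list of scores,
    where a higher score is assumed to be better (an illustrative choice).
    tolerance: relative slowdown treated as noise (hypothetical default).
    Returns a list of (benchmark_name, relative_change) for regressions.
    """
    regressions = []
    for name in sorted(baseline):
        if name not in candidate:
            # Only compare benchmarks that ran on both kernels.
            continue
        base = mean(baseline[name])
        cand = mean(candidate[name])
        change = (cand - base) / base
        if change < -tolerance:
            regressions.append((name, change))
    return regressions


if __name__ == "__main__":
    without_changes = {"web": [100, 100], "io": [200, 200]}
    with_changes = {"web": [101, 99], "io": [180, 180]}
    for name, change in compare_kernels(without_changes, with_changes):
        print(f"{name}: {change:+.1%}")
```

    The important property is that both kernels see an identical benchmark set, so any difference can be attributed to the changes under test. A real comparison would also need to account for run-to-run variance rather than comparing plain means.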

    Performance Self Test

    Further upstream we provide another version of PerfPIT, called Performance Self Test, which is a mechanism for the development community to test performance changes more informally. The key here is that this is simple to use, and we provide a standard environment. Developers go to a webpage, point it at their kernels, select the benchmarks they want to run and hit submit. Everything else happens automatically.

    The best example of using this that I have seen was last year, when one developer in Sun was evaluating several new algorithms for a specific project. Rather than having to go through the PerfPIT process for each version, he just submitted multiple self test requests and chose the best solution - without ever having to set up a benchmark, e-mail or phone anyone in the group, or do anything that distracted him from the task at hand. Amusingly, it wasn't until he had done his putback into Solaris that we realised what he had been doing. That's automation for you though.

    Why do this?

    Simply put, you are not allowed to put your code back into the Solaris code base if it's going to slow it down. One of our main goals is to protect performance in Solaris, so when performance improves we move our baselines to the new high water mark, and all subsequent builds are not allowed to regress from this new baseline. There are very, very occasional exceptions to this, e.g. if a fix is required to prevent crashes or data corruption and it cannot be implemented without causing a performance regression. In the whole three years of Solaris 10 development this occurred once, on one metric - and there were thousands upon thousands of putbacks.
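    The high water mark rule can be sketched in a few lines. This is an illustration of the policy described above, not the real gating code: the function name gate_putback, the metric dictionaries and the "higher is better" convention are assumptions, and the rare approved exceptions and measurement noise are deliberately left out.

```python
def gate_putback(baseline, measured):
    """Apply a ratcheting baseline to one build's results.

    baseline / measured: dict mapping metric name -> score, where a higher
    score is assumed to be better (an illustrative convention).
    Returns (accepted, new_baseline). A build that regresses on any known
    metric is rejected and the baseline is left untouched. Otherwise every
    metric's baseline moves up to the best value seen so far.
    """
    regressed = [
        name for name, value in measured.items()
        if name in baseline and value < baseline[name]
    ]
    if regressed:
        return False, dict(baseline)

    new_baseline = dict(baseline)
    for name, value in measured.items():
        # New metrics start at their first measured value.
        new_baseline[name] = max(value, baseline.get(name, value))
    return True, new_baseline


if __name__ == "__main__":
    ok, baseline = gate_putback({"tput": 100.0}, {"tput": 110.0})
    print(ok, baseline)
    ok, baseline = gate_putback(baseline, {"tput": 105.0})
    print(ok, baseline)
```

    Note that the second build in the usage example would have passed against the original baseline of 100, but is rejected because the bar has already moved to 110. That ratchet is what stops a series of small regressions from quietly eating an earlier gain.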

    It's due to the immense amount of work that our development colleagues have done, and are doing, on Solaris, and our own work in providing practical support for them, that Solaris 10 screams. And it's getting faster every day. Our aim is to enable the best, not catch problems after they have occurred.

    The Benchmarks

    The benchmarks themselves range from things such as SPECweb and Kenbus, to homebaked benchmarks for measuring things such as boot time, and finally, and most importantly, real customer workloads (which we are always looking for - if you have an enterprise workload let us know, we are always interested in getting these in house).

    Bug hunting and analysis

    Obviously all of this work throws up some bugs, and we tend to be very methodical and exacting in our analysis of them, our aim being to narrow each one down to the exact lines of code that caused the problem rather than just saying "there's a problem here". There's quite a bit to this, so I'll leave further discussion for a separate post.

    What we don't do

    We don't provide benchmark numbers for release - we work on out of the box Solaris, with as little tuning as possible, to create the most realistic customer environment. Our colleagues in Market Development Engineering and Strategic Applications Engineering are focussed on getting the numbers that you see in press releases, so you won't see us mentioning numbers here.

    What we are hyped about

    OpenSolaris - we are so hyped about this it's beyond belief. The beta community is very active already, and as we get closer to the code being released completely into the wild we are getting ready to work with the OpenSolaris community - and we can't wait.

    About the group

    I guess I should say a little bit about our team. The group is pretty small: ten full time engineers (Sean, Gleb and I are the current bloggers), two interns at any given time and one manager (or ex-engineer as we prefer to call him ;) ). We are based in Ireland, although somewhat more geographically spread out than just the Dublin office, and most of us have been with Sun five plus years.

    Finally, keep an eye on our blogs. Over the next few weeks we will go into a lot more detail about what we do, and how we do it.

    (2005-04-01 01:43:10.0)