Friday April 01, 2005
Enabling Sun's Performance Lifestyle
The recent article about Linux 2.6 being slower than 2.4, and Linus Torvalds calling for ongoing performance testing, gives
our group a timely reason to explain in a lot more detail what we do. One of my colleagues, Sean
McGrath, posted a little teaser yesterday, so I'll
add to that today before we start into a more detailed, and technical, set of posts.
So what exactly is it that we do? Our group provides the infrastructure that helps the wider Sun community enable
"Sun's Performance Lifestyle". In a totally automated manner, we run a very large set of benchmarks on every build of
every active train of Solaris, using components of Sun's middleware stack (Java Enterprise System) or ISV apps (Oracle,
Tibco, Reuters etc.) where applicable. We also benchmark applications bundled in Solaris (e.g. Samba, Apache, Xorg),
new Java builds, JES on Linux and more. We provide the
same facilities to developers for work prior to integration, so that developers can make informed decisions regarding
performance the whole way through the development cycle, rather than as an afterthought.
Our current matrix looks something like this:
| Builds | OS | Architectures |
|---|---|---|
| Solaris Released Internal Builds | Solaris Express; Solaris 10 Update Trains (currently s10u1); Solaris 9 Update Trains; Solaris Patch Trains | SPARC, amd64, x86 |
| Java Enterprise System Released Internal Builds | Solaris Express; Solaris 10 Update Train; Solaris 10 FCS; Solaris 9 Update Trains | SPARC, amd64, x86 |
| Solaris Development Builds | Solaris Express | SPARC, amd64, x86 |
| Java | Solaris Express; Solaris 10 Update Train; Solaris 10 FCS; Solaris 9 Update Trains; Windows; Linux (Redhat, Suse) | SPARC, amd64, x86 |
| Userland Products (JES etc) Development Builds | Solaris Solaris Express; Requested Solaris Builds | SPARC, amd64, x86 |
| JES for Linux | Linux (Suse, Redhat) | amd64, x86 |
And coming soon - OpenSolaris (we are hyped about this).
As for numbers, last month we ran over 50,000 benchmarks. Some are tiny, taking only seconds to run, and some take
several days; it all depends on what you're benchmarking. Of course, as mentioned, this is all automated.
A Brief Mention of Our Lab
Our lab consists of 800-odd machines, ranging from 1-CPU SPARC, amd64 and Intel boxes
all the way up to 72-CPU E25Ks, and covering just about everything in between.
I use "lab" as a virtual concept here: we have a large lab in Ireland, and then machines in Boston,
Austin, Menlo Park etc. Alongside this we share time on machines with other groups, dispersed all over the
world. Let's just say a fully
loaded 25K costs a lot, and while we could use it continuously, it makes more sense to use it sensibly and share
it. And of course some of the boxes currently under development are only available in very small numbers, so we have
no option other than to share.
As an aside, my personal favourite
rig at the moment is a fully loaded E6800 with several terabytes of disk space (6120 Fibre Channel Arrays) attached, which
we use predominantly for I/O
benchmarks, the kind of I/O that enterprise customers are doing.
And a second aside here: any version of Solaris that we install on this machine can also be
installed on a single-CPU Sun Blade 2500. No recompiles, special patches or kernel hacks are needed; it just works.
Upstream Performance Work
PerfPIT
We provide a performance pre-integration test environment (PerfPIT), which every major project
going into Solaris has to go through. Most groups use this at multiple stages during their project. Now let's put
this in perspective: every major project that you hear about in Solaris (and a huge number of the ones that people
are starting to write about) has to come through PerfPIT. So things like DTrace,
Zones, Least Privilege,
FireEngine
etc. all did one or more PerfPIT runs before integrating into Solaris. And what does PerfPIT
involve, you ask? Basically, we run the exact same set of benchmarks as we run in our more downstream testing on
two kernels: one with the changes, and nothing but the changes, and one without.
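To make the two-kernel comparison concrete, here is a minimal sketch in Python. It is not PerfPIT's actual code: the benchmark names, the scores, the "higher is better" convention and the `tolerance_pct` noise allowance are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    benchmark: str
    baseline: float   # score on the kernel without the changes
    candidate: float  # score on the kernel with the changes

    @property
    def change_pct(self) -> float:
        # Positive means the candidate kernel scored higher than the baseline.
        return (self.candidate - self.baseline) / self.baseline * 100.0


def compare_kernels(baseline, candidate, tolerance_pct=1.0):
    """Return the benchmarks where the candidate kernel regresses.

    Both arguments map benchmark name -> score, where higher is better.
    tolerance_pct is a hypothetical noise allowance, not a PerfPIT setting.
    """
    regressions = []
    # Only compare benchmarks that ran on both kernels.
    for name in sorted(baseline.keys() & candidate.keys()):
        result = Comparison(name, baseline[name], candidate[name])
        if result.change_pct < -tolerance_pct:
            regressions.append(result)
    return regressions


# Made-up scores for illustration only.
without_changes = {"web_throughput": 1000.0, "file_io": 250.0, "boot": 40.0}
with_changes = {"web_throughput": 1012.0, "file_io": 230.0, "boot": 40.1}

for r in compare_kernels(without_changes, with_changes):
    print(f"{r.benchmark}: {r.change_pct:+.1f}%")  # prints "file_io: -8.0%"
```

Because the two kernels differ only by the project's changes, any benchmark flagged here can be attributed to those changes rather than to unrelated putbacks.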
Performance Self Test
Further upstream we provide another version of PerfPIT, called Performance Self Test, which is a mechanism for
the development community to test performance changes more informally. The key here is that this is simple to use,
and we provide a standard environment. Developers go to a webpage, point it at their kernels, select the benchmarks
they want to run and hit submit. Everything else happens automatically.
The best example of using this that I have seen was last year, when one developer in Sun was evaluating several
new algorithms for a specific project. Rather than having to go through the PerfPIT process for each version, he
just submitted multiple self test requests and chose the best solution, without ever having to set up a benchmark,
e-mail or phone anyone in the group, or do anything that distracted him from the task at hand. Amusingly, it wasn't
until he had done his putback into Solaris that we realised what he had been doing. That's automation for you though.
Why do this?
Simply put, you are not allowed to put your code back into the Solaris code base if it's going to slow it down. One
of our main goals is to protect performance in Solaris, so when performance improves we move our baselines to
the new high water mark, and all subsequent builds are not allowed to regress from this new baseline. There
are very, very occasional exceptions to this, i.e. if the fix is required to prevent crashes and data corruption,
and it cannot be implemented without causing a performance regression. In the whole three years of Solaris 10
development this occurred once, on one metric, and there were thousands upon thousands of putbacks.
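The high water mark rule can be sketched as a small ratchet. This is only an illustration of the idea, not our tooling: the metric name, the scores and the `check_build` helper are hypothetical, and a real system would also need some allowance for run-to-run noise.

```python
def check_build(baselines, results):
    """Apply a high-water-mark rule to one build's results.

    baselines and results map metric name -> score (higher is better).
    Returns (accepted, new_baselines). A build is rejected if any metric
    falls below its baseline; otherwise each baseline ratchets up to the
    best score seen so far.
    """
    regressed = [metric for metric, score in results.items()
                 if metric in baselines and score < baselines[metric]]
    if regressed:
        # Rejected builds never move the baselines.
        return False, dict(baselines)
    new_baselines = dict(baselines)
    for metric, score in results.items():
        new_baselines[metric] = max(score, baselines.get(metric, score))
    return True, new_baselines


# Made-up scores for three successive builds.
baselines = {"throughput": 100.0}
builds = [{"throughput": 104.0}, {"throughput": 102.0}, {"throughput": 104.0}]

history = []
for build in builds:
    accepted, baselines = check_build(baselines, build)
    history.append(accepted)

# history == [True, False, True]; baselines == {"throughput": 104.0}
```

Note that the second build would have passed against the original baseline of 100, but is rejected because the first build had already raised the mark to 104.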
It's due to the immense amount of work that our development colleagues have done, and are doing, on Solaris, and our
own work in providing practical support for them, that Solaris 10 screams. And
it's getting faster every day. Our aim is to enable the best, not catch problems after they have occurred.
The Benchmarks
The benchmarks themselves range from things such as SPECweb and Kenbus, to home-baked benchmarks for
measuring things such as boot time, and finally, and most importantly, real customer workloads (which we are always
looking for; if you have an enterprise workload let us know, we are always interested in getting these in house).
Bug hunting and analysis
Obviously all of this work throws up some bugs, and we tend to be very methodical and exacting in our analysis
of them, our aim being to narrow each one down to the exact lines of code that have caused a problem rather
than just saying "there's a problem here". There's quite a bit to this, so I'll leave further discussion for
a separate post.
What we don't do
We don't provide benchmark numbers for release. We work on out-of-the-box Solaris, with as little tuning as
possible, to create the most realistic customer environment. Our colleagues in Market Development Engineering and Strategic Applications Engineering are
focussed on getting the numbers that you see in press releases, so you won't see us mentioning numbers here.
What we are hyped about
OpenSolaris - we are so hyped about this it's beyond belief. The beta community is very active already, and as
we get closer to the code being released completely into the wild we are getting ready to work with the
OpenSolaris community - and we can't wait.
About the group
I guess I should say a little bit about our team. The group is pretty small: ten full-time engineers
(Sean, Gleb and I are the current
bloggers), two interns at any given time and one manager (or ex-engineer as we prefer to call him ;) ). We are based in Ireland,
although somewhat more geographically spread out than just the Dublin office, and most of us have been with Sun five-plus years.
Finally, keep an eye on our blogs. Over the next few weeks we will go into a lot more detail about what we
do, and how we do it.
(2005-04-01 01:43:10.0)