
Wednesday April 27, 2005
Binary compatibility and application dependencies
I see Oracle are echoing what many customers have been saying about Linux patches, in this article over on The Register.
"Speaking at the Software 2005 conference in Santa Clara, California, on Tuesday, Phillips told an audience of 1,000 chief executive officers: '[Customers] are bogged down by the dependencies between applications. Add a patch in Linux and five other things break. People ask: "Why can't you tell me what the dependencies are?"'"
This is exactly why binary compatibility and interface stability are important. If you can't apply patches with confidence that your production systems will stay up, and that your applications will keep running on them, you can only be nervous. It is also why Sun invests a huge amount of time in testing patches.
(2005-04-26 20:47:36.0)

Tuesday April 26, 2005
Sun's Performance Lifestyle: Automating Ourselves Into A Job
I had originally planned to base my second posting [1] on Sun's Performance Lifestyle around
the concept of testing software versus hardware, i.e. the dreaded I/O-bound benchmark.
The article is partly written, but unfortunately I haven't managed to schedule time on
an available rig to add some practical examples (real performance work obviously gets
priority), so instead I decided to write a bit about our automation and how we
actually run the benchmarks.
We view automation as key to our job: it allows us to remove the mundane tasks and
focus on the higher-value, and much more interesting, work of finding and root-causing
performance issues.
High Level Overview
From a high-level view, the process of doing a benchmark run can be
broken down into the following steps:
- Install Rig
- Install and configure required software
- Run the benchmark
- Collect the benchmark results
Looks nice and straightforward, doesn't it? A few years ago I attended an amusing presentation
given by a colleague in Ireland, entitled "A Simple Matter of Programming", which featured
the "magic happens here" box that we have all encountered (you know the one: the Architect
hands over a beautiful design document that specs out everything bar the actual programming
and implementation, and in the middle of the high-level system diagram sits the "magic
happens here" box).
Let's call this "A Simple Matter of Programming and Implementation".
Installation
The key to all of this is automating the installs. For Solaris we maintain a local
JumpStart server on our lab network (with mirrors as needed on remote networks), and we
also maintain images and install scripts for the various Linux clones of JumpStart
(i.e. Kickstart etc. [2]) to use when needed. For Windows it's a gratuitous abuse of the
dd command, plus some nifty automation that Nicky, Sean and a non-blogging member of the
team have put together ;).
I won't go into a JumpStart 101 tutorial here (although please ask if you would like
to see something like this), but suffice it to say that every step in setting up a rig
is scripted.
Once a system is installed, it copies over all the benchmarks and software it needs,
reboots, and disconnects itself from our lab naming service. Where applicable we use
the hosts file as our only naming service, as we don't want any external factors
affecting our runs. All benchmarks that involve any form of network traffic (the vast
majority of them) are run on private subnets.
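To make the hosts-file-only rule concrete, a harness might refuse to start a run unless name resolution is purely local. The post doesn't say how this is checked. The sketch below is a minimal illustration that assumes the rule is expressed through the standard `hosts:` entry in /etc/nsswitch.conf, where `files` means /etc/hosts. The function names are hypothetical.

```python
from typing import List


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the lookup sources on the 'hosts:' line of nsswitch.conf-style
    text, ignoring comments and bracketed action criteria."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            # Tokens such as [NOTFOUND=return] control lookup flow; they are
            # not sources, so skip them.
            return [t for t in tokens if not t.startswith("[")]
    return []  # no hosts entry found


def uses_only_hosts_file(nsswitch_text: str) -> bool:
    """True if host lookups come from /etc/hosts ("files") and nothing else."""
    return hosts_sources(nsswitch_text) == ["files"]
```

A pre-run validation step could read /etc/nsswitch.conf and abort when `uses_only_hosts_file` returns False, for example when `dns` or `nis` is still listed.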
Execution
To run the benchmarks we have a custom, home-grown harness that has evolved over the
years, which has upsides and downsides. The upside is that the idea at the core of the
harness is very straightforward; the downside is that its implementation is relatively
complex and somewhat hardwired into our environment. To run a benchmark we go through
the following steps:
- Validate the config (i.e. make sure everything we expect, such as network interfaces,
relevant software and relevant disks, is in place)
- Install a custom kernel if applicable
- Reboot
- Do any initial configuration that's needed, such as building volumes
- Apply the relevant system tunings. As mentioned before we aim to keep our tunings
as close to out of the box Solaris as possible, so for most benchmarks this is a
pretty small set of tunings, things such as ndd values, file system tunings where
applicable, shared memory settings on images prior to Solaris 10 etc.
- Apply any relevant software tunings, say increasing the threads for a webserver or
upping cache sizes for directory servers.
- Reboot the machine to ensure everything is clean (obviously things such as ndd
tunings will be reset on reboot)
- Start the actual benchmark run:
  - newfs(1M) any filesystems that are going to be used by the benchmarks
  - Execute the benchmark
  - During execution, gather standard performance data, i.e. vmstat(1M), mpstat(1M) etc.
  - Gather custom data if required; hooks exist for calling tools such as lockstat(1M),
    custom dtrace(1M) scripts, or other custom scripts when requested
- Collect the results and put them in a standard reporting format
- Copy the results back to our main server
- Reboot
- Restore the system to its initial blank configuration
- Lather, rinse and repeat as many times as is feasible for the benchmark
(the more results the better).
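The core idea of the harness (validate, set up, run, collect, restore, repeat) can be sketched in a few lines. This is a minimal illustration, not the team's actual harness, which the post describes as complex and tied to their environment. The class and field names are hypothetical, and each step is a placeholder callable that stands in for the real work (installing kernels, ndd tunings, newfs, and so on).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Harness:
    """Hypothetical skeleton of the validate/tune/run/collect/restore cycle."""
    validate: Callable[[], bool]           # interfaces, software, disks present?
    setup_steps: List[Callable[[], None]]  # custom kernel, reboot, volumes, tunings, reboot
    run: Callable[[], Metrics]             # newfs, execute, gather vmstat/mpstat etc.
    collect: Callable[[Metrics], None]     # standard report format, copy to main server
    restore: Callable[[], None]            # reboot, return to blank configuration
    log: List[str] = field(default_factory=list)

    def iterate(self, iterations: int) -> List[Metrics]:
        """Lather, rinse, repeat: every iteration starts from a blank rig."""
        results: List[Metrics] = []
        for i in range(iterations):
            if not self.validate():
                # Nothing has been changed yet, so just stop and report.
                self.log.append(f"iteration {i}: config validation failed, stopping")
                break
            try:
                for step in self.setup_steps:
                    step()
                metrics = self.run()
                self.collect(metrics)
                results.append(metrics)
            finally:
                # Restore even if a step fails, so the next run is repeatable.
                self.restore()
        return results
```

The `finally` clause reflects the point made next in the post: the rig goes back to a blank state after every iteration, whether or not that run succeeded.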
The lather, rinse, repeat stage is quite important: we restore the system to a completely
blank state in terms of tunings and then start all over again. There is one big reason for
this: all benchmark runs have to be completely repeatable.
Why So Hung Up On Being Repeatable?
It's a question we get every so often: why does everything have to be so repeatable? (Our
process is fine-grained enough that, barring an application crash, we can repeat each run
on an OS instance with exactly the same PIDs for each process.) Put simply, it helps us
debug any problems we encounter. We have a few criteria that must be met before we log a bug:
- The obvious one is that of "has performance degraded?"
- What is the variance on the results?
- Is the variance less than 0.5%, and less than the degradation?
If results are noisy, we do some statistical analysis on them to make sure the degradation
is real. Only at that point do we log a bug.
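The three criteria can be written down as a small check. This is a sketch under stated assumptions. The post doesn't define how "variance" becomes a percentage, so here it is taken as the relative standard deviation (stdev divided by mean) of repeated runs. The metric is assumed to be higher-is-better, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def pct_change(baseline: float, new: float) -> float:
    """Percentage change from baseline to new (positive means higher)."""
    return (new - baseline) / baseline * 100.0


def relative_spread(samples: Sequence[float]) -> float:
    """Spread of repeated runs as a percentage of their mean (stdev / mean).
    Assumption: this stands in for the post's 'variance' percentage."""
    return stdev(samples) / mean(samples) * 100.0


def should_log_bug(baseline: Sequence[float], new: Sequence[float],
                   max_noise_pct: float = 0.5) -> bool:
    """Apply the three criteria to a higher-is-better metric."""
    degradation = -pct_change(mean(baseline), mean(new))  # > 0 means slower
    noise = max(relative_spread(baseline), relative_spread(new))
    return degradation > 0 and noise < max_noise_pct and noise < degradation
```

For the worked example later in this post (a 0.7% drop against 1.2% variance), the check fails on noise, so no bug would be logged until the variance is tracked down.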
If we allowed runs to vary significantly, say by using a naming service that might go down
during a run, or by doing multiple runs on the same box without rebooting, we would run a very
high risk of introducing variance. That leads to too much noise in our results, and then we
can't confidently log a bug. As you can imagine, everyone is busy, so the last thing we want
is to log spurious performance bugs and either waste our own time tracking them down, or pass
them on to a colleague in development who then wastes time chasing a phantom problem.
Let's take a simple example. Say I have a bunch of results from benchmark X, and we are only
interested in metric Y from this benchmark. We see a performance drop-off of 0.7% in metric Y,
but the variance in our results is 1.2%: the drop-off is within the margin of error for the
run, so we can't log a bug. If everything is completely repeatable, we can first look at
eliminating the cause of the variance, and then gather results to confirm whether we do indeed
have a problem. If we can't repeat our experiments exactly, we end up in a situation where it's
not possible to eliminate the variance, so we can't log a bug, and a potential drop-off in
performance could reach you, the customer.
From the opposite angle, that of the developer, a repeatable and consistent problem makes his/her
life an awful lot easier when trying to narrow it down (in most cases the developer is actually
us, so we are making our own lives easier first), and it also makes it a lot easier to put a fix
through exactly the same scenario.
Pushing the Software To The Limit
We aim at all times to push the system to the absolute limit, with no I/O bottlenecks, no
paging, etc. We can't stress this enough. In practice this gives mpstat output with as close
to 0 as possible in the idle column, and definitely 0 in the wait column, but with the columns
still lined up (Bryan has a great comment on this; I'm paraphrasing here, but it's along the
lines of "the tool was designed to report data with columns matching the titles; if the columns
aren't matching the titles, that's a pretty good indication that you have a problem"). So during
an actual benchmark run, mpstat on a sample rig may look like the following.
[Screenshot: mpstat output from a sample rig at the tail end of a benchmark ramp-up]
(I had to use a screenshot here, as some browsers may throw off the formatting, and someone
would then say "but those columns aren't lined up". The mpstat here is from the tail end of a
ramp-up on a benchmark.)
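The "idle near 0, wait exactly 0, columns still lined up" rule can also be checked mechanically. The sketch below assumes Solaris-style mpstat output: a header row starting with `CPU` and ending in `usr sys wt idl`, followed by one integer row per CPU. A row whose field count differs from the header is treated as "columns don't match titles". The `max_idle` threshold is purely illustrative and not a figure from the post, and the function names are hypothetical.

```python
from typing import Dict, List


def parse_mpstat(text: str) -> List[Dict[str, int]]:
    """Parse Solaris-style mpstat output: a header row starting with 'CPU'
    followed by one whitespace-separated row of integers per CPU."""
    header: List[str] = []
    rows: List[Dict[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":          # mpstat repeats the header each interval
            header = fields
            continue
        if not header:
            continue                    # ignore anything before the first header
        if len(fields) != len(header):
            # e.g. two wide values run together: columns no longer match titles
            raise ValueError(f"columns don't match titles: {line!r}")
        rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def fully_loaded(rows: List[Dict[str, int]], max_idle: int = 2) -> bool:
    """True if every CPU sample shows 0 in 'wt' and at most max_idle in 'idl'.
    max_idle is an illustrative threshold, not a figure from the post."""
    return bool(rows) and all(r["wt"] == 0 and r["idl"] <= max_idle for r in rows)
```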
Custom Kernels and Standing On the Shoulders of Giants
We mentioned the PerformancePIT and Performance Self Test processes before. For both of
these processes we install what in Sun parlance is known as a bfu (you will hear a lot more
about bfus when OpenSolaris comes out).
Bill Sommerfeld has posted a bit more about bfus, or more accurately about acr, a tool for
resolving conflicts that was recently integrated directly into the Solaris Express gate.
Put simply, tools like this eliminate any need for manual interaction on our part with custom
kernels; they just work, which again allows us to focus on the higher-value areas. (Ask anyone
in Sun engineering whether they have ever had to resolve bfu conflicts, but grab a coffee
before you ask, or maybe a beer if you're at a BOF.)
And Wrapped Around All Of This
As you might guess, we don't go around looking for idle machines and installing benchmarks on
them. Behind the scenes our server runs a scheduler that puts new builds onto machines, makes
sure idle machines are running benchmarks, allows us to reserve machines,
and so on.
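The post doesn't describe how the scheduler works internally. As a rough illustration of the three duties it lists, here is a toy model with a first-in, first-out queue of builds, idle unreserved machines that pick up pending work, and manual reservations that keep the scheduler off a machine. All names are hypothetical, and real concerns such as matching builds to architectures and benchmark suites are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Machine:
    name: str
    reserved_by: Optional[str] = None  # engineer holding a manual reservation
    running: Optional[str] = None      # build currently being benchmarked


class Scheduler:
    """Toy model: queue new builds and keep idle, unreserved machines busy."""

    def __init__(self, machines: List[Machine]) -> None:
        self.machines: Dict[str, Machine] = {m.name: m for m in machines}
        self.pending: Deque[str] = deque()

    def submit_build(self, build: str) -> None:
        self.pending.append(build)

    def reserve(self, name: str, who: str) -> bool:
        """Reserve an idle machine; busy or already-reserved machines are refused."""
        m = self.machines[name]
        if m.reserved_by is not None or m.running is not None:
            return False
        m.reserved_by = who
        return True

    def release(self, name: str) -> None:
        self.machines[name].reserved_by = None

    def finish(self, name: str) -> None:
        """Called when a machine's benchmark run (and restore) completes."""
        self.machines[name].running = None

    def tick(self) -> List[Tuple[str, str]]:
        """Hand pending builds, oldest first, to idle unreserved machines."""
        assigned: List[Tuple[str, str]] = []
        for m in self.machines.values():
            if not self.pending:
                break
            if m.reserved_by is None and m.running is None:
                m.running = self.pending.popleft()
                assigned.append((m.name, m.running))
        return assigned
```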
You have probably also guessed that we don't look at every result that comes in. Again, we
have automated this process as well, and we only look at results that are of interest to us:
either big performance gains (is it a real gain, were we expecting it, and if not, what caused
it?) or small performance drop-offs. If a drop-off is greater than 0.3% we start analyzing it,
and if a gain is over 5% we look for what caused the jump. Invariably we have a heads-up on any
performance wins that are coming, thanks to the PerformancePIT process; it is very rare that we
have to analyze a big jump that hasn't gone through all of the proper development
processes.
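The two triage thresholds quoted above (analyze drop-offs greater than 0.3%, explain gains over 5%) translate directly into a classifier. This is a minimal sketch. The label strings and the `higher_is_better` switch for metrics such as latency are illustrative assumptions and are not taken from the post.

```python
def triage(baseline: float, new: float, higher_is_better: bool = True,
           drop_threshold_pct: float = 0.3,
           gain_threshold_pct: float = 5.0) -> str:
    """Classify one result against its baseline using the post's thresholds."""
    change = (new - baseline) / baseline * 100.0
    if not higher_is_better:
        change = -change  # e.g. latency: a higher number is a drop-off
    if change < -drop_threshold_pct:
        return "analyze-drop"   # more than 0.3% worse: start analyzing
    if change > gain_threshold_pct:
        return "explain-gain"   # more than 5% better: find what caused it
    return "ignore"
```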
Automating Ourselves Into A Job
So why this title? What I have written about here is something we don't even think about;
it just happens. It may need an occasional nudge (but that's the scheduler more than anything
else), but in general it just goes on in the background. If we had to do this work manually
we would all become very bored, very quickly, so we automate it. We take the same approach
with everything we encounter: if it can be done by a machine, get a machine to do it. There is
always more work out there and new tech to play with, and in a place like Sun there is always
something interesting to work on.
[1] I was rather chuffed to see Sean and me mentioned on OSNews;
I've got to admit it was a very, very pleasant surprise.
[2] Before I get mail saying Kickstart was around before JumpStart: it wasn't.
JumpStart has existed since at least Solaris 2.4 (that's the earliest version I have
encountered), while Kickstart first appeared around Red Hat 5.0, I believe, which would
be around 1997 (please correct me if I'm wrong on this).
(2005-04-25 21:46:28.0)
Putting Trolls to Bed
I just responded to this thread over on OSNews following the announcement of Solaris Express 4/05. As a rule I don't feed trolls, but I am genuinely so bored with seeing "opensolaris doesn't exist" posts that I felt I had to respond. The post is reproduced in its entirety below.
<Start OSNews Post>
Sigh,
This has been hashed over repeatedly: it takes time to get an OS ready to be open sourced, and last time I checked no one had ever tried to open source something of the size or complexity of Solaris.
There is code available already: DTrace has been released and is downloadable from http://www.opensolaris.org, and it is one of the most advanced, if not the most advanced, systems for diagnosing performance problems on live systems ever developed [1].
People are working feverishly on OpenSolaris at the moment, trying to make sure everything is right; that means a lot of reviews to ensure we release unencumbered code. We have no intention of throwing a bunch of code over a wall with no due diligence and forgetting about it. OpenSolaris is about further expanding an already large community and letting everyone see what is in Solaris. You can choose to participate if you wish [2].
It has been repeatedly stated that we are aiming for Q2CY05; please look at the roadmap at http://www.opensolaris.org/roadmap/
If you choose not to believe that we are going to open source Solaris, please feel free, it's your choice. We look forward to proving you wrong (and believe me, knowing that we are going to prove you, and all of the other naysayers, wrong is a nice feeling).
On the other hand we are very grateful for the patience that most people are showing while we make sure we do this right.
[1] Please review all of the rebuttals about what on Linux can replace DTrace at http://blogs.sun.com/roller/page/fintanr/20050306 before telling us about LTT, KProbes and OProfile.
[2] http://blogs.sun.com/roller/page/jonathan/20050417
<End OSNews Post>
Now back to some real work, on code that is going to be open sourced very soon ;).
(2005-04-25 16:03:38.0)

Wednesday April 20, 2005
Workaround for "FATAL: system is not bootable, boot command is disabled" on an OBP
Poor error messages are a major source of annoyance. I hit this one today, for the first time in a few years. Background: a V210 had been rather abruptly powered down and was feeling somewhat ill. So I logged onto the SC, got to my console, and typed boot, as one does.
{1} ok boot
FATAL: system is not bootable, boot command is disabled
Which is about as helpful as someone telling me the box is currently a brick. Which I know already.
Anyway, in case you happen to hit this, the workaround is to set auto-boot? to false, reset the box, then set auto-boot? back to true and boot, as shown below.
{1} ok setenv auto-boot? false
auto-boot? = false
{1} ok reset-all
SC Alert: Host System has Reset
Sun Fire V210, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.11.4, 4096 MB memory installed, Serial #xxxxxxxx.
Ethernet address 0:3:ba:xx:xx:xx, Host ID: 83xxxxxx.
{1} ok setenv auto-boot? true
auto-boot? = true
{1} ok boot
.......... lots of output ........
volume management starting.
The system is ready.
xxxxxx console login:
(2005-04-19 20:31:07.0)
And Nicky V. is onboard....
One of the other folks in my group, Nicky Veitch, has started his blog, joining Sean, Gleb and me. Of immediate interest are his notes on filebench, a benchmark that will soon be released on SourceForge (Richard leads the design and development of filebench).
I kinda like the current parking spot for his bike as well ;).
(2005-04-19 19:55:57.0)

Tuesday April 19, 2005
libnjb on Solaris
My MP3 player is a Creative Zen Xtra (I just don't drink enough Martini to own an iProduct [1]). Anyway, I would obviously prefer to use Solaris rather than Windows with this little gadget, so I compiled up gnomad2, which requires libnjb.
One of the issues when compiling was a problem I've encountered fairly often recently: types that are defined on Linux but not on Solaris. The error in this case looks like this:
[fintanr@tiresias libnjb-2.0] $ make
cd src && make prefix=/usr/local
/usr/bin/bash ..//libtool --mode=compile gcc -I/usr/local/include -DHAVE_GETOPT_H
-DHAVE_LIBGEN_H -DHAVE_USLEEP -Wall -Wmissing-prototypes -c base.c
gcc -I/usr/local/include -DHAVE_GETOPT_H -DHAVE_LIBGEN_H -DHAVE_USLEEP -Wall
-Wmissing-prototypes -c base.c -fPIC -DPIC -o .libs/base.o
In file included from base.c:9:
libnjb.h:163: error: syntax error before "u_int8_t"
libnjb.h:163: warning: no semicolon at end of struct or union
libnjb.h:164: warning: type defaults to `int' in declaration of `usb_interface'
libnjb.h:164: warning: data definition has no type or storage class
When we take a look at libnjb.h, we see that the types are assumed to be the BSD-style u_int8_t family that Linux provides, whereas Solaris provides the standard uint8_t names.
The fix for this is simple: just add the following to libnjb.h
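#ifdef __sun
/* Solaris has the standard fixed-width types (uint8_t etc.) but not the
 * BSD-style u_int*_t names used in libnjb.h, so map one onto the other. */
#include <inttypes.h>
#define u_int8_t uint8_t
#define u_int16_t uint16_t
#define u_int32_t uint32_t
#define u_int64_t uint64_t
#endif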
and away you go. A patch has already been submitted to the libnjb folks on SourceForge (the patch is actually against libnjb.h.in). If you're really interested, you can view it here.
[1] Actually I like the fact that I can replace the battery myself in the Zen.
(2005-04-19 03:35:53.0)

Monday April 18, 2005
perlgcc
One of the more common complaints you hear from people about perl on Solaris is that you need the Sun C compiler to compile modules. This was addressed quite some time ago, but as I generally have cc available I never really noticed the issue. Or I didn't until about ten minutes ago.
I need a local copy of Expect.pm for some stuff I'm working on, so I went off to compile up IO::Tty, which the Expect.pm module requires. When I typed make, I got the following.
/usr/bin/perl /usr/perl5/5.8.4/lib/ExtUtils/xsubpp -typemap
/usr/perl5/5.8.4/lib/ExtUtils/typemap Tty.xs > Tty.xsc && mv Tty.xsc Tty.c
cc -c -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_TS_ERRNO -xO3 -xspace -xildoff
-DVERSION=\"1.02\" -DXS_VERSION=\"1.02\" -KPIC
"-I/usr/perl5/5.8.4/lib/i86pc-solaris-64int/CORE" -DHAVE_DEV_PTMX -DHAVE_GRANTPT
-DHAVE_PTSNAME -DHAVE_SIGACTION -DHAVE_STRLCPY -DHAVE_SYS_STROPTS_H -DHAVE_TERMIOS_H
-DHAVE_TERMIO_H -DHAVE_TTYNAME -DHAVE_UNLOCKPT Tty.c
cc: unrecognized option `-KPIC'
cc: language ildoff not recognized
cc: Tty.c: linker input file unused because linking not done
Running Mkbootstrap for IO::Tty ()
chmod 644 Tty.bs
rm -f blib/arch/auto/IO/Tty/Tty.so
LD_RUN_PATH="" cc -G Tty.o -o blib/arch/auto/IO/Tty/Tty.so
cc: Tty.o: No such file or directory
cc: no input files
*** Error code 1
make: Fatal error: Command failed for target `blib/arch/auto/IO/Tty/Tty.so'
Basically, what has happened here is that the bundled perl was built with the Sun compilers, so the generated Makefile passes Sun compiler options such as -KPIC and -xildoff, and the gcc from /usr/sfw/bin that I'm compiling with doesn't understand them (take a look at /usr/perl5/5.8.4/lib/i86pc-solaris-64int/Config.pm for all the gory details).
Anyway, the workaround is to use a handy little script bundled with Solaris called perlgcc, so rather than your normal perl Makefile.PL; make; make test; make install, you do
/usr/perl5/5.8.4/bin/perlgcc Makefile.PL; make
and away you go. As an aside, -KPIC is the Sun compiler option for generating position-independent code (the gcc equivalent is -fPIC), and is covered in a lot more detail in the Linker and Libraries Guide.
(2005-04-17 22:51:03.0)

Thursday April 07, 2005
Richard McDougall and Jim Mauro on bsc
I noticed that Richard McDougall has started a blog, joining his partner in crime Jim Mauro. These guys authored Solaris Internals, a book that everyone in our group uses on an almost daily basis. They are currently updating it for Solaris 10, and it will be an invaluable resource once it comes out. Point your aggregators at their blogs; highly recommended.
In the meantime they have some teasers in the form of two presentations posted on the Solaris Internals site, well worth a read.
(2005-04-07 03:45:37.0)
Writing ISO CDs in Solaris
A friend of mine, whom I got using Solaris 10 a few months ago, mailed me today asking how to burn CDs on Solaris. He is generally a Mac user at home and uses Windows as his development platform at work (for J2EE apps), and he only deploys apps on Unix rather than developing them there, so he is used to GUI tools for tasks such as writing CDs.
However, he has moved his home x86 box to Solaris 10, as he is using the Sun Java Enterprise System Application Server as his primary deployment platform these days, with NetBeans as his development environment, and he wanted to familiarise himself with Solaris 10 as the underlying OS. So a thirty-second tutorial on writing CDs in Solaris was needed, and for those not familiar with how to do this I figured I would post a quick example here.
The two commands you need to read up on are cdrw(1) and mkisofs(8) [1], with a handy confirmation step involving lofiadm(1M) if you are so inclined (lofiadm requires root privileges).
OK, first create a directory containing all the files you want to burn to CD, let's say /tmp/foo, and go
from there.
[fintanr@dhcp-ack03-200-118 tmp] $ mkisofs /tmp/foo > /tmp/foo.iso
Total translation table size: 0
Total rockridge attributes bytes: 0
Total directory bytes: 0
Path table size(bytes): 10
Max brk space used 8000
221 extents written (0 MB)
Optional step: confirm that it's all OK by mounting the ISO image (as root).
Sun Microsystems Inc. SunOS 5.10.1 snv_09 October 2007
# lofiadm -a /tmp/foo.iso
/dev/lofi/1
# mount -F hsfs -o ro /dev/lofi/1 /mnt
# cd /mnt
# ls
foo.zip
#
Everything looks fine, so let's burn the image to CD.
[fintanr@dhcp-ack03-200-118 tmp] $ cdrw -i /tmp/foo.iso
Looking for CD devices...
Initializing device...done.
Writing track 1...done.
Finalizing (Can take several minutes)...done.
And voilà, all done. I believe there is a tool in GNOME for doing this as well, but I haven't checked.
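If you burn CDs often, the same steps can be wrapped in a small script. This is a hypothetical sketch, not a Solaris tool. It runs the commands from the session above, except that it uses mkisofs's `-o` option instead of shell redirection. It assumes the verification step runs as root, that /mnt is free, and that `lofiadm -a` prints the new device name (as it did above). It also unmounts the image and detaches it with `lofiadm -d`, which the session above doesn't show.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_cmd(cmd: Sequence[str]) -> str:
    """Run a command, raise on failure, and return its stdout."""
    return subprocess.run(list(cmd), check=True, capture_output=True,
                          text=True).stdout


def burn_iso(src_dir: str, iso_path: str, mount_point: str = "/mnt",
             verify: bool = True, run: Runner = run_cmd) -> None:
    # 1. Build the image; -o names the output file instead of using stdout.
    run(["mkisofs", "-o", iso_path, src_dir])
    if verify:
        # 2. Optional check (needs root): attach the image as a lofi block
        #    device, mount it read-only as HSFS, list it, then clean up.
        dev = run(["lofiadm", "-a", iso_path]).strip()  # prints e.g. /dev/lofi/1
        run(["mount", "-F", "hsfs", "-o", "ro", dev, mount_point])
        try:
            print(run(["ls", mount_point]), end="")
        finally:
            run(["umount", mount_point])
            run(["lofiadm", "-d", dev])
    # 3. Write the image to the CD.
    run(["cdrw", "-i", iso_path])
```

The `run` parameter makes it easy to pass in a function that only prints or records the commands, so you can check the sequence before touching a real drive.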
[1] Not sure how mkisofs ended up in section 8 of the manpages, but I'll try to find out.
(2005-04-06 20:41:55.0)

Tuesday April 05, 2005
OpenSolaris CAB
The OpenSolaris Community Advisory Board (CAB) has been announced; check out the CAB page on opensolaris.org.
Personally (but I would say this, wouldn't I?) I think this is a fantastic line-up. Roy Fielding is an excellent choice as CAB chair (let's be fair here, Apache is probably the most high-profile piece of open source on the planet, and it definitely affects more people directly than any other piece of open source software). From a Sun perspective, Casper Dik is incredibly prolific on the various Solaris newsgroups and mailing lists (he is incredibly prolific internally as well; I don't know how he manages to get so much done), while Simon Phipps is a massive open source advocate (I had the opportunity to attend a presentation Simon gave in Dublin a few years ago, and he is excellent to listen to).
The community-elected members are Rich Teer and Al Hooper. Rich's Solaris Systems Programming book is fantastic, and seeing as Al is originally from Dublin he can only be a good bloke ;). Pints are on me the next time you're back.
Relevant blogs
Alan Hargreaves has links to several media outlets and other blog postings over on his blog.
(2005-04-04 18:43:23.0)

Friday April 01, 2005
Enabling Sun's Performance Lifestyle
The recent article about Linux 2.6 being slower than 2.4, and Linus Torvalds calling for ongoing performance testing, gives
our group a timely reason to explain in a lot more detail what we do. One of my colleagues, Sean
McGrath, posted a little teaser yesterday, so I'll
add to that today before we start on a more detailed, and technical, set of posts.
So what exactly is it that we do? Our group provides the infrastructure that helps the wider Sun community enable
"Sun's Performance Lifestyle". In a totally automated manner, we run a very large set of benchmarks on every build of
every active train of Solaris, using components of Sun's middleware stack (Java Enterprise System) or ISV apps (Oracle,
Tibco, Reuters etc.) where applicable. We also benchmark applications bundled in Solaris (e.g. Samba, Apache, Xorg), new
Java builds, JES on Linux and more. And we provide the same facilities to developers for work prior to integration, so that
they can make informed decisions about performance the whole way through the development cycle, rather than as an afterthought.
The matrix we are currently running looks something like this:
| Build type | OS | Architectures |
| Solaris Released Internal Builds | Solaris Express; Solaris 10 Update Trains (currently s10u1); Solaris 9 Update Trains; Solaris Patch Trains | SPARC, amd64, x86 |
| Java Enterprise System Released Internal Builds | Solaris Express; Solaris 10 Update Train; Solaris 10 FCS; Solaris 9 Update Trains | SPARC, amd64, x86 |
| Solaris Development Builds | Solaris Express | SPARC, amd64, x86 |
| Java | Solaris Express; Solaris 10 Update Train; Solaris 10 FCS; Solaris 9 Update Trains; Windows; Linux (Red Hat, SUSE) | SPARC, amd64, x86 |
| Userland Products (JES etc.) Development Builds | Solaris Express; Requested Solaris Builds | SPARC, amd64, x86 |
| JES for Linux | Linux (SUSE, Red Hat) | amd64, x86 |
And coming soon - OpenSolaris (we are hyped about this).
As for numbers, last month we ran over 50,000 benchmarks. Some are tiny, taking only seconds to run, and some take
several days; it all depends on what you're benchmarking. Of course, as mentioned, this is all automated.
A Brief Mention of Our Lab
Our lab consists of about 800 machines, ranging from single-CPU SPARC, amd64 and Intel boxes
all the way up to 72-CPU E25Ks, and covering just about everything in between.
I use "lab" as a virtual concept here: we have a large lab in Ireland, and then machines in Boston,
Austin, Menlo Park etc. Alongside this we share time on machines with other groups dispersed all over the
world. Let's just say a fully loaded 25K costs a lot, and while we could use one continuously, it makes more
sense to share it. And of course some of the boxes currently under development are only available in very
small numbers, so we have no option other than to share.
As an aside, my personal favourite rig at the moment is a fully loaded E6800 with several terabytes of disk
(6120 Fibre Channel arrays) attached, which we use predominantly for I/O benchmarks of the kind enterprise
customers run.
A second aside: any version of Solaris that we install on this machine can also be installed on a
single-CPU Sun Blade 2500, with no recompiles, special patches or kernel hacks needed; it just works.
Upstream Performance Work
PerfPIT
We provide a performance pre-integration test environment (PerfPIT), which every major project
going into Solaris has to go through. Most groups use it at multiple stages during their project. To put
this in perspective, every major project that you hear about in Solaris (and a huge number of ones that people
are starting to write about) has to come through PerfPIT. So DTrace, Zones, Least Privilege, FireEngine
and others all did one or more PerfPIT runs before integrating into Solaris. And what does PerfPIT
involve, you ask? Basically, we run exactly the same set of benchmarks as in our more downstream testing, on
two kernels: one with the changes, and nothing but the changes, and one without.
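The heart of a PerfPIT comparison is that both kernels run the identical benchmark set, so every difference can be attributed to the change. Here is a minimal sketch of the report step. The data layout (benchmark name mapped to repeated scores) and the higher-is-better convention are assumptions, and the function name is hypothetical. The post doesn't describe how PerfPIT reports results.

```python
from statistics import mean
from typing import Dict, List


def perfpit_compare(base: Dict[str, List[float]],
                    changed: Dict[str, List[float]]) -> Dict[str, float]:
    """Percent change of each benchmark's mean score on the changed kernel
    relative to the base kernel (positive = faster, for higher-is-better metrics)."""
    if base.keys() != changed.keys():
        # The comparison is only meaningful if both kernels ran exactly the
        # same set of benchmarks.
        raise ValueError("both kernels must run the identical benchmark set")
    report = {}
    for name in sorted(base):
        b, c = mean(base[name]), mean(changed[name])
        report[name] = (c - b) / b * 100.0
    return report
```

Each entry in the report could then go through the same noise and threshold checks used for regular build-to-build results.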
Performance Self Test
Further upstream we provide another version of PerfPIT, called Performance Self Test, which is a mechanism for
the development community to test performance changes more informally. The key here is that it is simple to use,
and we provide a standard environment. Developers go to a web page, point it at their kernels, select the benchmarks
they want to run and hit submit. Everything else happens automatically.
The best example of this in use that I have seen was last year, when one developer in Sun was evaluating several
new algorithms for a specific project. Rather than going through the PerfPIT process for each version, he
just submitted multiple self test requests and chose the best solution, without ever having to set up a benchmark,
e-mail or phone anyone in the group, or do anything that distracted him from the task at hand. Amusingly, it wasn't
until he had done his putback into Solaris that we realised what he had been doing. That's automation for you, though.
Why do this?
Simply put, you are not allowed to put your code back into the Solaris code base if it's going to slow it down. One
of our main goals is to protect performance in Solaris, so when performance improves we move our baselines to
the new high-water mark, and no subsequent build is allowed to regress from that new baseline. There
are very, very occasional exceptions to this, i.e. when a fix is required to prevent crashes or data corruption
and cannot be implemented without causing a performance regression. In the whole three years of Solaris 10
development this occurred once, on one metric, and there were thousands upon thousands of putbacks.
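The "high-water mark" rule is essentially a ratchet: the baseline only ever moves up, and every later build is judged against the best result seen so far. Here is a minimal sketch for one higher-is-better metric. The class name and the optional `tolerance_pct` noise allowance are hypothetical and not from the post. The rare crash or data-corruption exception is a human decision and is not modelled.

```python
from typing import List, Tuple


class Baseline:
    """High-water-mark baseline for one higher-is-better metric."""

    def __init__(self, initial: float, tolerance_pct: float = 0.0) -> None:
        self.value = initial
        self.tolerance_pct = tolerance_pct  # hypothetical noise allowance
        self.history: List[Tuple[str, float]] = []  # builds that raised the bar

    def check(self, build: str, result: float) -> bool:
        """Return False if build regresses from the current high-water mark;
        on an improvement, move the baseline up to the new result."""
        floor = self.value * (1 - self.tolerance_pct / 100.0)
        if result < floor:
            return False
        if result > self.value:
            self.value = result
            self.history.append((build, result))
        return True
```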
It's due to the immense amount of work that our development colleagues have done, and are doing, on Solaris, and to our
own work in providing practical support for them, that Solaris 10 screams. And
it's getting faster every day. Our aim is to enable the best, not to catch problems after they have occurred.
The Benchmarks
The benchmarks themselves range from things such as SPECweb and Kenbus, to home-baked benchmarks for
measuring things such as boot time, and finally, and most importantly, to real customer workloads (which we are always
looking for: if you have an enterprise workload, let us know, as we are always interested in getting these in house).
Bug hunting and analysis
Obviously all of this work throws up some bugs, and we tend to be very methodical and exacting in our analysis
of them, our aim being to narrow each one down to the exact lines of code that caused the problem rather
than just saying "there's a problem here". There's quite a bit to this, so I'll leave further discussion for
a separate post.
What we don't do
We don't provide benchmark numbers for release. We work on out-of-the-box Solaris, with as little tuning as
possible, to create the most realistic customer environment. Our colleagues in Market Development Engineering and Strategic Applications Engineering are
focussed on getting the numbers that you see in press releases, so you won't see us mentioning numbers here.
What we are hyped about
OpenSolaris - we are so hyped about this its beyond belief. The beta community is very active already, and as
we get closer to the code being released completely into the wild we are getting ready to work with the
OpenSolaris community - and we can't wait.
About the group
I guess I should say a little bit about our team. The group is pretty small: ten full-time engineers
(Sean, Gleb and I are the current bloggers), two interns at any given time and one manager (or ex-engineer,
as we prefer to call him ;) ). We are based in Ireland, although somewhat more geographically spread out than
just the Dublin office, and most of us have been with Sun for five-plus years.
Finally, keep an eye on our blogs, over the next few weeks we will go into a lot more detail about what we
do, and how we do it.
(2005-04-01 01:43:10.0)