Damien Farnham's Weblog

Tuesday May 05, 2009

Testing OpenSolaris made easy in a heterogeneous world using Virtual Box

Testing OpenSolaris in a heterogeneous world using Virtual Box

Solaris and OpenSolaris have very good reputations as stable, well tested platforms that are also full of innovation like DTrace, the power aware dispatcher, ZFS, Crossbow etc. In this environment test coverage is a moving target. New features, new uses and new platforms all make it necessary for the teams involved in testing to adapt and innovate to cope with the ever increasing workload. Running to stand still.

The PerfQE team provides Performance QE coverage for most of Sun's software and hardware assets, producing 40,000+ performance metrics a month, with automated regression isolation to the putback. ( To put it another way: when we log a bug against a Solaris biweekly build, which could contain hundreds of separate changes/putbacks, we have automation that will automatically identify the engineer whose putback caused the regression and reassign the bug to him/her. )

We have 1400+ systems across the globe, all run at 100% 24*7, and no dedicated lab staff in Dublin, where most of our systems are located, so you can get the idea that we don't have the luxury of putting up with misbehaving tests that require us to kick start them. One pain point for us has been a configuration of 60 desktop Windows PCs placing stress on a Solaris server via in kernel CIFS and Samba. Between test runs we reboot the entire configuration, but in 1 in 8 to 10 reboots one of those Wintel PCs would hang, requiring a manual reboot. In the past we've added IP power switches to reboot offending systems hard after timeouts. But frankly they cost money and I have enough cables.

So we just finished replacing the 60 Windows 2000 PCs with a v40z ( Quad Core Opteron ) running OpenSolaris and 60 VirtualBox Windows instances. We've gone through a detailed review to ensure we are producing the same load ( actually it is a higher load ) on the Solaris CIFS server, and we're seeing the same load pattern on the system under test but no hangs so far.
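For anyone trying something similar, the reboot between test runs can be scripted on the host. The sketch below is not our actual harness: the "winclient" name prefix and the DRY_RUN switch are made up for illustration, and it assumes a VirtualBox release whose `VBoxManage list vms` prints one `"name" {uuid}` pair per line. It hard-resets every guest whose name starts with the prefix.

```shell
#!/bin/sh
# Sketch only: VM_PREFIX and the "winclient" naming scheme are assumptions.
VM_PREFIX="${VM_PREFIX:-winclient}"

# Extract VM names from `VBoxManage list vms` style output on stdin,
# assumed to look like:  "name" {uuid}
vm_names() {
    sed -n 's/^"\(.*\)" {.*}$/\1/p'
}

# Read VM names on stdin and hard-reset those matching the prefix.
# With DRY_RUN=1 the commands are printed instead of executed.
reset_vms() {
    while read -r vm; do
        case "$vm" in
            "$VM_PREFIX"*)
                if [ "${DRY_RUN:-0}" = "1" ]; then
                    echo "VBoxManage controlvm $vm reset"
                else
                    VBoxManage controlvm "$vm" reset </dev/null
                fi
                ;;
        esac
    done
}

# Typical use on the host:
#   VBoxManage list vms | vm_names | reset_vms
```

Run it once with DRY_RUN=1 to check which guests it would touch before letting it loose on all 60.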

So what have we gained from this ? What are the advantages ?

  1. Space savings of over 95% ( they were desktop PCs connected to a KVM )

  2. Power savings of 80%

  3. Capital savings on hardware: the difference between 60 desktops and one server is pretty large. ( I will not put a % on it as it varies too widely )

  4. Test hangs reduced by 100% ( making the team happier ), and getting more from our capital.

  5. We'll now be testing more versions of Windows as the overhead in managing the virtual images is so low.

  6. We can use DTrace to profile the load Windows sends to our server more easily.

  7. The v40z is easier to manage remotely and hardware problems are handled by FMA, making life easier.

There is nothing here to stop any test/QA/QE group implementing something similar, and with savings as significant as the ones we are seeing it really is worth the time.





Tuesday Sep 16, 2008

X4500 controller numbers are renamed.

Keeping track of those annoying controller number changes.

One “feature” of Solaris which personally drives me round the twist is the way controller numbers can get changed, i.e. /dev/dsk/c4t5d0s0 can get changed into /dev/dsk/c7t5d0s0 when an additional HBA ( host bus adapter ) card is added to your system. Yes, I know why this happens, and I know it has to, but it is still a pain in the *&^*(&.

And so I like to “brand” my disks with the name that they started out life with, so I know which disks have changed. I've been doing it for some time, and please, no hassle about the quality of the scripting, I'm a manager now :)

So why bother posting this blast from the past ? Well, the reason is that the fixes for the following two bugs mean the default numbers of the controllers in your x4500 will change when you upgrade to the latest ILOM.

6727449 NPI: Require SWIE support for S10 U5 Thumper platform
6725713 ILOM 2.0.2.5/2.0.3.1 and later: virtual cdrom and floppy are enabled when used

#!/bin/ksh
# Script to bind link names to disks
# and reset them if the link names change.
# Damien Farnham DBE
# Tue Feb 13 13:02:14 GMT 1996

# List every disk format can see and keep only the
# geometry lines ( the ones containing "cyl" ).
disklist()
{
    format > /tmp/format.$$ 2>/dev/null - <<!
0
quit
!
    grep cyl /tmp/format.$$ >/tmp/disklist
}

# Write each disk's current name ( e.g. c4t5d0 ) into
# its label as the volume name.
set_volname()
{
    while read line; do
        DISK_NAME=`echo $line | awk '{print $2}'`
        format -d $DISK_NAME > /dev/null 2>/dev/null - <<!
volname
"$DISK_NAME"
y
quit
!
    done < /tmp/disklist
}

# Only -s is implemented in this copy of the script.
USAGE="Usage: set_links -r or -s "

case $1 in
-s)
    disklist
    set_volname
    ;;
*)
    echo "Usage: set_links -s saves link names on disk label"
    exit
    ;;
esac
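
The script as posted only saves the names. Once the disks are branded, spotting the ones that moved is a matter of comparing the saved volume name with the current device name. The function below is a sketch of that check, not part of the original script: check_links is a name made up here, and it assumes the disk lines from format end with the volume name after the closing ">" when one has been set.

```shell
#!/bin/sh
# Sketch only. Reads format-style disk lines on stdin, for example
#   1. c7t6d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 63>  c4t6d0
# and prints "saved -> current" for every disk whose name has changed.
# Assumption: the volume name, when set, is the last field on the line.
check_links() {
    awk '/cyl/ {
        current = $2                 # name the disk has now
        saved = $NF                  # name branded into the label
        if (saved ~ />$/) next       # no volume name on this disk
        if (saved != current) print saved " -> " current
    }'
}

# Typical use, after running disklist from the script above:
#   check_links < /tmp/disklist
```

On an x4500 that has just had its ILOM upgraded, every line printed is a disk whose controller number moved.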

Thursday Sep 11, 2008

Best Practice BIOS patching on Sun Intel and AMD x86 systems

Best Practice: maintaining the BIOS on Sun Intel and AMD x86 systems

( a follow on from the SPARC firmware blog )

History

For many Solaris system administrators a BIOS didn't exist until recently, because the vast majority were on SPARC systems and they had the OBP ( I'll not bore everyone with the long version of how simply awesome this was 18 years ago when I first saw the OK prompt and typed boot net ).

But today Solaris is multi platform. Solaris x86 is not a poor relation to Solaris on SPARC, they are feature for feature equals. Xeon and AMD boxes grew from single socket, single thread, single core babies with pretty simple firmware / BIOS ( aka Basic Input/Output System ). Today's x86 platforms are far from simple and have grown into pretty powerful beasts: take the x4600 with 8 sockets of Quad Core Opteron, or the Quad Core Xeon x4150. The BIOS to manage a 32 core x4600 is understandably a more complex beast than that of your IBM PC of yesteryear, and so having the right BIOS and the right BIOS settings for your platform is critical to getting the best from your x86 box.

What does this mean to a Solaris Admin on Sun Xeon & Opteron platforms ?

It simply means that as part of your Solaris Patch Policy you should always include updating your BIOS as well as the Solaris Patches. There are a number of pretty compelling reasons why.

  1. BIOS releases contain the latest microcode patches from Intel and AMD. Microcode is in effect a set of instructions loaded into a CPU to work around hardware bugs.

  2. Sun updates the configuration of a BIOS to optimize it for the system and provide optimum performance. I recently tested two Quad core Xeon boxes from two vendors, and while they had the same CPUs and memory there was a 40% difference in performance on the SPECjbb2005 benchmark due to one having sub optimal settings.

  3. QA teams across Solaris and the Systems group testing up-coming releases of Solaris Updates, Nevada and OpenSolaris use the latest released BIOS for testing. Aligning with this aligns your own software stack with the most tested and trusted Sun stack. BIOS problems can be very hard to diagnose and so limiting your exposure to them is a good idea ( read as lazy but smart ).

  4. It is easier to stay current. Upgrading from minor release to minor release is really safe and painless, while going from a very old release may require you to do a number of intermediate upgrades, and of course this will happen when you least need additional work. And remember, with all Sun servers you can upgrade from the SP.

  5. Your new Sun box may not come with the latest BIOS installed, an issue we are addressing ( please bear with us ), so even new systems can benefit from a check to ensure you are current.

How do I find out what my firmware release is on my X4150 ?

On your system's Service Processor

ssh oaf413-sc
root@oaf413-sc's password:
Sun Microsystems Embedded Lights Out Manager
Copyright 2006 Sun Microsystems, Inc. All rights reserved.
Firmware Version: 4.0.10
SMASH Version: v1.1
Hostname: SUNSP001B2493C5CC
IP address: 192.1.2.81
MAC address: 00:1B:24:93:C5:CC

-> show SP

/SP
    Targets:
        users
        network
        clock
        AgentInfo
        TftpUpdate
        CPLDUpdate

    Properties:
        Firmwareversion = 4.0.10
        Timeout = 300
        CPLDVersion = 063

    Target Commands:
        show
        cd
        set
        reset



Or, for the old guard, you can look at the system boot ;)

Details on how to log into your SC are included on docs.sun.com and in the documentation supplied with each system.
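
If you look after more than a handful of boxes you will want to check this without reading screens by eye. The sketch below assumes the `show SP` output has already been saved to a file, one per system; how you capture it is up to you. The function names and the 4.0.12 minimum in the usage comment are made up for illustration. It pulls out the Firmwareversion property and compares dotted versions number by number.

```shell
#!/bin/sh
# Sketch only: function names and the example minimum version are assumptions.

# Print the value of the Firmwareversion property from `show SP`
# output read on stdin.
sp_firmware_version() {
    awk '$1 == "Firmwareversion" && $2 == "=" { print $3; exit }'
}

# Succeed ( exit 0 ) when dotted version $1 is older than $2.
# Components are compared numerically, so 4.0.9 is older than 4.0.10.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1      # equal versions are not "older"
    }'
}

# Typical use, with the saved output in oaf413-sc.txt:
#   v=$(sp_firmware_version < oaf413-sc.txt)
#   version_lt "$v" 4.0.12 && echo "oaf413-sc needs updating ( has $v )"
```

The numeric comparison matters: a plain string compare would put 4.0.10 before 4.0.9.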

How do I find out where the latest versions of the Sun System BIOS are ?

I find the fastest way is to use Sun System Handbook ( All seems familiar and common sense ? Good )

Select Servers in the first drop down box.

Then select “x4150” in the 2nd.

And this pretty page jumps up

http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/SunFireX4150/SunFireX4150

http://www.sun.com/servers/x64/x4150/downloads.jsp

And follow the instructions to download.

Wednesday Feb 14, 2007

Venting :)

It is amazing how little things can make you really frustrated when you'll work thru the big issues without too much trouble. Right now I'm being driven round the twist by callers looking for someone in our internal support organization. This person has the same 5 digit number as me, but to reach them from some parts of the world you need to add 70. One morning I received 10 calls. One gent from Germany rang and launched into his problem. I told him politely that I thought he had the wrong number, he told me no, he was sure he had not, and continued ! I then told him that he had, and that he needed to put 70 in front. No "sorry for wasting your time".

So, like many of the poor souls in Dublin that suffer the same problem, I added a new message to my voice mail explaining that I'm not in support and that if you are calling about a ticket you need to redial with 70 in front. That way, I hoped, when I get in each morning I would not have to go thru 5 or 6 often long, often rude messages. Fixed ! Of course not. I still get 1 or 2 each morning from &^*&^* ( this morning's best was from a lady in Sweden, angry I hadn't contacted her about her call, after sitting thru my message telling her it's not me ).

It has also shown me that there is no country Sun does business in that is more polite or ruder than the rest. People ringing help desks are generally on a short fuse and act the same no matter where they are from, and it's not a good side of human nature ;) This is why I'd really like Sun to make more use of meeting.central and our REALLY COOL name finder phone book, because if you look someone up and ring them it knows when to add 70.

1) Globally people are rude when they ring help desks
2) People do not listen to messages on voice mail
3) People do not care if they give out to the wrong person as long as they find some poor sod.
4) I do not want the job of the guy who shares my number :)

Tuesday Sep 27, 2005

OpenSolaris Live on My New laptop

Finally got my Ferrari 4k installed with Nevada 23. ( Been using a 3200 for a long time ) It rocks, rocks I tell you. Fast, quiet, and with the frkit it's got everything you'll ever need from an OS.

Wednesday Aug 03, 2005

Sun Again.

I was speaking to an engineer in my team yesterday while getting a tea.
We were discussing road maps ( which I cannot go into here ) for really
cool new hardware coming out of Sun both SPARC and AMD over
the next little while.

I have been around here a while and was a customer long before that, and
expressed my view that these system designs were very "Sun". He asked
what I meant.

Well, the boxes are simple, pack in a HUGE amount and have a high
build quality ( even for the prototype units we have ). Just to show him I
put one of our 2u rack mount systems next to a new IBM Xeon 2u system
( yes, we test Solaris x86, Java and of course JES on NON Sun hardware, really )
and the difference was amazing. The IBM has got so many additional components
which make it look like a KIT built from spare bits, and I'm just talking about the
packaging.

I'd love to post pictures but you'll have to wait a while longer to try one
for yourself. Even with the system packaging we're back to where we
started, putting standard bits together better than anyone ;) If you run a datacenter with 100's of rack mount units you'll LOVE these.

P.S. The IBM runs Solaris x86 just great. J2SE runs fine on it with XP
( yes we test Java on XP ), and RH & SuSE too.

Wednesday Jun 29, 2005

Sun and U2

I have worked in Sun for a long time now ( 12+ years ) and thought
I had seen it all !  But never did I expect to see a PRESS RELEASE with
Sun Microsystems and Bono ( yes, of U2 fame ) working together !

I was lucky enough to get a ticket to see the Vertigo Tour
homecoming opening night in Dublin's Croke Park.  They sold out 3 nights
and could sell 10 more if the venue was available.

The concert was AWESOME. Great music, a super show, and
the band clearly enjoyed playing to the home crowd. The high tech
light show was incredible.

At the end of the concert Bono asked the crowd to text the word 'AFRICA' to 53131,
and that is where Sun comes in. We provided the back end infrastructure
( and I guess Java for all those phones too )
http://www.electricnews.net/news.html?code=9615617

Maybe marketing could get a Sun Logo somewhere in the venues for the rest of the shows.
I had to update this as it now appears that Marketing were listening :) Check out www.sun.ie and you'll see Bono in all his glory.
Croke Park is an 80,000 seater stadium close to Sun's office and the home
of Gaelic Football and Hurling. Check these out if you get the chance
when you visit Ireland, they are uniquely Irish and great to watch. Check out http://gaa.ie.


Thursday Jun 02, 2005

Switch Performance.

A couple of my team mates ( Fintan and Sean ) have posts that deal with
Linus deciding that Performance testing is a good idea and it should be done
for Linux. ( I'm sure reading the article again he'll see it as a Homer Simpson
moment, D'oh ! maybe we should test it !)

Sounds Silly ?

It seems that Linus may be ahead of some folks.  We do a lot, in fact, a hell of
a lot of network performance testing. Last week we blew a low end 100/1000Mb switch.
We replaced it with a new one, same make, model etc., yet there is a 10% difference
between it and the original on the standard SPECweb99 benchmark. Ouch.

The same hardware now gets 10% less. Maybe switch vendors could start testing
their firmware too :)

Tuesday May 24, 2005

I'll never live this down


I have been given the honor of being part of Sun's 2nd "Change Agents" poster campaign. While I am grateful for this honor ( I'll get you Darrin if it takes years ) my choice as a "poster boy" is rather strange as I'm far from Brad Pitt. I reckon that all the good looking folks were used first time round.

My team mates have, showing their usual level of respect, honored me further by creating their own version of the poster including a rather nice picture of me in China. ( Brutus I'll get you too. )

My Intro

Day two with a blog, so I should really introduce myself.

I manage the Performance QA team in Sun. We're in sunny Dublin ( sunny for the
next 10 minutes anyway ). 4 other members of the team have blogs and I've finally
given in.  Fintan Ryan has a couple of excellent blogs that describe what we do here
http://blogs.sun.com/roller/page/fintanr/20050426

Basically we champion the "Performance Lifestyle".  Sounds like marketing ?
It's not, it is something that grew from the ground up,  first setting performance
criteria for out of the box performance on each release, not just focusing on TPC-C
benchmark specials,  then implementing a continuous improvement model until it
becomes natural behavior for developers.

Our role is one of a catalyst. We make it so easy to test performance
properly, effectively and cheaply that engineers do. From Sun's view we can afford
to because instead of 20 imperfect SPECweb2005 rigs that are 20% used we have
10 that are 90% used, a better return on capital.  Development teams save time and energy
because the developers write code and do not waste time learning how to configure
benchmarks, find and configure the hardware etc.
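
The return on capital point is simple arithmetic: the number of rigs times their utilisation gives the rigs' worth of useful work you actually get. A quick sketch using the figures above ( busy_rigs is just a name picked for this example ):

```shell
#!/bin/sh
# Rigs' worth of useful work = number of rigs * utilisation ( percent ) / 100.
busy_rigs() {
    awk -v rigs="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", rigs * pct / 100 }'
}

busy_rigs 20 20   # 20 rigs that are 20% used
busy_rigs 10 90   # 10 rigs that are 90% used
```

That prints 4.0 and 9.0, so half the hardware does more than twice the work.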

The good engineers ( the majority thankfully ) use us

    *     To tune their code
    *     To select the algorithm
    *     To check 2nd order effects on platforms they do not have access to.
    *     To run against a wide range of workloads, one size does not fit all
             "Performance is in the eye of the customer"

The not so good ( mostly those that need training and a few, well..... )

    *      Make sure they do not eat into gains made by the good guys
    *      Ensure that poor code is noticed and fixed early
    *      Show them it's easy to do it right next time :)

The best
    *    People hand over their resources for us to manage and provide
         a service to all Sun.
    *    People outside our organizations provide us with Millions of
         dollars from their budgets in capital and head count ! putting
         their money where their mouth is. ( Darrin take a bow )
    *    Train us on the latest technology, thanks Sunay, Bryan, Brian.

Lastly we ensure that the results are visible to management, so those that
do the right thing are noticed i.e. reward the right behavior.

This is not bums on seats engineering, we're a small, focused 10 person team
which provides massive ROI to Sun and its customer base. You've heard
how much faster Solaris 10 is and hopefully even seen it for yourself, we're
proud to be part of that.

More importantly, if you have cases where Solaris 10 is slower please
drop me a line and we'll try to add your code to the expanding test matrix
of 100+ benchmarks, many of which came from customers.

Interested in developing OpenSolaris ? We'll be there to help you too :)
on http://opensolaris.org





Monday May 23, 2005

access woes


I had the displeasure of breaking Solaris 10 on my Acer Ferrari 3200
( not its fault, I tried to BFU to the latest Nevada build with a Solaris 10
BFU script ) so I could not use our IPsec based remote secure access service.

For those that, like me, choose to shoot themselves in the foot from time
to time, I suggest you use frkit to upgrade your systems. See just how
well Solaris performs on a laptop. opensolaris.org has, or will have shortly,
frkit, and it saves you from yourself.

This meant I needed to go looking for my DES card to use VPN. Of course
I looked high and low with no joy, then I saw it looking at me thru the
window of the washer !

I have learned 3 important things from this.

1) I do not like Windows, nothing religious, just Solaris 10
   is better, punchin is much better than VPN !
2) My 4 year old daughter can reach about 4 inches higher than
   I thought ( which is how the DES card ended up in the washer )
3) The DES card worked fine after it dried out ;)