Saturday September 27, 2008
Bag of links about VirtualBox configurations

Just some links I've stumbled across on my odyssey with VirtualBox:

Solaris Cluster on a laptop using VirtualBox, iSCSI and a quorum server
VirtualBox 1.6.2 configuration with jumpstarting done via JET
Building a Solaris Cluster Express cluster in a VirtualBox on OpenSolaris
VirtualBox 1.6.2 configuration to build a cluster and using iSCSI from ZFS
Host Interface Networking in Sun xVM VirtualBox
VirtualBox 2.0.0 configuration of Host Interface Networking. Actually installs OpenSolaris on Ubuntu.
VirtualBox meets JET...
Must be VirtualBox 1.6.*; uses JET and flar to manage quick setup of VMs. Hmm, he installs an OpenSolaris vbox on a Windows machine, puts JET on it, and then uses that machine to jumpstart others. Sweet article!
Configuring host networking for VirtualBox
Again a VirtualBox 1.6.* release, on another 4150 with four dual-core CPUs and 8GB RAM. Uses /usr/lib/vna, which appears to be gone from VirtualBox 2.0.2.
Famous Quote:
Here's my script. No, I didn't use SMF. I'm old school. Bite me.

I'll add more as I collect them:

VirtualBox Buzz
How can any collection of links on VirtualBox be complete without this?

Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily
Working towards a vbox image for distribution

One of the difficulties with pushing out a release for OpenSolaris Project: NFS version 4.1 pNFS is that we could only release source and BFU archives. We could not release a live image.

To complicate matters, part of the NFS code is in the closed repository. As a result, we also had to release a special set of closed binaries (closed-bins).

The difficulty lay in two areas:

  1. We weren't allowed to take the DVD image, install our bits, and send that back out. Note, if you search for kanigix on my blog, you'll see I provide recipes for making your own customized DVD, but I don't distribute DVDs.
  2. People, even ex-Sun employees, didn't want to install a stock system and BFU the updates.

We started to get requests for VMware images. And we still weren't allowed to hand those out.

But OpenSolaris is adaptive to pressures in the community. I just asked again and was pointed towards the Hadoop project and especially this one: OpenSolaris Project: Hadoop Live CD.

My understanding is that we aren't trying to make a distribution, we aren't trying to steal thunder, instead we are trying to get systems out there to enable interoperability testing.

So now I'm working on a framework to get OpenSolaris + pNFS on a VirtualBox image.

Stay tuned as I go down the wrong path several times before emerging with a working process.

Builds are too slow...

Okay, I've got a brand new Sun Fire X4150 server, and it is geeked out with processors and memory. When I installed SUNWonbld, it said that I should use 36 for dmake concurrency. So, let's set up our .make.machines and let a build rip.

I'm going to modify usr/src/tools/env/developer.sh with the following:

[th199096@jhereg spe-build]> diff nightly.env  $SRC/tools/env/developer.sh
41c41
< NIGHTLY_OPTIONS="-aFCDlmprn";  export NIGHTLY_OPTIONS
---
> NIGHTLY_OPTIONS="-aCDlmpr";           export NIGHTLY_OPTIONS
194d193
< export CW_NO_SHADOW=1

I cut out the $STAFFER and such. The main differences are that I am not doing the gcc shadow building and I am not doing a non-DEBUG build. This should blaze, but it doesn't:

==== Nightly distributed build started:   Fri Sep 26 21:09:37 CDT 2008 ====
==== Nightly distributed build completed: Fri Sep 26 22:17:58 CDT 2008 ====

==== Total build time ====

real    1:08:21
...
/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 4

Okay, I just took the hit of moving to Studio 12, so maybe that accounts for a bit more time. And I think I have everything local, but perhaps I am hitting the network. But let's focus on dmake telling me it will be using 4 concurrent jobs. That is by no stretch 36.

[th199096@jhereg spe-build]> grep jhereg ~/.make.machines 
jhereg   max=36
jhereg.central.sun.com   max=36

I invoke the build like this:

[th199096@jhereg spe-build]> printenv | grep DMAKE
DMAKE_MODE=parallel
DMAKE_MAX_JOBS=36
[th199096@jhereg spe-build]> env -i `which nightly` nightly.env
Time spent in user mode   (CPU seconds) : 10608.23s
Time spent in kernel mode (CPU seconds) : 6272.44s
Total time                              : 1:08:21.75s
CPU utilisation (percentage)            : 411.5%

I use env -i because someone told me that it makes sure I have just the right things in my environment. How can I tell that I'm getting the right number?

I can copy `which nightly` and hack it to just report the dmake concurrency.

[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env 
Testing DMAKE, quick exit
number of concurrent jobs = 4

Okay, it's pretty clear that I am only getting 4, but why? Let's add some more debugging to the main DMAKE processing code:

hostname=`uname -n`
if [ ! -f $HOME/.make.machines ]; then
        echo "No $HOME/.make.machines found!"
        DMAKE_MAX_JOBS=4          # default when there is no per-user file
else
        echo "Grepping for $hostname in $HOME/.make.machines"
        # Take the last line mentioning this host and keep the text after "=".
        DMAKE_MAX_JOBS="`grep $hostname $HOME/.make.machines | \
            tail -1 | awk -F= '{print $ 2;}'`"
        if [ "$DMAKE_MAX_JOBS" = "" ]; then
                echo "Nothing in that file!"
                DMAKE_MAX_JOBS=4  # host not listed: fall back to 4
        fi
fi
DMAKE_MODE=parallel;
export DMAKE_MODE
export DMAKE_MAX_JOBS

And run it:

[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env
Grepping for jhereg in /.make.machines
Nothing in that file!
Testing DMAKE, quick exit
number of concurrent jobs = 4

Hey, why is it looking in /.make.machines and not in my homedir?[1]

[th199096@jhereg spe-build]> echo $HOME
/home/th199096
[th199096@jhereg spe-build]> more home.tst 
#!/bin/ksh -p
#

echo "My home is $HOME"
[th199096@jhereg spe-build]> env -i ./home.tst 
My home is 
[th199096@jhereg spe-build]> ./nightly.tst -i nightly.env 
Grepping for jhereg in /home/th199096/.make.machines
Testing DMAKE, quick exit
number of concurrent jobs = 36

Okay, env is hosing me.

[th199096@jhereg spe-build]> env -i HOME=/home/th199096 ./home.tst 
My home is /home/th199096

And crap, env spells it out for me:

OPTIONS
     The following options are supported:

     -i | -        Ignores the environment that  would  otherwise
                   be  inherited  from  the  current shell.  Res-
                   tricts the environment  for  utility  to  that
                   specified by the arguments.

So, another quick test:

[th199096@jhereg spe-build]> env ./home.tst
My home is /home/th199096

I know I was told to invoke my builds this way to speed them up (i.e., to pick up the correct paths). I also know I've been battling this $HOME issue the whole time.
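If the appeal of env -i is a predictable environment, one way to keep that while avoiding the $HOME trap is to pass HOME back in explicitly, exactly as in the `env -i HOME=/home/th199096 ./home.tst` test above. A minimal sketch; `clean_env_run` is a hypothetical helper, not part of SUNWonbld, and whether nightly wants anything else carried through is left open:

```shell
# clean_env_run CMD [ARGS...]
# Hypothetical helper: like "env -i CMD ARGS", but HOME is passed through
# so per-user files such as $HOME/.make.machines are still found.
clean_env_run() {
    env -i HOME="$HOME" "$@"
}
```

Usage would be `clean_env_run \`which nightly\` nightly.env`; everything except HOME is still scrubbed.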

I wonder how long the build will take now?

[th199096@jhereg th199096]> zfs clone pool/builds/th199096/spe-gate@fresh pool/builds/th199096/spe-build2
[th199096@jhereg th199096]> ws spe-build2

Workspace                    : /builds/th199096/spe-build2
Workspace Parent             : ssh://aus1500-home//pool/ws/th199096/spe-gate
Proto area ($ROOT)           : /builds/th199096/spe-build2/proto/root_i386
Root of source ($SRC)        : /builds/th199096/spe-build2/usr/src
Root of test source ($TSRC)  : /builds/th199096/spe-build2/usr/ontest
Current directory ($PWD)     : /builds/th199096/spe-build2

[th199096@jhereg spe-build2]> cp ../spe-build/nightly.env  .
[th199096@jhereg spe-build2]> vi nightly.env 
[th199096@jhereg spe-build2]> rm ../spe-build/nightly.tst 
[th199096@jhereg spe-build2]> `which nightly` nightly.env 

Yeah, zfs clone is sweet for rapid testing of a baseline!
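Because each experiment gets its own clone, the clone step is easy to wrap up for repeated runs. A minimal sketch, assuming a pristine snapshot such as spe-gate@fresh already exists; `fresh_clone` is a hypothetical helper, and setting `ZFS=echo` turns it into a dry run that only prints the zfs commands:

```shell
# fresh_clone SOURCE SNAPSHOT CLONE
# Hypothetical helper: (re)create CLONE from SOURCE@SNAPSHOT, destroying
# a clone of the same name left over from a previous experiment first.
fresh_clone() {
    zfs_cmd=${ZFS:-zfs}                  # ZFS=echo gives a dry run
    if $zfs_cmd list "$3" >/dev/null 2>&1; then
        $zfs_cmd destroy "$3"            # throw away the old workspace
    fi
    $zfs_cmd clone "$1@$2" "$3"
}
```

For example, `fresh_clone pool/builds/th199096/spe-gate fresh pool/builds/th199096/spe-build2` would recreate the clone used here. Note that it destroys any existing clone of that name, so anything in it is lost.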

And we get such a big savings, not!

[th199096@jhereg spe-build2]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 10624.32s
Time spent in kernel mode (CPU seconds) : 7579.56s
Total time                              : 1:04:35.29s
CPU utilisation (percentage)            : 469.7%

The concurrency was correct:

/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 36
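The CPU utilisation lines deserve a second look: they match (user + kernel CPU seconds) divided by wall-clock seconds. A small sketch to recompute that from the three figures; `cpu_util` is a name made up for this example, and it assumes the `h:mm:ss.ss` / `mm:ss.ss` formats shown above:

```shell
# cpu_util USER_SECS KERNEL_SECS WALLCLOCK
# Prints (user + kernel CPU seconds) / wall-clock seconds as a percentage.
# Accepts values like "10608.23s" and "1:08:21.75s" or "45:44.46s";
# the trailing "s" is optional.
cpu_util() {
    echo "$1 $2 $3" | awk '{
        user = $1; sys = $2
        sub(/s$/, "", user); sub(/s$/, "", sys)
        # Fold h:mm:ss.ss (or mm:ss.ss) into plain seconds.
        n = split($3, part, ":")
        real = 0
        for (i = 1; i <= n; i++) {
            v = part[i]; sub(/s$/, "", v)
            real = real * 60 + v
        }
        printf "%.1f%%\n", (user + sys) / real * 100
    }'
}
```

`cpu_util 10608.23s 6272.44s 1:08:21.75s` prints 411.5% and `cpu_util 10624.32s 7579.56s 1:04:35.29s` prints 469.7%. In other words, going from 4 to 36 concurrent jobs moved the average from just over four busy CPUs to just under five. (The awk rounds, so other runs can differ from the printed figure in the last digit.)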

All of the important tools are local:

[th199096@jhereg spe-build2]> df -h /opt/SUNWspro/bin/dmake /opt/onbld/bin/nightly /opt/onbld/bin/i386/cw /usr/java/bin/javac /usr/ccs/bin/as 
Filesystem             size   used  avail capacity  Mounted on
pool/tools             134G   9.6G    38G    21%    /pool/tools
pool/tools             134G   9.6G    38G    21%    /pool/tools
pool/tools             134G   9.6G    38G    21%    /pool/tools
/dev/dsk/c0t0d0s0       44G    11G    32G    27%    /
/dev/dsk/c0t0d0s0       44G    11G    32G    27%    /

Okay, the next thing will be to check if there is a difference between working with a clone (which has to do copy-on-write) and a fresh dataset.

[th199096@jhereg spe-build3]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 10634.44s
Time spent in kernel mode (CPU seconds) : 9678.18s
Total time                              : 1:08:42.11s
CPU utilisation (percentage)            : 492.7%

No real difference. I'll have to think about this. The other option is to reimage the system with all 3 disks in the pool:

[root@jhereg ~]> zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool        96.9G  39.1G      1      6  61.4K   448K
  c0t1d0    48.5G  19.5G      0      3  30.8K   224K
  c0t2d0    48.5G  19.5G      0      3  30.5K   224K
----------  -----  -----  -----  -----  -----  -----

Not sure how much one more spindle will reduce the build time.[2]

Okay, last test is to remove the following options:


And these yield the biggest savings to date[3]:

[th199096@jhereg spe-build4]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 7993.05s
Time spent in kernel mode (CPU seconds) : 4818.01s
Total time                              : 45:44.46s
CPU utilisation (percentage)            : 466.7%

The hole we as developers tend to fall into is wanting to rebuild everything. We don't always need to rebuild the BFU archives if we are just changing a kernel module. At the BAT, I was rebuilding just the nfs or nfssrv modules and scp'ing them over (I might have hosed NFS, don't ya know). My "build" times were a matter of seconds. I spent more time moving the mouse and worrying about whether or not I had changed a header which needed to be installed in my proto area.

And in the end, before I can integrate my changes, I'll need to be lint and cstyle clean, I'll need to build non-DEBUG versions, and I'll need to build for SPARC. And I'll need to retest then.

I started off with a moral about questioning advice given to you versus actual experience, but it turns out the increase in dmake concurrency didn't really help, now did it?

Notes

[1] /.make.machines

Going back, I wondered why my test did not complain about not finding /.make.machines:

[root@jhereg scripts]> ls -la /.make.machines 
lrwxrwxrwx   1 root     other         27 Sep 26 12:32 /.make.machines -> opt/onbld/gk/.make.machines
[root@jhereg scripts]> more !$
more /.make.machines
elpaso max=20

So there is a default installed by SUNWonbld.
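Putting this together with the debugging output above: under env -i, $HOME is empty, so $HOME/.make.machines collapses to /.make.machines; that default file only lists elpaso, so grep jhereg comes back empty and the code falls back to 4 jobs. A minimal sketch of the same lookup as a function that takes the home directory explicitly (the name `dmake_jobs_for` is made up for illustration):

```shell
# dmake_jobs_for HOMEDIR HOSTNAME
# Hypothetical re-statement of the nightly lookup shown earlier, with the
# home directory passed in. An empty HOMEDIR yields "/.make.machines",
# which is exactly what happens under env -i.
dmake_jobs_for() {
    machines="$1/.make.machines"
    max=""
    if [ -f "$machines" ]; then
        # Same pipeline as nightly: last matching line, value after "=".
        max=`grep "$2" "$machines" | tail -1 | awk -F= '{print $2}'`
    fi
    echo "${max:-4}"        # nightly falls back to 4 jobs
}
```

On jhereg, `dmake_jobs_for "" jhereg` reads /.make.machines and prints 4, while `dmake_jobs_for /home/th199096 jhereg` prints 36.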

[2] Broken disk?

Hey, wait, don't I really have four disks and not three?

[th199096@jhereg th199096]> iostat
   tty        sd0           sd1           sd2           sd3            cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0  113   0   0    0   66   2   40  304   5   28  303   5   27    3  3  0 93
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@0,0
       1. c0t1d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@1,0
       2. c0t2d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@2,0

I saw a message before the last jumpstart about taking a disk offline. And I've never actually seen jhereg; it is in a lab in Austin.

Okay, that missing disk is the DVD drive :->

[root@jhereg ~]> iostat -En
c1t0d0           Soft Errors: 0 Hard Errors: 11 Transport Errors: 6 
Vendor: TSSTcorp Product: CD/DVDW TS-T632A Revision: SR03 Serial No:  
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 11 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DAELAA 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DAG92A 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DA6AWA 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
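With three disks and a DVD drive, the iostat -En listing gets long. A small awk sketch, assuming the Solaris layout shown above (each device starts with a "Soft Errors:" line, followed by "Vendor:" and "Size:" lines), boils it down to one line per device so the odd one out stands out; `summarize_iostat_en` is a hypothetical name:

```shell
# summarize_iostat_en: read Solaris "iostat -En" output on stdin and
# print one line per device: name, size, product.
summarize_iostat_en() {
    awk '
        /Soft Errors:/ { dev = $1 }          # first line of each device
        /^Vendor:/ {
            # Product text sits between "Product:" and "Revision:".
            prod = $0
            sub(/.*Product: */, "", prod)
            sub(/ *Revision:.*/, "", prod)
        }
        /^Size:/ { print dev, $2, prod }
    '
}
```

Fed the listing above via `iostat -En | summarize_iostat_en`, the first line is `c1t0d0 0.00GB CD/DVDW TS-T632A`, followed by the three 73.41GB H101473SCSUN72G disks.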

[3] Groan, I messed up the 4th build

I got the mail message for that fast lint build, and it stated that build3 had finished. Wrong directory! I had copied over the nightly.env, fixed the path, and then made an error. So I copied the file over again, except this time I forgot to change the path!

So the savings may have been false. Another build has been kicked off!

[th199096@jhereg spe-build4]> `which nightly` nightly.env
Time spent in user mode   (CPU seconds) : 7965.57s
Time spent in kernel mode (CPU seconds) : 4818.72s
Total time                              : 46:52.02s
CPU utilisation (percentage)            : 454.6%

So the savings were real.
