Just some links I've stumbled across on my odyssey with VirtualBox:
Here's my script. No, I didn't use SMF. I'm old school. Bite me.
I'll add more as I collect them:
One of the difficulties with pushing out a release for OpenSolaris Project: NFS version 4.1 pNFS is that we could only release source and BFU. We could not release a live image.
To complicate matters, part of the NFS code is in the closed repository. The impact of which was we had to also release a special closed-bins.
The difficulty lay in two areas:
We started to get requests for VMWare images. And we still weren't allowed to hand those out.
But OpenSolaris is adaptive to pressures in the community. I just asked again and was pointed towards the Hadoop project and especially this one: OpenSolaris Project: Hadoop Live CD.
My understanding is that we aren't trying to make a distribution, we aren't trying to steal thunder, instead we are trying to get systems out there to enable interoperability testing.
So now I'm working on a framework to get OpenSolaris + pNFS on a VirtualBox image.
Stay tuned as I go down the wrong path several times, but emerge with a working process.
Okay, I've got a brand new Sun Fire X4150 Server and it is geeked out with processors and memory. When I installed SUNWonbld, it said that I should use 36 for dmake concurrency. So, let's set our .make.files and let a build rip.
I'm going to modify usr/src/tools/env/developer.sh with the following:
[th199096@jhereg spe-build]> diff nightly.env $SRC/tools/env/developer.sh 41c41 < NIGHTLY_OPTIONS="-aFCDlmprn"; export NIGHTLY_OPTIONS --- > NIGHTLY_OPTIONS="-aCDlmpr"; export NIGHTLY_OPTIONS 194d193 < export CW_NO_SHADOW=1
I cut out the $STAFFER and such. The main differences are that I am not doing the gcc shadow building and I am not doing a non-DEBUG build. This should blaze, but it doesn't:
==== Nightly distributed build started: Fri Sep 26 21:09:37 CDT 2008 ==== ==== Nightly distributed build completed: Fri Sep 26 22:17:58 CDT 2008 ==== ==== Total build time ==== real 1:08:21 ... /opt/SUNWspro/bin/dmake dmake: Sun Distributed Make 7.7 2005/10/13 number of concurrent jobs = 4
Okay, I just took the hit to Studio 12, so maybe there is a bit more time for that. And I think I have everything local, but perhaps I am hitting the network. But lets focus on dmake telling me it will be using 4 concurrent jobs. That is by no stretch 36.
[th199096@jhereg spe-build]> grep jhereg ~/.make.machines jhereg max=36 jhereg.central.sun.com max=36
I invoke the build like this:
[th199096@jhereg spe-build]> printenv | grep DMAKE DMAKE_MODE=parallel DMAKE_MAX_JOBS=36 [th199096@jhereg spe-build]> env -i `which nightly` nightly.env Time spent in user mode (CPU seconds) : 10608.23s Time spent in kernel mode (CPU seconds) : 6272.44s Total time : 1:08:21.75s CPU utilisation (percentage) : 411.5%
I use env -i because someone told me that it makes sure I have just the right things in my environment. How can I tell that I'm getting the right number?
I can copy `which nightly` and hack it to just report the dmake concurrency.
[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env Testing DMAKE, quick exit number of concurrent jobs = 4
Okay, pretty clear that I am only getting 4, but why? Add some more debugging in the main DMAKE procesisng code:
hostname=`uname -n`
if [ ! -f $HOME/.make.machines ]; then
echo "No $HOME/.make.machines found!"
DMAKE_MAX_JOBS=4
else
echo "Grepping for $HOST in $HOME/.make.machines"
DMAKE_MAX_JOBS="`grep $hostname $HOME/.make.machines | \
tail -1 | awk -F= '{print $ 2;}'`"
if [ "$DMAKE_MAX_JOBS" = "" ]; then
echo "Nothing in that file!"
DMAKE_MAX_JOBS=4
fi
fi
DMAKE_MODE=parallel;
export DMAKE_MODE
export DMAKE_MAX_JOBS
And run it:
[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env Grepping for jhereg in /.make.machines Nothing in that file! Testing DMAKE, quick exit number of concurrent jobs = 4
Hey, why is it looking in /.make.machines and not in my homedir?1
[th199096@jhereg spe-build]> echo $HOME /home/th199096 [th199096@jhereg spe-build]> more home.tst #!/bin/ksh -p # echo "My home is $HOME" [th199096@jhereg spe-build]> env -i ./home.tst My home is [th199096@jhereg spe-build]> ./nightly.tst -i nightly.env Grepping for jhereg in /home/th199096/.make.machines Testing DMAKE, quick exit number of concurrent jobs = 36
Okay, env is hosing me.
[th199096@jhereg spe-build]> env -i HOME=/home/th199096 ./home.tst My home is /home/th199096
And crap, env spells it out for me:
OPTIONS
The following options are supported:
-i | - Ignores the environment that would otherwise
be inherited from the current shell. Res-
tricts the environment for utility to that
specified by the arguments.
So, another quick test:
[th199096@jhereg spe-build]> env ./home.tst My home is /home/th199096
I know I was told to invoke my builds this way to speed them up - i.e., to grab the correct paths. I also know I've been battling this $HOME issue the whole time.
I wonder how long the build will take now?
[th199096@jhereg th199096]> zfs clone pool/builds/th199096/spe-gate@fresh pool/builds/th199096/spe-build2 [th199096@jhereg th199096]> ws spe-build2 Workspace : /builds/th199096/spe-build2 Workspace Parent : ssh://aus1500-home//pool/ws/th199096/spe-gate Proto area ($ROOT) : /builds/th199096/spe-build2/proto/root_i386 Root of source ($SRC) : /builds/th199096/spe-build2/usr/src Root of test source ($TSRC) : /builds/th199096/spe-build2/usr/ontest Current directory ($PWD) : /builds/th199096/spe-build2 [th199096@jhereg spe-build2]> cp ../spe-build/nightly.env . [th199096@jhereg spe-build2]> vi nightly.env [th199096@jhereg spe-build2]> rm ../spe-build/nightly.tst [th199096@jhereg spe-build2]> `which nightly` nightly.env
Yeah, zfs clone is sweet for rapid testing of a baseline!
And we get such a big savings, not!
[th199096@jhereg spe-build2]> `which nightly` nightly.env Time spent in user mode (CPU seconds) : 10624.32s Time spent in kernel mode (CPU seconds) : 7579.56s Total time : 1:04:35.29s CPU utilisation (percentage) : 469.7%
The concurrency was correct:
/opt/SUNWspro/bin/dmake dmake: Sun Distributed Make 7.7 2005/10/13 number of concurrent jobs = 36
All of the important tools are local:
[th199096@jhereg spe-build2]> df -h /opt/SUNWspro/bin/dmake /opt/onbld/bin/nightly /opt/onbld/bin/i386/cw /usr/java/bin/javac /usr/ccs/bin/as Filesystem size used avail capacity Mounted on pool/tools 134G 9.6G 38G 21% /pool/tools pool/tools 134G 9.6G 38G 21% /pool/tools pool/tools 134G 9.6G 38G 21% /pool/tools /dev/dsk/c0t0d0s0 44G 11G 32G 27% / /dev/dsk/c0t0d0s0 44G 11G 32G 27% /
Ok, the next thing will be to check if there is a difference between working with a clone (which has to copy-on-write) and a fresh dataset.
[th199096@jhereg spe-build3]> `which nightly` nightly.env Time spent in user mode (CPU seconds) : 10634.44s Time spent in kernel mode (CPU seconds) : 9678.18s Total time : 1:08:42.11s CPU utilisation (percentage) : 492.7%
No. I'll have to think on this. The other option available is to reimage the system with all 3 disks in the pool:
[root@jhereg ~]> zpool iostat -v
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
pool 96.9G 39.1G 1 6 61.4K 448K
c0t1d0 48.5G 19.5G 0 3 30.8K 224K
c0t2d0 48.5G 19.5G 0 3 30.5K 224K
---------- ----- ----- ----- ----- ----- -----
Not sure how much one more spindle will reduce the build.2.
Okay, last test is to remove the following options:
And these yield the biggest savings to date3:
[th199096@jhereg spe-build4]> `which nightly` nightly.env Time spent in user mode (CPU seconds) : 7993.05s Time spent in kernel mode (CPU seconds) : 4818.01s Total time : 45:44.46s CPU utilisation (percentage) : 466.7%The hole we as developers tend to fall into is to want to rebuild everything. We don't always need to rebuild the BFU archives if we are just changing a kernel module. At the BAT, I was rebuilding just the nfs or nfssrv modules and scp'ing them over (I might have hosed NFS don't ya know). My "build" times were in the matter of seconds. I spent more time moving the mouse and worrying about whether or not I had changed a header which needed to be installed in my proto area.
And in the end, before I can integrate my changes, I'll need to be lint and cstyle clean, I'll need to build non-DEBUG versions, and I'll need to build for sparc. And I'll need to retest then.
I started off with a moral about questioning advice given to you versus actual experience, but it turns out the increase in dmake concurrency didn't really help, now did it?
Notes
/.make.machine
Going back, I wondered why my test did not complain about not finding /.make.machine:
[root@jhereg scripts]> ls -la /.make.machines lrwxrwxrwx 1 root other 27 Sep 26 12:32 /.make.machines -> opt/onbld/gk/.make.machines [root@jhereg scripts]> more !$ more /.make.machines elpaso max=20So there is a default installed by SUNWonbld.
Broken disk?
Hey, wait, don't I really have four disks and not three?
[th199096@jhereg th199096]> iostat tty sd0 sd1 sd2 sd3 cpu tin tout kps tps serv kps tps serv kps tps serv kps tps serv us sy wt id 0 113 0 0 0 66 2 40 304 5 28 303 5 27 3 3 0 93 # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c0t0d0/pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@0,0 1. c0t1d0 /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@1,0 2. c0t2d0 /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@2,0 I saw some message before the last jumpstart about taking some disk offline. And I've never really seen jhereg. It is in a lab in Austin.
Okaay, that missing disk is the DVD drive: :->
[root@jhereg ~]> iostat -En c1t0d0 Soft Errors: 0 Hard Errors: 11 Transport Errors: 6 Vendor: TSSTcorp Product: CD/DVDW TS-T632A Revision: SR03 Serial No: Size: 0.00GB <0 bytes> Media Error: 0 Device Not Ready: 11 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: HITACHI Product: H101473SCSUN72G Revision: SA25 Serial No: 0827DAELAA Size: 73.41GB <73407865856 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: HITACHI Product: H101473SCSUN72G Revision: SA25 Serial No: 0827DAG92A Size: 73.41GB <73407865856 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c0t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: HITACHI Product: H101473SCSUN72G Revision: SA25 Serial No: 0827DA6AWA Size: 73.41GB <73407865856 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0Groan, I messed up the 4th build
I got my mail message for that fast lint build and it stated that build3 had finished. I got the wrong directory! I had copied over the nightly.env, fixed the path, and then made an error. So I copied the file over again. Except this time I forgot to change the path!
So the savings may have been false. Another build has been kicked off!
[th199096@jhereg spe-build4]> `which nightly` nightly.env Time spent in user mode (CPU seconds) : 7965.57s Time spent in kernel mode (CPU seconds) : 4818.72s Total time : 46:52.02s CPU utilisation (percentage) : 454.6%So the savings were real.