20080930 Tuesday September 30, 2008
Build turd issues you may not see in a Flag Day or a heads up...

Sometimes when you do an incremental build, you run into some fluff that kills your build:

==== cpio archives build errors (DEBUG) ====

Failed to create generic kernel archive:	200550 blocks
cpiotranslate: kernel/misc/amd64/sysinit: no packaging info
cpiotranslate: kernel/misc/sysinit: no packaging info

And everyone is supposed to know how to handle these:

[th199096@aus-build-x86 mms]> ls -al proto/root_i386/kernel/misc/amd64/sysinit
-rwxr-xr-x   1 th199096 staff       4200 Sep 25 21:22 proto/root_i386/kernel/misc/amd64/sysinit
[th199096@aus-build-x86 mms]> rm proto/root_i386/kernel/misc/amd64/sysinit proto/root_i386/kernel/misc/sysinit
[th199096@aus-build-x86 mms]> `which nightly` -in nightly.env
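
If you'd rather not hunt through the proto area by hand each time, the file names can be pulled straight out of the nightly output. This is a minimal sketch, not part of the ON tools: `list_turds` is a hypothetical name, and it assumes the `cpiotranslate: <path>: no packaging info` message format shown above and that `$ROOT` points at your proto area (falling back to `proto/root_i386`, an x86 assumption for running from the workspace top).

```sh
# list_turds: read nightly output on stdin and print the proto-area path of
# every file that cpiotranslate reported as having "no packaging info".
# Hypothetical helper; ROOT falls back to proto/root_i386 (x86 assumption).
list_turds() {
    root=${ROOT:-proto/root_i386}
    # Keep only the cpiotranslate complaints, stripped down to the path.
    sed -n 's/^cpiotranslate: \(.*\): no packaging info$/\1/p' |
    while IFS= read -r f; do
        printf '%s/%s\n' "$root" "$f"
    done
}
```

Run it over a saved copy of the nightly mail, look at the list, and only then feed it to `xargs rm` and rerun the incremental build.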

Hey, what do you know, Ken Erickson did have a Flag Day (for those who maintain private copies of bfu; a Heads Up for everyone else), but it still does not mention cleaning up the turd on your own.

After all, everyone knows how to deal with these turds.


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily
Why you should do basic triage before sending out a plea for help

So I think I have found a bug in our implementation of Mercurial. I'm all excited and I'm about to send the following email off to the ON gatekeepers:

Take a zfs clone of a workspace which has ssh://aus1500-home//pool/ws/th199096/spe-gate as a parent.

[th199096@jhereg ~]> zfs clone pool/builds/th199096/spe-gate@postjimw pool/builds/th199096/spe-fix
[th199096@jhereg ~]> ws /builds/th199096/spe-fix

Workspace			: /builds/th199096/spe-fix
Workspace Parent		: ssh://aus1500-home//pool/ws/th199096/spe-gate
Proto area ($ROOT)		: /builds/th199096/spe-fix/proto/root_i386
Root of source ($SRC)		: /builds/th199096/spe-fix/usr/src
Root of test source ($TSRC)  : /builds/th199096/spe-fix/usr/ontest
Current directory ($PWD)	: /builds/th199096/spe-fix

Reparent to ssh://aus1500-home//pool/ws/nfs41-clone

[th199096@jhereg spe-fix]> hg reparent ssh://aus1500-home//pool/ws/nfs41-clone
[th199096@jhereg spe-fix]> ws usr/closed/

Workspace			: /builds/th199096/spe-fix/usr/closed
Workspace Parent		: ssh://aus1500-home//pool/ws/th199096/spe-gate/usr/closed
Proto area ($ROOT)		: /builds/th199096/spe-fix/usr/closed/proto/root_i386
Root of source ($SRC)		: /builds/th199096/spe-fix/usr/closed/usr/src
Root of test source ($TSRC)  : /builds/th199096/spe-fix/usr/closed/usr/ontest
Current directory ($PWD)	: /builds/th199096/spe-fix/usr/closed

[th199096@jhereg closed]>  hg reparent ssh://aus1500-home//pool/ws/nfs41-clone/usr/closed
[th199096@jhereg closed]> exit
exit
[th199096@jhereg spe-fix]> hg list
added:
	usr/src/cmd/fs.d/nfs/sped/Makefile
	usr/src/cmd/fs.d/nfs/sped/sped.c
	usr/src/cmd/fs.d/nfs/sped/sped_dt.d
	usr/src/cmd/fs.d/nfs/sped/sped_server.c
	usr/src/cmd/fs.d/nfs/sped/spedaemon.c
	usr/src/cmd/fs.d/nfs/sped/spedaemon.h
	usr/src/cmd/fs.d/nfs/svc/spe.xml
	usr/src/head/rpcsvc/spe_prot.x
	usr/src/uts/common/fs/nfs/spe.c
	usr/src/uts/common/nfs/spe.h
	usr/src/uts/common/nfs/spe_attr.h
	usr/src/uts/common/nfs/spe_impl.h
modified:
	usr/src/cmd/fs.d/nfs/Makefile
	usr/src/cmd/fs.d/nfs/svc/Makefile
	usr/src/head/Makefile
	usr/src/head/rpcsvc/daemon_utils.h
	usr/src/lib/libshare/nfs/libshare_nfs.h
	usr/src/pkgdefs/SUNWhea/prototype_com
	usr/src/pkgdefs/SUNWhea/prototype_com
	usr/src/pkgdefs/SUNWnfscr/prototype_com
	usr/src/pkgdefs/SUNWnfscu/prototype_com
	usr/src/uts/common/Makefile
	usr/src/uts/common/Makefile.files
	usr/src/uts/common/dserv/dserv_mds.c
	usr/src/uts/common/fs/Makefile
	usr/src/uts/common/fs/nfs/ds_srv.c
	usr/src/uts/common/fs/nfs/nfs41_srv.c
	usr/src/uts/common/fs/nfs/nfs41_state.c
	usr/src/uts/common/fs/nfs/nfs_sys.c
	usr/src/uts/common/nfs/Makefile
	usr/src/uts/common/nfs/mds_state.h
	usr/src/uts/common/nfs/nfs4.h
	usr/src/uts/common/nfs/nfssys.h
	usr/src/uts/intel/nfs/Makefile
	usr/src/uts/sparc/nfs/Makefile
Time spent in user mode	  (CPU seconds) : 3.61s
Time spent in kernel mode (CPU seconds) : 5.44s
Total time				: 0:27.46s
CPU utilisation (percentage)		: 32.9%

I end up making changes to usr/src/head/rpcsvc/ds_prot.x, usr/src/uts/common/fs/nfs/ds_srv.c, usr/src/uts/common/fs/nfs/nfs41_state.c, and usr/src/uts/common/nfs/nfs_serv_inst.h:

[th199096@jhereg spe-fix]> vi usr/src/head/rpcsvc/ds_prot.x
[th199096@jhereg spe-fix]> vi usr/src/uts/common/dserv/dserv_mds.c
[th199096@jhereg spe-fix]> vi usr/src/uts/common/fs/nfs/ds_srv.c
[th199096@jhereg spe-fix]> vi usr/src/uts/common/fs/nfs/nfs41_srv.c
[th199096@jhereg spe-fix]> vi usr/src/uts/common/fs/nfs/nfs41_state.c
[th199096@jhereg spe-fix]> vi usr/src/uts/common/fs/nfs/nfs41_state.c
[th199096@jhereg spe-fix]> pushd usr/src/uts/common/fs/nfs
/builds/th199096/spe-fix/usr/src/uts/common/fs/nfs /builds/th199096/spe-fix
[th199096@jhereg nfs]> grep ds_guid_info_idx *
ds_srv.c:	    mds_server->ds_guid_info_idx,
nfs41_state.c:rfs4_index_t *ds_guid_info_idx;
nfs41_state.c:	instp->ds_guid_info_idx = rfs4_index_create(instp->ds_guid_info_tab,
[th199096@jhereg nfs]> vi ds_srv.c
[th199096@jhereg nfs]> popd
/builds/th199096/spe-fix
[th199096@jhereg spe-fix]> vi usr/src/uts/common/fs/nfs/nfs41_state.c
[th199096@jhereg spe-fix]> vi usr/src/uts/common/fs/nfs/nfs41_state.c\
?
[th199096@jhereg spe-fix]> vi usr/src/uts/common/nfs/nfs_serv_inst.h

Reparent back to ssh://aus1500-home//pool/ws/th199096/spe-gate

[th199096@jhereg spe-fix]> hg reparent ssh://aus1500-home//pool/ws/th199096/spe-gate
[th199096@jhereg spe-fix]> ws usr/closed/

Workspace			: /builds/th199096/spe-fix/usr/closed
Workspace Parent		: ssh://aus1500-home//pool/ws/nfs41-clone/usr/closed
Proto area ($ROOT)		: /builds/th199096/spe-fix/usr/closed/proto/root_i386
Root of source ($SRC)		: /builds/th199096/spe-fix/usr/closed/usr/src
Root of test source ($TSRC)  : /builds/th199096/spe-fix/usr/closed/usr/ontest
Current directory ($PWD)	: /builds/th199096/spe-fix/usr/closed

[th199096@jhereg closed]> hg reparent ssh://aus1500-home//pool/ws/th199096/spe-gate/usr/closed
[th199096@jhereg closed]> exit
exit

And no changes???

[th199096@jhereg spe-fix]> hg outgoing
comparing with ssh://aus1500-home//pool/ws/th199096/spe-gate
searching for changes
no changes found
[th199096@jhereg spe-fix]> hg push
pushing to ssh://aus1500-home//pool/ws/th199096/spe-gate
searching for changes
no changes found
[th199096@jhereg spe-fix]>

Up until now I've just been cutting and pasting what happened; now I start to debug, to show that I'm not just emailing without trying anything:

Hmm, I happen to have a copy of that gate on this machine:

[th199096@jhereg spe-fix]> diff usr/src/head/rpcsvc/ds_prot.x ../spe-gate/usr/src/head/rpcsvc/ds_prot.x
255d254
<	utf8string	ds_path;
[th199096@jhereg spe-fix]> diff usr/src/uts/common/dserv/dserv_mds.c ../spe-gate/usr/src/uts/common/dserv/dserv_mds.c
[th199096@jhereg spe-fix]> diff usr/src/uts/common/fs/nfs/ds_srv.c ../spe-gate/usr/src/uts/common/fs/nfs/ds_srv.c
139,141d138
<			UTF8STRING_FREE(res_ok->guid_map.guid_map_val[i].
<                           ds_path);
<
711,713d707
<			(void) utf8_copy(&pip->ds_path,
<                           &guid_map[count].ds_path);
<
[th199096@jhereg spe-fix]> diff usr/src/uts/common/fs/nfs/nfs41_srv.c ../spe-gate/usr/src/uts/common/fs/nfs/nfs41_srv.c
[th199096@jhereg spe-fix]> diff usr/src/uts/common/fs/nfs/nfs41_state.c ../spe-gate/usr/src/uts/common/fs/nfs/nfs41_state.c
1919c1919
<  * this will populate the following MDS tables.
---
>  * this will populste the following MDS tables.
2016c2016
<	ds_guid_info_t		*pip;
---
>	mds_pool_info_t		*pip;
2022c2022
<	rw_enter(&ds_guid_info_lock, RW_READER);
---
>	rw_enter(&mds_pool_info_lock, RW_READER);
2024c2024
<	pip = (ds_guid_info_t *)rfs4_dbsearch(ds_guid_info_path_idx,
---
>	pip = (mds_pool_info_t *)rfs4_dbsearch(mds_pool_info_path_idx,
2031c2031
<	rw_exit(&ds_guid_info_lock);
---
>	rw_exit(&mds_pool_info_lock);
2119c2119
< ds_guid_info_path_compare(rfs4_entry_t entry, void *key)
---
> mds_pinfo_path_compare(rfs4_entry_t entry, void *key)
2121c2121
<	ds_guid_info_t *pip = (ds_guid_info_t *)entry;
---
>	mds_pool_info_t *pip = (mds_pool_info_t *)entry;
2127c2127
< ds_guid_info_path_mkkey(rfs4_entry_t entry)
---
> mds_pinfo_path_mkkey(rfs4_entry_t entry)
2129c2129
<	ds_guid_info_t *pip = (ds_guid_info_t *)entry;
---
>	mds_pool_info_t *pip = (mds_pool_info_t *)entry;
2433,2438d2432
<	instp->ds_guid_info_path_idx =
<	    rfs4_index_create(instp->ds_guid_info_tab,
<	    "DS_guid_path-idx", ds_guid_info_hash, ds_guid_info_path_compare,
<	    ds_guid_info_path_mkkey,
<	    TRUE);
<
[th199096@jhereg spe-fix]> diff usr/src/uts/common/nfs/nfs_serv_inst.h ../spe-gate/usr/src/uts/common/nfs/nfs_serv_inst.h
193d192
<	rfs4_index_t *ds_guid_info_path_idx;
[th199096@jhereg spe-fix]>

So the files are different, and not just from my local copy of the gate:

[th199096@jhereg spe-fix]> diff usr/src/uts/common/nfs/nfs_serv_inst.h /net/aus1500-home/pool/ws/th199096/spe-gate/usr/src/uts/common/nfs/nfs_serv_inst.h
193d192
<	rfs4_index_t *ds_guid_info_path_idx;

Yes, they really are.

Why doesn't Mercurial think so?

It knows that the files have changed:

[th199096@jhereg spe-fix]> hg list
modified:
	usr/src/head/rpcsvc/ds_prot.x
	usr/src/uts/common/fs/nfs/ds_srv.c
	usr/src/uts/common/fs/nfs/nfs41_state.c
	usr/src/uts/common/nfs/nfs_serv_inst.h

[th199096@jhereg spe-fix]> hg outgoing
comparing with ssh://aus1500-home//pool/ws/th199096/spe-gate
searching for changes
no changes found

And here the email stops as I RTFM and realize I forgot a step:

[th199096@jhereg spe-fix]> hg commit
[th199096@jhereg spe-fix]> hg outgoing
comparing with ssh://aus1500-home//pool/ws/th199096/spe-gate
searching for changes
changeset:   7779:ced6eccb4366
tag:		tip
user:		Thomas Haynes 
date:		Tue Sep 30 14:57:46 2008 -0500
summary:	Fix up for new NFS instances

Without an hg commit, even with changes in the working directory, there is nothing to integrate back to the parent.
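
One way to make the forgotten step impossible to miss is to check `hg status` before pushing. This is a minimal sketch, not a Cadmium or ON tool: `push_if_committed` is a hypothetical name, and the `HG` variable exists only so the function can be exercised without a real repository.

```sh
# push_if_committed [push args...]
# Hypothetical wrapper: refuse to push while the working directory still
# holds modified, added, removed or deleted files, because hg outgoing and
# hg push only look at committed changesets.
push_if_committed() {
    pending=$(${HG:-hg} status -mard) || return 2
    if [ -n "$pending" ]; then
        echo "uncommitted changes, run 'hg commit' first:" >&2
        printf '%s\n' "$pending" >&2
        return 1
    fi
    ${HG:-hg} push "$@"
}
```

In the session above it would have stopped at the four modified files instead of quietly reporting "no changes found".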

So, instead of an email, I blog about it...

[th199096@jhereg spe-fix]> hg push
pushing to ssh://aus1500-home//pool/ws/th199096/spe-gate
searching for changes
Are you sure you wish to push? [y/N]: y
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 4 changes to 4 files

branch gatekeeping 101

So I maintain nfs41-gate, which is a development branch of onnv-gate. With the introduction of Mercurial as our version control system, my life has changed a bit, but the basic gatekeeping tasks stay the same:

  • Build binaries: when developers change the source base, build new BFU bits for QA and/or other developers, i.e., reference bits without anyone else's changes present.
  • Branch merge with the ON gate: when something we want is introduced into ON, or we don't want to drift too far, we sync up with onnv-gate.

I've noticed that I find merging to be easier with Mercurial, so it tends to happen more often.

Building Binaries

I typically already have an existing workspace, so I do an incremental build in it, for both sparc and i386. I don't have to worry about conflicts or merging, since nothing changes in the child. A typical session looks like this:

[th199096@aus-build-x86 ~]> ws /builds/th199096/nfs41-gk

Workspace                    : /builds/th199096/nfs41-gk
Workspace Parent             : ssh://aus1500-home//pool/ws/nfs41-clone
Proto area ($ROOT)           : /builds/th199096/nfs41-gk/proto/root_i386
Root of source ($SRC)        : /builds/th199096/nfs41-gk/usr/src
Root of test source ($TSRC)  : /builds/th199096/nfs41-gk/usr/ontest
Current directory ($PWD)     : /builds/th199096/nfs41-gk

[th199096@aus-build-x86 nfs41-gk]>  hg pull -u
pulling from ssh://aus1500-home//pool/ws/nfs41-clone
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 93 changes to 93 files
93 files updated, 0 files merged, 0 files removed, 0 files unresolved
[th199096@aus-build-x86 nfs41-gk]> ws usr/closed/

Workspace                    : /builds/th199096/nfs41-gk/usr/closed
Workspace Parent             : ssh://aus1500-home//pool/ws/nfs41-clone/usr/closed
Proto area ($ROOT)           : /builds/th199096/nfs41-gk/usr/closed/proto/root_i386
Root of source ($SRC)        : /builds/th199096/nfs41-gk/usr/closed/usr/src
Root of test source ($TSRC)  : /builds/th199096/nfs41-gk/usr/closed/usr/ontest
Current directory ($PWD)     : /builds/th199096/nfs41-gk/usr/closed

[th199096@aus-build-x86 closed]> hg pull -u
pulling from ssh://aus1500-home//pool/ws/nfs41-clone/usr/closed
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 3 changes to 3 files
3 files updated, 0 files merged, 0 files removed, 0 files unresolved
[th199096@aus-build-x86 closed]> exit
exit
[th199096@aus-build-x86 nfs41-gk]> `which nightly` -in nightly.env 

At which point I go do something else, like blog about what I am doing.

I said typical, except that this changeset (which gets rid of auth records in the MDS as a byproduct) happens to touch something in closed, where we rarely make changes.
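
The incremental session above boils down to three steps: update the open tree, update the closed tree, and run nightly without letting it touch the source again. A minimal sketch, assuming the workspace layout shown above (closed tree at `usr/closed`) and a `nightly.env` at the workspace top; `gk_incremental` is a hypothetical name, and `HG`/`NIGHTLY` are overridable only for testing.

```sh
# gk_incremental WS
# Hypothetical wrapper around the gatekeeper's incremental build:
# pull -u the open tree, pull -u the closed tree, then run nightly with the
# same -in flags as the session above (-i incremental, -n no bringover,
# since the pulls were already done by hand).
gk_incremental() {
    ws=$1
    ( cd "$ws" && ${HG:-hg} pull -u ) || return 1
    ( cd "$ws/usr/closed" && ${HG:-hg} pull -u ) || return 1
    ( cd "$ws" && ${NIGHTLY:-nightly} -in nightly.env )
}
```

Pulling the closed tree every time is cheap, and it is exactly the step that matters on the rare occasion a changeset like this one touches closed.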

Okay, the build is done (remember to check the logs) and in this case it did not fail. So push the BFU bits out and then send out email telling people about the new reference bits.

[th199096@aus-build-x86 nfs41-gk]> ~/gk/hg-push.sh nfs41-gk i386 2008-09-30
ARCH=i386
BASE=/builds/th199096
AUS=/net/aus1500-home.central/pool/ws/nfs41-gate-hg-archives/i386
NIGHTLY=/builds/th199096/nfs41-gk/archives/i386/nightly
NIGHTLY_ND=/builds/th199096/nfs41-gk/archives/i386/nightly-nd
DATER=/builds/th199096/nfs41-gk/archives/i386/2008-09-30
DATER_ND=/builds/th199096/nfs41-gk/archives/i386/2008-09-30-nd
+ mv /builds/th199096/nfs41-gk/archives/i386/nightly /builds/th199096/nfs41-gk/archives/i386/2008-09-30 
+ mv /builds/th199096/nfs41-gk/archives/i386/nightly-nd /builds/th199096/nfs41-gk/archives/i386/2008-09-30-nd 
+ cp -r /builds/th199096/nfs41-gk/archives/i386/2008-09-30 /net/aus1500-home.central/pool/ws/nfs41-gate-hg-archives/i386/2008-09-30 
+ cp -r /builds/th199096/nfs41-gk/archives/i386/2008-09-30-nd /net/aus1500-home.central/pool/ws/nfs41-gate-hg-archives/i386/2008-09-30-nd 
+ rm /net/aus1500-home.central/pool/ws/nfs41-gate-hg-archives/i386/latest /net/aus1500-home.central/pool/ws/nfs41-gate-hg-archives/i386/latest-nd 
+ cd /net/aus1500-home.central/pool/ws/nfs41-gate-hg-archives/i386 
+ ln -s 2008-09-30 latest 
+ ln -s 2008-09-30-nd latest-nd 
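
The `set -x` trace above shows what ~/gk/hg-push.sh does. Below is a hypothetical reconstruction of those steps as a shell function, not the author's script: `publish_bfu` is an invented name, `BASE` and `ARCHIVE_ROOT` stand in for the paths hard-coded in the trace, and the third argument is the date stamp.

```sh
# publish_bfu WS ARCH STAMP
# Hypothetical reconstruction of the steps in the trace above:
# rename tonight's archives to dated names, copy them to the shared area,
# and repoint the "latest" symlinks at the new copies.
publish_bfu() {
    ws=$1 arch=$2 stamp=$3
    base=${BASE:-/builds/$USER}
    aus=${ARCHIVE_ROOT:-/net/aus1500-home.central/pool/ws/nfs41-gate-hg-archives}/$arch
    src=$base/$ws/archives/$arch

    # DEBUG bits are in nightly/, non-DEBUG bits in nightly-nd/.
    mv "$src/nightly" "$src/$stamp" || return 1
    mv "$src/nightly-nd" "$src/$stamp-nd" || return 1

    cp -r "$src/$stamp" "$aus/$stamp" || return 1
    cp -r "$src/$stamp-nd" "$aus/$stamp-nd" || return 1

    # Relative link targets, as in the trace, so the links still resolve
    # when the archive area is mounted under a different path.
    rm -f "$aus/latest" "$aus/latest-nd"
    ln -s "$stamp" "$aus/latest" || return 1
    ln -s "$stamp-nd" "$aus/latest-nd"
}
```

With the defaults, `publish_bfu nfs41-gk i386 2008-09-30` would perform the same moves, copies, and links as the trace.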

Branch merging

This case is a bit more complicated and can be summarized by:

  1. Take a clone of nfs41-clone - call it nfs41-sync
  2. Reparent it to onnv-clone (onnv-gate is write only)
  3. Pull across the changes and merge them.
    This is actually the only difficulty in the entire process
  4. On sparc and x86 build boxes, get clones of nfs41-sync and do full builds.
    Incrementals are not sufficient in this case.
  5. Populate a pNFS community (client, DS, and MDS) with these changes and make sure the Connectathon tests all pass.
    Depending on the scope of the changes and/or the difficulty of the merge, we may skip this step; it is purely a judgment call. Also, note that these clones will become the basis for future incremental builds, as described in the previous section.
  6. If there have been further integrations to nfs41-gate, reparent to nfs41-clone (which is automatically kept up to date with nfs41-gate), and do the pull/merge cycle until everything is up to date. You may have to rebuild and retest. Much easier with a small group of developers to ask them not to integrate.
  7. ZFS snapshot nfs41-gate to make rolling back the changes easier.
  8. Reparent nfs41-sync to nfs41-gate and integrate the changes.
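
Step 3 starts with spotting whether the pull actually created a second head. `hg pull` reports it as "(+1 heads)", and the check can be scripted by counting `hg heads`. A minimal sketch, assuming the gate only uses the default branch (so every head listed belongs to it); `needs_merge` is a hypothetical name and `HG` is overridable for testing.

```sh
# needs_merge: succeed (exit 0) when the repository has more than one head,
# i.e. the last pull brought in changes that must be merged and committed.
# Hypothetical helper; assumes a single named branch (default).
needs_merge() {
    heads=$(${HG:-hg} heads --template '{rev}\n' | wc -l)
    # Left unquoted so any padding from wc is dropped.
    [ $heads -gt 1 ]
}
```

A true result means it is time for `hg merge`, filemerge for any conflicts, and `hg commit`, as in the detailed example.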

By not changing nfs41-gate until the final moment, I can throw everything away if needed. And believe me, as painful as that is, I've done it. Also, note that when I talk about a workspace above, I am also talking about working in parallel with the closed version of it.

But now onto a detailed example:

Get a backup snapshot of the gate:

[th199096@aus1500-home ~]> zfs snapshot pool/ws/nfs41-clone@sync99

Now grab your copy for merging

[th199096@aus1500-home ~]> cd /pool/ws/th199096/
[th199096@aus1500-home th199096]> ~/bin/hg-clone ssh://aus1500-home//pool/ws/nfs41-clone nfs41-sync
397b36b5473d
=== clone open tree: ssh://aus1500-home//pool/ws/nfs41-clone ===
requesting all changes
adding changesets
adding manifests
adding file changes
added 7515 changesets with 101024 changes to 51313 files
updating working directory
42507 files updated, 0 files merged, 0 files removed, 0 files unresolved
2a39f20bc20e
=== clone closed tree: ssh://aus1500-home//pool/ws/nfs41-clone/usr/closed ===
requesting all changes
adding changesets
adding manifests
adding file changes
added 968 changesets with 8269 changes to 4389 files
updating working directory
2677 files updated, 0 files merged, 0 files removed, 0 files unresolved

~/bin/hg-clone is a simple script to get both the open and closed versions of the gate.
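
The script itself isn't shown, so here is a minimal sketch of the same idea. It is a hypothetical stand-in, not the author's ~/bin/hg-clone: it only reproduces the two `===` banners and the two clones seen in the output above (the changeset IDs the real script prints are left out), and `HG` is overridable for testing.

```sh
# hg_clone_both PARENT DEST
# Hypothetical stand-in for ~/bin/hg-clone: clone the open tree from PARENT
# into DEST, then the matching closed tree into DEST/usr/closed.
hg_clone_both() {
    parent=$1 dest=$2
    echo "=== clone open tree: $parent ==="
    ${HG:-hg} clone "$parent" "$dest" || return 1
    echo "=== clone closed tree: $parent/usr/closed ==="
    ${HG:-hg} clone "$parent/usr/closed" "$dest/usr/closed"
}
```

The closed clone has to come second, because it lands inside the freshly cloned open tree.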

And reparent it to onnv-clone

[th199096@aus1500-home th199096]> ws nfs41-sync/

Workspace			: /pool/ws/th199096/nfs41-sync
Workspace Parent		: ssh://aus1500-home//pool/ws/nfs41-clone
Proto area ($ROOT)		: /pool/ws/th199096/nfs41-sync/proto/root_i386
Root of source ($SRC)		: /pool/ws/th199096/nfs41-sync/usr/src
Root of test source ($TSRC)  : /pool/ws/th199096/nfs41-sync/usr/ontest
Current directory ($PWD)	: /pool/ws/th199096/nfs41-sync

[th199096@aus1500-home nfs41-sync]> hg reparent ssh://onnv.eng//export/onnv-clone

Pull and merge

[th199096@aus1500-home nfs41-sync]> hg pull -u
pulling from ssh://onnv.eng//export/onnv-clone
searching for changes
adding changesets
adding manifests
adding file changes
added 210 changesets with 2122 changes to 1808 files (+1 heads)
not updating, since new heads added
(run 'hg heads' to see heads, 'hg merge' to merge)
[th199096@aus1500-home nfs41-sync]> hg merge
merging usr/src/cmd/Makefile
merging usr/src/cmd/zfs/zfs_iter.c
merging usr/src/cmd/zfs/zfs_main.c
merging usr/src/lib/Makefile
merging usr/src/lib/libzfs/common/libzfs.h
merging usr/src/lib/libzfs/common/libzfs_dataset.c
merging usr/src/lib/libzfs/common/mapfile-vers
merging usr/src/pkgdefs/Makefile
merging usr/src/pkgdefs/SUNWcsu/prototype_com
merging usr/src/pkgdefs/SUNWhea/prototype_com
merging usr/src/pkgdefs/etc/exception_list_sparc
merging usr/src/uts/common/Makefile.files
merging usr/src/uts/common/Makefile.rules
merging usr/src/uts/common/fs/zfs/dsl_dataset.c
merging usr/src/uts/common/fs/zfs/zfs_ioctl.c
merging usr/src/uts/common/sys/Makefile
merging usr/src/uts/common/sys/fs/zfs.h
merging usr/src/uts/intel/Makefile.intel.shared
merging usr/src/uts/intel/os/minor_perm
merging usr/src/uts/intel/os/name_to_major
merging usr/src/uts/sparc/Makefile.sparc.shared
merging usr/src/uts/sparc/os/minor_perm
merging usr/src/uts/sparc/os/name_to_major
1785 files updated, 23 files merged, 62 files removed, 0 files unresolved
(branch merge, don't forget to commit)

I used filemerge for any manual editing in the above merge. It is configured as the merge tool in my .hgrc:

# Merge tool
[merge-patterns]
** = filemerge

[merge-tools]
filemerge.executable = /ws/onnv-tools/teamware/bin/filemerge
filemerge.args = -a $base $local $other $output
filemerge.checkchanged = true
filemerge.gui = true

Then I would commit and repeat the cycle for the closed branch:

[th199096@aus1500-home nfs41-sync]> hg commit
[th199096@aus1500-home nfs41-sync]> ws usr/closed

Workspace			: /pool/ws/th199096/nfs41-sync/usr/closed
Workspace Parent		: ssh://aus1500-home//pool/ws/nfs41-clone/usr/closed
Proto area ($ROOT)		: /pool/ws/th199096/nfs41-sync/usr/closed/proto/root_i386
Root of source ($SRC)		: /pool/ws/th199096/nfs41-sync/usr/closed/usr/src
Root of test source ($TSRC)  : /pool/ws/th199096/nfs41-sync/usr/closed/usr/ontest
Current directory ($PWD)	: /pool/ws/th199096/nfs41-sync/usr/closed

[th199096@aus1500-home closed]> hg reparent ssh://onnv.eng//export/onnv-clone/usr/closed
[th199096@aus1500-home closed]> hg pull -u
pulling from ssh://onnv.eng//export/onnv-clone/usr/closed
searching for changes
adding changesets
adding manifests
adding file changes
added 15 changesets with 145 changes to 137 files (+1 heads)
not updating, since new heads added
(run 'hg heads' to see heads, 'hg merge' to merge)
[th199096@aus1500-home closed]> hg merge
135 files updated, 0 files merged, 10 files removed, 0 files unresolved
(branch merge, don't forget to commit)
[th199096@aus1500-home closed]> hg commit

The next step is a clone and build on one of the build machines. As this looks a lot like the cloning earlier in this section and the build in the previous one, I'm going to leave it out.

After the build and verification are done, we prepare nfs41-gate for the integration. Because the branch merge comments are not in the approved RTI format, we need to turn off sanity checking for this operation. Note that it is okay to keep it on for developers pushing to the gate:

[th199096@aus1500-home ~]> su - nfs4hg
Password:
Sun Microsystems Inc.	SunOS 5.11	snv_92	January 2008
[nfs4hg@aus1500-home ~]> cd /pool/ws/nfs41-gate/usr/closed/.hg
[nfs4hg@aus1500-home .hg]> cp hgrc hgrc.good
[nfs4hg@aus1500-home .hg]> vi hgrc
[nfs4hg@aus1500-home .hg]> diff hgrc hgrc.good
73c73
< #pretxnchangegroup.1 = python:hook.sanity.sanity
---
> pretxnchangegroup.1 = python:hook.sanity.sanity

Reparent to the gate and push

[th199096@aus1500-home closed]> hg reparent ssh://nfs4hg@aus1500-home//pool/ws/nfs41-gate/usr/closed
[th199096@aus1500-home closed]> hg push
pushing to ssh://nfs4hg@aus1500-home//pool/ws/nfs41-gate/usr/closed
searching for changes
Are you sure you wish to push? [y/N]: y
pushing to ssh://nfs4hg@aus1500-home//pool/ws/nfs41-gate/usr/closed
...
remote: Preparing gk email...
remote: ...gk email sent

Put the saved hgrc back to turn sanity checking on again, and repeat for the open bits.
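
The hand edit in vi can be scripted so the hook line is toggled the same way every time. A minimal sketch, assuming the hook line reads exactly `pretxnchangegroup.1 = python:hook.sanity.sanity` as in the diff above; `sanity_hook` is a hypothetical name, and it reuses the `hgrc.good` backup convention from the session.

```sh
# sanity_hook off|on HGRC
# Hypothetical helper: "off" saves HGRC as HGRC.good and comments out the
# sanity hook line; "on" restores the saved copy.
sanity_hook() {
    rc=$2
    case $1 in
    off)
        cp "$rc" "$rc.good" || return 1
        sed 's/^pretxnchangegroup\.1 = python:hook\.sanity\.sanity$/#&/' \
            "$rc.good" > "$rc"
        ;;
    on)
        mv "$rc.good" "$rc"
        ;;
    *)
        echo "usage: sanity_hook off|on hgrc" >&2
        return 2
        ;;
    esac
}
```

As the gate owner (nfs4hg above), you would run `sanity_hook off /pool/ws/nfs41-gate/usr/closed/.hg/hgrc` before the push and `sanity_hook on` with the same path afterwards, then do the same pair for the open tree's `.hg/hgrc`.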


Some new bugs filed

So last week I filed two bugs based on my experiences with VirtualBox 2.0.2:

  • 6753569 Virtual Box running causes assertion failed: afd->a_fd[i] == -1
    Basically, when I start VirtualBox up to install a new OS, it produces this core dump.
  • 6753564 Installation of virtual box should update the boot archive
    After I installed VirtualBox 2.0.2 and had the above panic, the boot archive had not been updated, which dropped me down into maintenance mode.

If you search the Bug Database for 6753564, it returns 6753569. If you try to go to 6753564 directly with the URL I provided, you get "bug not found". Ahh, the search match happens because I mention 6753564 in the other bug. Well, the link exists in 6753569, and I'm sure the bug will be opened up for public viewing sooner or later.



20080929 Monday September 29, 2008
Really loving that upgrade from snv85 to snv99

I love hacks that make your life easier, but I also love an evolving OS. To do ssh-agent management, I used to have the following in my .dtprofile (and I think dt is no longer being invoked):

###
if whence ssh-agent > /dev/null && [[ ${SSH_AGENT_PID:-0} -eq 0 ]]
then
        eval $(ssh-agent) > /dev/null
        trap "kill $SSH_AGENT_PID" EXIT
fi
(xterm -e ssh-add &)
###

I'd get a little X window and have to manually enter my pass phrases every time I rebooted.

I don't know when it was introduced, but we now have a proper keychain manager and I'm loving it.


Finding that missing box

Reinstalling with the graphical install option appears to have configured X for me; I'm now able to log in on the headed console. I'll fix that later.

Now to reinstall everything.

Note that I kept the second zpool, so it was easy to import it and quicker to reinstall my stuff.

And sweet, the Sun Ray software just installs and runs! The crowds go wild!

Hmm, the fonts for the icons on the screen look sharp. Those in my terminal window look like they are from the 80s. Yuck!


cdrw output looks more userfriendly

At last, a bright spot:

[root@warlock archives]> cdrw -l
Looking for CD devices...
    Node                   Connected Device                Device type
----------------------+--------------------------------+-----------------
 cdrom0               | AOPEN    COM5232/AAH PRO  1.04 | CD Reader/Writer
 cdrom1               | AOPEN    DUW1608/ARR      A04b | CD Reader/Writer

instead of ... ahh, I don't have a capture of it

Much easier to remember which is which now for me...


One of our boxes is missing

So my w2100z is fully installed, but not fully functional. I learned some valuable lessons along the way, and I'm not yet done. Things brought back to the surface:

  • Don't forget to edit your menu.lst if your system is headless. I did it for the install media, I should do it for the system.
  • If you break the root mirror, that can leave your menu.lst pointing to the wrong side of the pool. I.e., I expect there is a copy of the boot targets for each side of the mirror. Combine these two lessons and you have a system that looks dead.
  • Just because you think the Sun Ray Server install code rocks and it has always been easy, don't forget to back up your settings.

And the big one, sometimes it is better to go to bed than futz with things just a bit longer.

I've tried both Sun Ray Software 4 Update 3 Beta for Solaris 10 5/08 X86 and Sun Ray Software 4 09/07. In both cases I get the 26D error. I know that the firmware on the DTU is being changed between releases and the server software knows about the DTU.

I've tried the following things to get this working:

And I left it there with the last one. I'm about to restart. I think that perhaps fixing my menu.lst has caused this issue or I have to face the fact that I upgraded to too modern a build. But since I never had this problem before and I don't have my hands on the prior configurations, I'll try to get it working.

Strike part of that: I do have /etc/dt/config archived off, and it shows I never did the Xserver thing:

[th199096@warlock config]> ls -la
total 26
drwxr-xr-x   2 root     other          6 Sep 25 12:41 .
drwxr-xr-x   3 root     other          3 Mar 29  2008 ..
-r--r--r--   1 root     sys         1577 Aug  1  2007 README.SUNWut
lrwxrwxrwx   1 root     root          34 Sep 29 02:58 Xconfig -> /tmp/SUNWut/config/xconfig/Xconfig
-r--r--r--   1 root     root        5868 Mar 29  2008 Xconfig.SUNWut.prototype
lrwxrwxrwx   1 root     root          35 Sep 29 02:58 Xservers -> /tmp/SUNWut/config/xconfig/Xservers

Okay, I fixed that back out and I told grub not to boot to the console. But I didn't tell eeprom(1M):

[root@warlock ~]> eeprom
ata-dma-enabled=1
atapi-cd-dma-enabled=1
ttyb-rts-dtr-off=false
ttyb-ignore-cd=true
ttya-rts-dtr-off=false
ttya-ignore-cd=true
ttyb-mode=9600,8,n,1,-
ttya-mode=9600,8,n,1,-
lba-access-ok=1
prealloc-chunk-size=0x2000
keyboard-layout=US-English
console=ttya
boot-file=bootadm: kernel command on line 64 not recognized.
boot-args=bootadm: kernel command on line 64 not recognized.
[root@warlock ~]> eeprom console=text
[root@warlock ~]>

By the way, if this is horked, so am I. :->

Okay, I'm horked. I have to come up in failsafe mode. Now how do I fix my eeprom? Luckily (is it?), I've had to do this in the past: eeprom hosed on an x86. And that post has the sed command I will need, because I refuse to learn how to configure my terminal! And I've saved above what the real value should be!

# pwd
/a/boot/solaris
# sed 's/text/ttya/' bootenv.rc > xxx
# diff bootenv.rc xxx
39c39
< setprop console 'text'
---
> setprop console 'ttya'
# cp xxx bootenv.rc
# reboot
Creating boot_archive for /a
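
The `sed 's/text/ttya/'` above changes the first "text" on every line of bootenv.rc; it was fine here because the diff shows only the console line changed. A slightly more careful variant anchors on the `setprop console` line. This is a hypothetical helper, not a Solaris tool; it assumes the `setprop console '<value>'` form shown in the diff and a simple value such as `ttya` or `text`.

```sh
# set_bootenv_console FILE VALUE
# Hypothetical helper: rewrite only the "setprop console" line of a
# bootenv.rc (e.g. /a/boot/solaris/bootenv.rc from failsafe mode).
# VALUE must not contain '/', '&' or quotes, since it goes into the sed
# replacement text unescaped.
set_bootenv_console() {
    rc=$1 value=$2
    # Write a new copy first and only replace the original if sed succeeded.
    sed "s/^setprop console .*/setprop console '$value'/" "$rc" > "$rc.new" &&
        mv "$rc.new" "$rc"
}
```

From failsafe mode that would be `set_bootenv_console /a/boot/solaris/bootenv.rc ttya`, followed by the reboot.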

So I'm not getting X on the headed headless server (i.e., I've attached a monitor). I get output there until the OS takes over.

What is in my /var/dt/Xerrors?

Fatal server error:
could not open default font 'fixed'
XIO:  fatal IO error 146 (Connection refused) on X server ":2.0"
      after 0 requests (0 known processed) with 0 events remaining.
failed to set default font path '/usr/openwin/lib/X11/fonts/Type1/,/usr/openwin/lib/X11/fonts/Type1/sun/,/usr/openwin/lib/X11/fonts/F3bitmaps/,/usr/openwin/lib/X11/fonts/Speedo/,/usr/openwin/lib/X11/fonts/misc/,/usr/openwin/lib/X11/fonts/75dpi/,/usr/openwin/lib/X11/fonts/100dpi/,/usr/openwin/lib/X11/fonts/TrueType'
One of the directories in the list above does not exist
or it does not contain a valid 'fonts.dir' file

Okay, let's take care of that! All of them existed and none had a valid 'fonts.dir' file. And now:

Fatal server error:
could not open default font 'fixed'
XIO:  fatal IO error 146 (Connection refused) on X server ":2.0"
      after 0 requests (0 known processed) with 0 events remaining.

I'm really coming to suspect X is the thing horked on this system.

Notes

It looks like something, perhaps eeprom, touched my menu.lst and added a new default entry for me:

title Diagnostic Partition
        rootnoverify (hd0,0)
        chainloader +1
#---------- ADDED BY BOOTADM - DO NOT EDIT ----------
title Solaris bootenv rc
findroot pool_rpool
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B console=ttya bootadm: kernel command on line 64 not recognized.
-B bootadm: kernel command on line 64 not recognized.
module$ /platform/i86pc/$ISADIR/boot_archive
#---------------------END BOOTADM--------------------
#BOOTADM RC SAVED DEFAULT: 0

Which yields:

krtld: Unused kernel arguments: `bootadm: kernel command on line 64 not recognized.'.
SunOS Release 5.11 Version snv_99 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: mount: not a UFS magic number (0x0)
Cannot mount root on /ramdisk:a fstype ufs

panic[cpu0]/thread=fffffffffbc293a0: vfs_mountroot: cannot mount root

fffffffffbc48dc0 genunix:vfs_mountroot+356 ()
fffffffffbc48df0 genunix:main+e6 ()
fffffffffbc48e00 unix:_locore_start+92 ()

skipping system dump - no dump device configured
SunOS Release 5.11 Version snv_99 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: warlock
Reading ZFS config: done.
Mounting ZFS filesystems: (8/8)

I've got the head on it, so I'm going to reinstall and see if I can at least get X working on it.


Western Digital My Passport not working

While I'm waiting on my installs to finish, I thought I would tinker with the Western Digital My Passport Essential WDMENG1600TN 160GB USB 2.0 drive that I thought I was using for Time Machine backups of my MacBook Air. It had stopped working one day.

I've hooked the USB drive up to several computers and never even seen it appear. Another WD USB drive worked fine. Tonight I finally tried swapping the USB cables between the two, and it started working. And the "bad" cable worked with the other drive.

I used the WD tools to reformat it back to factory spec and then I couldn't see it on my MBA. I pulled the unpowered Belkin mini-USB hub and plugged it straight in. Joy!

Speaking of which, the 4-port mini-USB hub from Belkin tricked me with its packaging. It looked like it had an external power supply tucked in it, but that was only folded paper. This is the hub that rotates; it is called a Swivel Hub. I won't be taking it on trips now.

Anyway, my Time Machine backup is now proceeding from scratch.


First reboot after install of w2100z

Okay, so I got this configuration:

# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool    68G  7.18G  60.8G    10%  ONLINE  -
# zpool iostat -v
                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
rpool         7.18G  60.8G     31     14   814K   528K
  mirror      7.18G  60.8G     31     14   814K   528K
    c1t0d0s0      -      -     12      8   509K   530K
    c1t1d0s0      -      -     13      8   510K   530K
------------  -----  -----  -----  -----  -----  -----

But I don't want a mirror, I want space!

This should work, but it doesn't:

# zpool detach rpool c1t1d0s0
# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       7.18G  60.8G      8      6   367K   383K
  c1t0d0s0  7.18G  60.8G      8      6   367K   383K
----------  -----  -----  -----  -----  -----  -----

# zpool add rpool c1t1d0s0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c1t1d0s0 overlaps with /dev/dsk/c1t1d0s2
# zpool add -f rpool c1t1d0s0
cannot add to 'rpool': root pool can not have multiple vdevs or separate logs

Ahh, I should have done some light reading first. From the ZFS Troubleshooting Guide:

You cannot use a RAID-Z configuration for a root pool. Only single-disk pools or pools with mirrored disks are supported.

I was thinking of reinstalling, but no, I'll go with two different pools. By the way, I understand the need for redundancy, but I'd prefer more spindles here.

# zpool create tank c1t1d0s0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c1t1d0s0 overlaps with /dev/dsk/c1t1d0s2
# zpool create -f tank c1t1d0s0
# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       7.18G  60.8G      5      4   246K   255K
  c1t0d0s0  7.18G  60.8G      5      4   246K   255K
----------  -----  -----  -----  -----  -----  -----
tank        73.5K  68.0G      0      9  18.3K   165K
  c1t1d0s0  73.5K  68.0G      0      9  18.3K   165K
----------  -----  -----  -----  -----  -----  -----

Time to update my w2100z

When I last configured my w2100z, it wasn't possible to have a ZFS root, and I did some funky stuff while playing around with it. Here is my current configuration (I have two drives, which I think should be 72GB each):

       0. c1t0d0 
          /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0
Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm     524 - 3134       20.00GB    (2611/0/0)  41945715
  1       swap    wu       1 -  523        4.01GB    (523/0/0)    8401995
  2     backup    wm       0 - 8913       68.28GB    (8914/0/0) 143203410
  3 unassigned    wm    3135 - 5745       20.00GB    (2611/0/0)  41945715
  4 unassigned    wm    5746 - 8356       20.00GB    (2611/0/0)  41945715
  5 unassigned    wm       0               0         (0/0/0)            0
  6 unassigned    wm       0               0         (0/0/0)            0
  7       home    wm    8357 - 8913        4.27GB    (557/0/0)    8948205
  8       boot    wu       0 -    0        7.84MB    (1/0/0)        16065
  9 unassigned    wm       0               0         (0/0/0)            0
       1. c1t1d0 
          /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@1,0
Part      Tag    Flag     Cylinders        Size            Blocks
  0      stand    wm       1 - 4466       34.21GB    (4466/0/0)  71746290
  1      stand    wm    4467 - 8932       34.21GB    (4466/0/0)  71746290
  2     backup    wu       0 - 8932       68.43GB    (8933/0/0) 143508645
  3 unassigned    wm       0               0         (0/0/0)            0
  4 unassigned    wm       0               0         (0/0/0)            0
  5 unassigned    wm       0               0         (0/0/0)            0
  6 unassigned    wm       0               0         (0/0/0)            0
  7 unassigned    wm       0               0         (0/0/0)            0
  8       boot    wu       0 -    0        7.84MB    (1/0/0)        16065
  9 unassigned    wm       0               0         (0/0/0)            0

I've shamelessly munged together output from different format commands. Anyway, the first drive has several available partitions for Live Upgrade and for grabbing in case of need. The second drive has two partitions used for ZFS.

This configuration is very flexible for doing updates. I can have several boot partitions on the root drive and I never have to worry about the data on my ZFS pool:

[root@warlock snv99]> zpool list zoo
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
zoo     68G  51.5G  16.5G    75%  ONLINE  -
[root@warlock snv99]> zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zoo         51.5G  16.5G      0      1  21.9K  60.8K
  c1t1d0s0  33.6G   381M      0      0  7.31K  12.0K
  c1t1d0s1  17.9G  16.1G      0      1  14.6K  48.8K
----------  -----  -----  -----  -----  -----  -----

But I think I want to live more on the edge. I'm looking to get a more modern build on warlock:

[root@warlock snv99]> uname -a
SunOS warlock 5.11 snv_85 i86pc i386 i86pc

So, I'm going to back everything up onto an attached USB drive, and nuke the entire system.

Back in a bit

Since warlock is headless, the first task is to build an install DVD that has a modified menu.lst for GRUB; see Getting a Solaris bootable DVD for headless x86es.

While I'm doing that, I'm going to back up my system. I need the contents of /etc, my punchin configuration (a Sun VPN tool), my Sun Ray server configuration, and my homedirs. The rest I couldn't care less about or have already saved off.
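A minimal sketch of that kind of copy-off, assuming the USB drive is already mounted somewhere writable and that plain tar archives are good enough. The function name backup_dirs and the example paths are hypothetical; the punchin and Sun Ray configuration locations are not given here, so they are left out.

```sh
#!/bin/sh
# Hypothetical backup helper: write one tar archive per directory into DEST.
# DEST is assumed to be on the mounted USB drive; archive members keep
# their paths relative to / so they can be restored in place later.
backup_dirs() {
    mkdir -p "$1" || return 1
    dest=$(cd "$1" && pwd) || return 1   # absolute, since we cd / below
    shift
    for dir in "$@"; do
        # e.g. /export/home -> export_home.tar
        name=$(echo "${dir#/}" | tr '/' '_')
        ( cd / && tar -cf "$dest/$name.tar" "${dir#/}" ) || return 1
    done
}

# Example (illustrative paths only):
# backup_dirs /media/usbdisk/warlock /etc /export/home
```

Writing one archive per directory keeps a restore selective: /etc can be picked apart by hand after the reinstall without unpacking the homedirs.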

Also, I'm pretty ruthless: once I decide I don't need something, I delete it. That gives me a better idea of how much I still have to back up. And no, I'm not talking about system stuff. Take, for example, this cleanup of some ISO images:

[th199096@warlock isos]> df -h .
Filesystem             size   used  avail capacity  Mounted on
zoo/isos                67G    29G    16G    65%    /zoo/isos
[th199096@warlock x86]>	rm -rf snv7* snv8* snv90/ snv97
[th199096@warlock x86]>	df -h .
Filesystem             size   used  avail capacity  Mounted on
zoo/isos                67G    12G    33G    27%    /zoo/isos

You may not be comfortable with this approach, but once you reinstall, it is gone anyway.

With things cleaned out, and the ISO booting in a VirtualBox on my WinXP desktop, I'm signing off here...



20080928 Sunday September 28, 2008
I think I've been making the new Networking Configuration too difficult

You'll notice in Bag of links about VirtualBox configurations that I have collected a lot of articles about how to configure VirtualBox networking. And if you look in the VirtualBox 2.0.2 User Manual, you will see that the Linux section has about 9 pages while the Solaris one (6.9, on page 87) has a single paragraph.

Perhaps Sun has greatly simplified the code with respect to running on OpenSolaris?

Okay, armed with the pithy User Manual, I'm going to try to configure Host Interface Networking for VirtualBox guests on an OpenSolaris host. First I need to find where the networking is configured.


Pretty easy: it is in the Details tab. Okay, I select Network, and now it has to be difficult, right?

No, all I have to do is change the Attached to: setting from NAT to Host Interface.

And look, I can click the Generate button to the right of the MAC Address: field to generate a new one.

While I'd like to automate all of this (and a scan of the VirtualBox 2.0.2 User Manual suggests that I could easily do so), I'm going to bank on ZFS cloning to avoid most of the work. All I will need to be able to do is automatically change the UUID of the disk image:

VBoxManage internalcommands setvdiuuid vdifilename

and make up a new MAC:

VBoxManage modifyvm -macaddress1 

I believe the first is undocumented and the second is ripped right out of the VirtualBox 2.0.2 User Manual.
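For the MAC half, something has to produce a fresh value for each clone. A minimal sketch, assuming the 12-hex-digit form without colons that the MAC Address: field displays, and the 08:00:27 prefix VirtualBox uses for the addresses it generates. The helper name random_vbox_mac is made up for illustration; how the value is handed to modifyvm is whatever the User Manual specifies.

```sh
#!/bin/sh
# Hypothetical helper: print a random MAC such as 080027A1B2C3.
# Prefix 080027 is assumed to match VirtualBox's own generated MACs;
# the last three bytes come from /dev/urandom.
random_vbox_mac() {
    # -An: no offsets, -N3: read 3 bytes, -tx1: one hex byte per field
    suffix=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')
    echo "080027$suffix" | tr 'a-f' 'A-F'
}
```

Because the random part is only three bytes, collisions between a handful of clones are unlikely but not impossible; a real script would want to check the new value against the MACs already in use.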



20080927 Saturday September 27, 2008
Bag of links about VirtualBox configurations

Just some links I've stumbled across on my odyssey with VirtualBox:

Solaris Cluster on a laptop using VirtualBox, iSCSI and a quorum server
VirtualBox 1.6.2 configuration with jumpstarting done via JET
Building a Solaris Cluster Express cluster in a VirtualBox on OpenSolaris
VirtualBox 1.6.2 configuration to build a cluster and using iSCSI from ZFS
Host Interface Networking in Sun xVM VirtualBox
VirtualBox 2.0.0 configuration of Host Interface Networking. Actually installing OpenSolaris on Ubuntu
VirtualBox meets JET...
Must be VirtualBox 1.6.*; uses JET and flar to manage quick setup of VMs. Hmm, he installs an OpenSolaris vbox on a Windows machine, puts JET on it, and then uses that machine to jumpstart others. Sweet article!
Configuring host networking for VirtualBox
Again a VirtualBox 1.6.* release, another 4150 with four dual-core CPUs and 8GB RAM. Uses /usr/lib/vna, which appears to be gone from VirtualBox 2.0.2.
Famous Quote:
Here's my script. No, I didn't use SMF. I'm old school. Bite me.

I'll add more as I collect them:

VirtualBox Buzz
How can any collection of links on VirtualBox be complete without this?

Working towards a vbox image for distribution

One of the difficulties with pushing out a release for OpenSolaris Project: NFS version 4.1 pNFS is that we could only release source and BFU. We could not release a live image.

To complicate matters, part of the NFS code is in the closed repository. As a result, we also had to release a special closed-bins tarball.

The difficulty lay in two areas:

  1. We weren't allowed to take the DVD image, install our bits, and send that back out. Note, if you search for kanigix on my blog, you'll see I provide recipes for making your own customized DVD, but I don't distribute DVDs.
  2. People, even ex-Sun employees, didn't want to install a stock system and BFU the updates.

We started to get requests for VMware images. And we still weren't allowed to hand those out.

But OpenSolaris is adaptive to pressures in the community. I just asked again and was pointed towards the Hadoop project and especially this one: OpenSolaris Project: Hadoop Live CD.

My understanding is that we aren't trying to make a distribution, we aren't trying to steal thunder, instead we are trying to get systems out there to enable interoperability testing.

So now I'm working on a framework to get OpenSolaris + pNFS on a VirtualBox image.

Stay tuned as I go down the wrong path several times, but emerge with a working process.


Builds are too slow...

Okay, I've got a brand new Sun Fire X4150 Server, and it is geeked out with processors and memory. When I installed SUNWonbld, it said that I should use 36 for dmake concurrency. So, let's set up our .make.machines and let a build rip.

I'm going to copy usr/src/tools/env/developer.sh to nightly.env and modify it as follows:

[th199096@jhereg spe-build]> diff nightly.env  $SRC/tools/env/developer.sh
41c41
< NIGHTLY_OPTIONS="-aFCDlmprn";  export NIGHTLY_OPTIONS
---
> NIGHTLY_OPTIONS="-aCDlmpr";           export NIGHTLY_OPTIONS
194d193
< export CW_NO_SHADOW=1

I cut out the $STAFFER and such. The main differences are that I am not doing the gcc shadow building and I am not doing a non-DEBUG build. This should blaze, but it doesn't:

==== Nightly distributed build started:   Fri Sep 26 21:09:37 CDT 2008 ====
==== Nightly distributed build completed: Fri Sep 26 22:17:58 CDT 2008 ====

==== Total build time ====

real    1:08:21
...
/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 4

Okay, I just took the hit of moving to Studio 12, so maybe that costs a bit more time. And I think I have everything local, but perhaps I am hitting the network. But let's focus on dmake telling me it will use 4 concurrent jobs. That is by no stretch 36.

[th199096@jhereg spe-build]> grep jhereg ~/.make.machines 
jhereg   max=36
jhereg.central.sun.com   max=36

I invoke the build like this:

[th199096@jhereg spe-build]> printenv | grep DMAKE
DMAKE_MODE=parallel
DMAKE_MAX_JOBS=36
[th199096@jhereg spe-build]> env -i `which nightly` nightly.env
Time spent in user mode   (CPU seconds) : 10608.23s
Time spent in kernel mode (CPU seconds) : 6272.44s
Total time                              : 1:08:21.75s
CPU utilisation (percentage)            : 411.5%

I use env -i because someone told me that it makes sure I have just the right things in my environment. How can I tell that I'm getting the right number?

I can copy `which nightly` and hack it to just report the dmake concurrency.

[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env 
Testing DMAKE, quick exit
number of concurrent jobs = 4

Okay, it's pretty clear that I am only getting 4, but why? Time to add some more debugging to the main DMAKE processing code:

hostname=`uname -n`
if [ ! -f "$HOME/.make.machines" ]; then
        echo "No $HOME/.make.machines found!"
        DMAKE_MAX_JOBS=4
else
        # debug: which host and which file are being searched
        echo "Grepping for $hostname in $HOME/.make.machines"
        # last line that mentions this host; keep what follows the "="
        DMAKE_MAX_JOBS="`grep $hostname $HOME/.make.machines | \
            tail -1 | awk -F= '{print $2;}'`"
        if [ "$DMAKE_MAX_JOBS" = "" ]; then
                # debug: no entry for this host, so fall back to 4
                echo "Nothing in that file!"
                DMAKE_MAX_JOBS=4
        fi
fi
DMAKE_MODE=parallel;
export DMAKE_MODE
export DMAKE_MAX_JOBS

And run it:

[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env
Grepping for jhereg in /.make.machines
Nothing in that file!
Testing DMAKE, quick exit
number of concurrent jobs = 4

Hey, why is it looking in /.make.machines and not in my homedir? [1]

[th199096@jhereg spe-build]> echo $HOME
/home/th199096
[th199096@jhereg spe-build]> more home.tst 
#!/bin/ksh -p
#

echo "My home is $HOME"
[th199096@jhereg spe-build]> env -i ./home.tst 
My home is 
[th199096@jhereg spe-build]> ./nightly.tst -i nightly.env 
Grepping for jhereg in /home/th199096/.make.machines
Testing DMAKE, quick exit
number of concurrent jobs = 36

Okay, env is hosing me.

[th199096@jhereg spe-build]> env -i HOME=/home/th199096 ./home.tst 
My home is /home/th199096

And crap, env spells it out for me:

OPTIONS
     The following options are supported:

     -i | -        Ignores the environment that  would  otherwise
                   be  inherited  from  the  current shell.  Res-
                   tricts the environment  for  utility  to  that
                   specified by the arguments.

So, another quick test:

[th199096@jhereg spe-build]> env ./home.tst
My home is /home/th199096

I know I was told to invoke my builds this way to speed them up (i.e., to grab the correct paths). I also know I've been battling this $HOME issue the whole time.
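Dropping the -i, as the next build does, is one fix. If a scrubbed environment is still wanted, the env -i HOME=/home/th199096 test shows the other option: pass HOME back in explicitly. A minimal sketch of such a wrapper; the name run_clean is hypothetical, and it restores only HOME, so PATH and everything else still have to come from nightly.env.

```sh
#!/bin/sh
# Hypothetical wrapper: run a command with an emptied environment (env -i),
# but carry HOME through so $HOME/.make.machines is found instead of
# /.make.machines. Nothing else from the caller's environment survives.
run_clean() {
    env -i HOME="$HOME" "$@"
}

# Example (not run here):
# run_clean "`which nightly`" nightly.env
```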

I wonder how long the build will take now?

[th199096@jhereg th199096]> zfs clone pool/builds/th199096/spe-gate@fresh pool/builds/th199096/spe-build2
[th199096@jhereg th199096]> ws spe-build2

Workspace                    : /builds/th199096/spe-build2
Workspace Parent             : ssh://aus1500-home//pool/ws/th199096/spe-gate
Proto area ($ROOT)           : /builds/th199096/spe-build2/proto/root_i386
Root of source ($SRC)        : /builds/th199096/spe-build2/usr/src
Root of test source ($TSRC)  : /builds/th199096/spe-build2/usr/ontest
Current directory ($PWD)     : /builds/th199096/spe-build2

[th199096@jhereg spe-build2]> cp ../spe-build/nightly.env  .
[th199096@jhereg spe-build2]> vi nightly.env 
[th199096@jhereg spe-build2]> rm ../spe-build/nightly.tst 
[th199096@jhereg spe-build2]> `which nightly` nightly.env 

Yeah, zfs clone is sweet for rapid testing of a baseline!

And we get such a big savings, not!

[th199096@jhereg spe-build2]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 10624.32s
Time spent in kernel mode (CPU seconds) : 7579.56s
Total time                              : 1:04:35.29s
CPU utilisation (percentage)            : 469.7%

The concurrency was correct:

/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 36

All of the important tools are local:

[th199096@jhereg spe-build2]> df -h /opt/SUNWspro/bin/dmake /opt/onbld/bin/nightly /opt/onbld/bin/i386/cw /usr/java/bin/javac /usr/ccs/bin/as 
Filesystem             size   used  avail capacity  Mounted on
pool/tools             134G   9.6G    38G    21%    /pool/tools
pool/tools             134G   9.6G    38G    21%    /pool/tools
pool/tools             134G   9.6G    38G    21%    /pool/tools
/dev/dsk/c0t0d0s0       44G    11G    32G    27%    /
/dev/dsk/c0t0d0s0       44G    11G    32G    27%    /

Ok, the next thing is to check whether there is a difference between building in a clone (which shares its blocks with the origin snapshot) and building in a fresh dataset.

[th199096@jhereg spe-build3]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 10634.44s
Time spent in kernel mode (CPU seconds) : 9678.18s
Total time                              : 1:08:42.11s
CPU utilisation (percentage)            : 492.7%

No. I'll have to think on this. The other option available is to reimage the system with all 3 disks in the pool:

[root@jhereg ~]> zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool        96.9G  39.1G      1      6  61.4K   448K
  c0t1d0    48.5G  19.5G      0      3  30.8K   224K
  c0t2d0    48.5G  19.5G      0      3  30.5K   224K
----------  -----  -----  -----  -----  -----  -----

I'm not sure how much one more spindle would reduce the build time. [2]

Okay, last test is to remove the following options:


And this yields the biggest savings to date [3]:

[th199096@jhereg spe-build4]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 7993.05s
Time spent in kernel mode (CPU seconds) : 4818.01s
Total time                              : 45:44.46s
CPU utilisation (percentage)            : 466.7%

The hole we as developers tend to fall into is wanting to rebuild everything. We don't always need to rebuild the BFU archives if we are just changing a kernel module. At the BAT, I was rebuilding just the nfs or nfssrv modules and scp'ing them over (I might have hosed NFS, don't ya know). My "build" times were on the order of seconds. I spent more time moving the mouse and worrying about whether or not I had changed a header that needed to be installed in my proto area.

And in the end, before I can integrate my changes, I'll need to be lint and cstyle clean, I'll need to build non-DEBUG versions, and I'll need to build for SPARC. And then I'll need to retest.

I started off with a moral about questioning advice given to you versus actual experience, but it turns out the increase in dmake concurrency didn't really help, now did it?
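The CPU utilisation line in each timing report makes this concrete. It is (user + kernel CPU seconds) divided by elapsed seconds, so the first 4-job build at 411.5% kept about 4.1 CPUs busy on average, and the 36-job build only reached 469.7%. A small sketch to reproduce those numbers; the helper names are made up, and the last digit can differ by one from the report, which appears to truncate rather than round.

```sh
#!/bin/sh
# Convert an elapsed time such as "1:08:21.75" (H:MM:SS) or
# "45:44.46" (MM:SS) into seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i   # base-60, left to right
        printf "%.2f\n", s
    }'
}

# CPU utilisation (%) = (user + kernel CPU seconds) / elapsed seconds * 100.
# Divide by 100 to get the average number of CPUs kept busy.
cpu_util() {
    awk -v u="$1" -v k="$2" -v r="$3" 'BEGIN { printf "%.1f\n", (u + k) / r * 100 }'
}

cpu_util 10608.23 6272.44 "$(to_seconds 1:08:21.75)"   # prints 411.5
```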

Notes

[1] /.make.machines

Going back, I wondered why my test did not complain about not finding /.make.machines:

[root@jhereg scripts]> ls -la /.make.machines 
lrwxrwxrwx   1 root     other         27 Sep 26 12:32 /.make.machines -> opt/onbld/gk/.make.machines
[root@jhereg scripts]> more !$
more /.make.machines
elpaso max=20

So there is a default installed by SUNWonbld.
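That default is why the lookup fell through quietly: the file existed, it just had no line for jhereg. The grep in the nightly snippet also matches the host name anywhere on a line, so jhereg matches jhereg.central.sun.com too (harmless here, since both say max=36). As a sketch only, here is a hypothetical stricter lookup that requires the host to be the first field and, like nightly, falls back to 4; it is not part of SUNWonbld.

```sh
#!/bin/sh
# Hypothetical helper (not in nightly): print the max= value for HOST from
# a .make.machines-style FILE, or DEFAULT (4) if there is no usable entry.
# The host must equal the first field; the last matching line wins,
# mirroring nightly's "tail -1".
max_jobs_for_host() {
    file=$1 host=$2 default=${3:-4}
    max=
    if [ -r "$file" ]; then
        max=$(awk -v h="$host" '
            $1 == h {
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^max=[0-9]+$/) v = substr($i, 5)
            }
            END { if (v != "") print v }' "$file")
    fi
    echo "${max:-$default}"
}
```

Called as max_jobs_for_host "$HOME/.make.machines" "`uname -n`", it would still read /.make.machines when HOME is empty, so it does not cure the env -i problem; it only makes the match exact.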

[2] Broken disk?

Hey, wait, don't I really have four disks and not three?

[th199096@jhereg th199096]> iostat
   tty        sd0           sd1           sd2           sd3            cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0  113   0   0    0   66   2   40  304   5   28  303   5   27    3  3  0 93
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@0,0
       1. c0t1d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@1,0
       2. c0t2d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@2,0

I saw some message before the last jumpstart about taking some disk offline. And I've never really seen jhereg. It is in a lab in Austin.

Okaay, that missing disk is the DVD drive: :->

[root@jhereg ~]> iostat -En
c1t0d0           Soft Errors: 0 Hard Errors: 11 Transport Errors: 6 
Vendor: TSSTcorp Product: CD/DVDW TS-T632A Revision: SR03 Serial No:  
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 11 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DAELAA 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DAG92A 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DA6AWA 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 

[3] Groan, I messed up the 4th build

I got the mail message for that fast lint build, and it said that build3 had finished. I had the wrong directory! I had copied over the nightly.env, fixed the path, and then made an error. So I copied the file over again, except this time I forgot to change the path!

So the savings may have been false. Another build has been kicked off!

[th199096@jhereg spe-build4]> `which nightly` nightly.env
Time spent in user mode   (CPU seconds) : 7965.57s
Time spent in kernel mode (CPU seconds) : 4818.72s
Total time                              : 46:52.02s
CPU utilisation (percentage)            : 454.6%

So the savings were real.
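To put a number on "real": the re-run fourth build took 46:52.02 (2812.02 s), against 1:08:21.75 (4101.75 s) for the very first 4-job build. A tiny sketch with a hypothetical helper; both times are passed in seconds.

```sh
#!/bin/sh
# Percentage reduction in wall-clock time from a baseline run to a new run.
# Both arguments are elapsed times in seconds.
pct_saved() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

pct_saved 4101.75 2812.02   # prints 31.4
```

Against the 36-job build (1:04:35.29, or 3875.29 s) the same helper gives 27.4.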

