I just announced on nfs41-discuss that the closed-binaries went live! See Mercurial Repository created.
When I configured my community, I followed all of the steps outlined at Setting up a pnfs community except for the mdsadm command on the MDS server:
[root@pnfs-9-10 ~]> mdsadm -o add -t auth -a ip=10.1.233.50
adding: IP Addr - 10.1.233.50
I saw the following messages on my MDS console. They are being investigated but do not appear to be fatal:
[root@pnfs-9-14 ~]> Oct 3 17:00:11 pnfs-9-14 /usr/lib/nfs/nfsd[101025]: write failed for /var/nfs/v4_state/mds_/010.001.233.053-6448e6939a: write(669938620) returned -1 errno=14 ss_len=669938600
Oct 3 17:05:55 pnfs-9-14 nfssrv: NOTICE: op_destroy_session: SP4_NONE
And ditto for these on the DS console:
[root@pnfs-9-13 ~]> dservadm enable
[root@pnfs-9-13 ~]> Oct 3 16:52:29 pnfs-9-13 dserv[101033]: bad cmd: 3
Oct 3 16:52:29 pnfs-9-13 last message repeated 1 time
Oct 3 16:52:32 pnfs-9-13 dserv: WARNING: CLNT_CALL() ds protocol to mds failed: 5
Oct 3 16:52:32 pnfs-9-13 dserv[101033]: ioctl failed: I/O error
sahre
sahre: Command not found.
[root@pnfs-9-13 ~]> share
-@data/nfs4 /data/nfs4 anon=0,sec=sys,rw ""
[root@pnfs-9-13 ~]> Oct 3 17:00:27 pnfs-9-13 /usr/lib/nfs/nfsd[101019]: write failed for /var/nfs/v4_state/mds_/010.001.233.053-6448e693c4: write(419178265) returned -1 errno=14 ss_len=419178245
So I have a successful community up and running. I'll push the closed binaries out later tonight. Life impinges...
We had the first developer make a real push to the nfs41-gate on the OpenSolaris NFSv41 Project Repository.
Jim Wahlig had this to push:
[thud@adept nfs41-gate]> hg incoming
comparing with ssh://anon@hg.opensolaris.org/hg/nfsv41/nfs41-gate
searching for changes
changeset: 7743:c672b1cb86be
user: Thomas Haynes
date: Thu Oct 02 22:28:30 2008 -0500
summary: Added tag closedv1 for changeset 9fab48a31a4a
changeset: 7744:763bfa203d1a
tag: tip
user: jwahlig@aus-build3
date: Fri Oct 03 11:52:59 2008 -0500
summary: fix stable storage on x86.
The only issue we encountered was that mail did not get accepted for the nfs41-discuss mailing list. I'll have to look at that.
You can also see above the tag I pushed for closedv1.
Okay, building OpenSolaris with the opensolaris.sh environment inside SWAN is different. I first tried it with:
% ws cleanroom
% nightly opensolaris.sh
And got garbage. I tried playing with some environment variables and didn't get anywhere. I then tried it with bldenv:
% exit
% cd cleanroom
% bldenv -d opensolaris.sh
% nightly opensolaris.sh
That went fast:
/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 36
No 32-bit compiler found
*** Error code 1
The following command caused the error:
if /builds/th199096/cleanroom/usr/src/tools/proto/opt/onbld/bin/i386/cw -_cc -_versions >/dev/null 2>/dev/null; then \
Finally, I went back to the ws approach, this time with the following opensolaris.sh diffs:
[th199096@jhereg cleanroom]> diff opensolaris.sh usr/src/tools/env/opensolaris.sh
45c45
< GATE=cleanroom; export GATE
---
> GATE=testws; export GATE
48c48
< CODEMGR_WS="/builds/th199096/$GATE"; export CODEMGR_WS
---
> CODEMGR_WS="/export/$GATE"; export CODEMGR_WS
91c91
< STAFFER=th199096; export STAFFER
---
> STAFFER=nobody; export STAFFER
157c157
< #BUILD_TOOLS=/opt; export BUILD_TOOLS
---
> BUILD_TOOLS=/opt; export BUILD_TOOLS
159,161c159,160
< #SPRO_ROOT=/opt/SUNWspro; export SPRO_ROOT
< #SPRO_VROOT=$SPRO_ROOT; export SPRO_VROOT
< #__SSNEXT=""; export __SSNEXT
---
> SPRO_ROOT=/opt/SUNWspro; export SPRO_ROOT
> SPRO_VROOT=$SPRO_ROOT; export SPRO_VROOT
186d184
< export CW_NO_SHADOW=1
That seems to have worked. Now I need to test a pNFS community setup and run cthon.
So I have the closed binaries which correspond to the new nfs41-gate up on osol. I grabbed a copy of that source and started a build up. And it failed.
My thoughts were that either:
The first is justifiable paranoia and the second has happened to me before. So, I searched my blog (more than 51% of why I blog is to have an easy to search repository of tips, tricks, and efdups.) and found this tidbit: RTFR - Or make sure you do read all of the README. Now it wasn't a direct hit, but what the hey, while I'm here I should read that README.
And sure enough, it has something on the compiler switch:
Please note that the compiler that comes with the Solaris Developer Express release is Studio 12, which is not the standard compiler for OpenSolaris code. If you use Studio 12, you will need to set __SSNEXT to the null string in your environment file. Please do report problems with Studio 12, particularly if the problem goes away when you use Studio 11 (the current standard compiler).
I'll rebuild with that change and see if it is a hit or the paranoia is justifiable after all.
I wrote about how I didn't know how to mix Mercurial and ZFS data sets together to get a new clone on a new dataset. Dave Marker provided this insight:
zfs create pool/ws/th199096/spe-build
cd /pool/ws/th199096/spe-build
hg init
echo "[paths]" > .hg/hgrc
echo "default = ssh://anon@hg.opensolaris.org/hg/nfsv41/nfs41-gate" >> .hg/hgrc
hg pull -u
The trick is realizing that there is nothing magical about 'hg clone'.
And if at this point I want to do a closed gate, I can use my normal incantation because it will be on the same dataset.
And I can then create a ZFS snapshot and clone that to my heart's desire.
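To make the "nothing magical" point concrete, here is a minimal sketch in Python of what the two echo commands accomplish: they only record a [paths] section naming the parent. The function name write_default_path is made up for illustration, and the sketch assumes the plain key/value subset of the hgrc format that Python's configparser can write.

```python
import configparser
from pathlib import Path


def write_default_path(repo_root, parent_url):
    """Record parent_url as the 'default' path in repo_root/.hg/hgrc.

    Equivalent in spirit to the two echo commands: once the [paths]
    section names a parent, 'hg pull -u' knows where to fetch from.
    """
    hgrc = Path(repo_root) / ".hg" / "hgrc"
    # 'hg init' normally creates .hg; this only keeps the sketch standalone.
    hgrc.parent.mkdir(parents=True, exist_ok=True)
    config = configparser.ConfigParser(interpolation=None)
    config["paths"] = {"default": parent_url}
    with hgrc.open("w") as handle:
        config.write(handle)
    return hgrc
```

After running hg init in the new dataset, calling write_default_path with the workspace directory and the gate URL leaves the repository ready for hg pull -u.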
When we transitioned from TeamWare (tw) to Mercurial (hg), I made several attempts to craft a group workspace. The killer always seemed to be that I had to manually do an 'hg update' in the gate and that was too difficult to remember. In the end, I decided to mimic what the ON gatekeepers were doing. At first, I did it all by brute force, copying everything that they had in place. Eventually, since I didn't know Python, I started asking Dave Marker for help. And boy, did I get some. Anyway, here is what I went through to set up nfs41-gate and nfs41-clone.
You want to create a restricted user account for a couple of reasons. At first I thought this was to just keep people from sneaking a look at the hgrc files and such, but there is a broader need in that you want to force all writes to the gate to come through a single account. That way you can configure the push process to always go through some sanity checks. You don't want this to be your regular account, because it will restrict your ability to do things. I'll show that to you in a bit.
[nfs4hg@aus1500-home hook]> grep nfs4hg /etc/passwd
nfs4hg:x:3530:1813:Mr. NFS4 HG:/pool/nfs4hg:/usr/bin/tcsh
[nfs4hg@aus1500-home hook]> grep 1813 /etc/group
mhg::1813:th199096
You want a uid and gid which is not in the NIS maps and you want the account to be local to your gate machine.
You want to leverage ZFS because snapshots and clones are your safety nets. A snapshot saves you from a bad command and a clone lets you try new things in a sandbox.
zfs create pool/ws/nfs41-gate
zfs create pool/ws/nfs41-clone
chown nfs4hg:mhg /pool/ws/nfs41-gate /pool/ws/nfs41-clone
Be sure to login as your restricted user:
warlock % ssh aus1500-home
aus1500-home % su - nfs4hg
Get used to doing it this way - you won't be able to ssh directly as nfs4hg before too long.
I'm going to assume a new branch off of onnv-gate. If you have an existing tw or hg workspace, you can substitute in the relevant commands to create the gate. I never want to speak of migrating TeamWare to Mercurial.
I don't know why, but I have to do something like:
cd /pool/ws/nfs41-gate
hg clone ssh://anon@hg.opensolaris.org/hg/onnv/onnv-gate
mv onnv-gate/.hg* .
mv onnv-gate/usr .
rm -rf onnv-gate
I find Mercurial doesn't like it if the target directory already exists. If you are inside SWAN, make sure to also get the closed bits:
cd usr
hg clone ssh://anon@onnv.eng/export/onnv-clone/usr/closed
And now make your clone off of your new gate. You want to make sure that the hg paths are correct:
[nfs4hg@aus1500-home nfs41-clone]> hg paths
default = /pool/ws/nfs41-gate
You can get the onnv-gk-tools via:
[thud@adept ~/foo]> hg clone ssh://anon@hg.opensolaris.org/hg/scm-migration/onnv-gk-tools
destination directory: onnv-gk-tools
requesting all changes
adding changesets
adding manifests
adding file changes
added 40 changesets with 171 changes to 50 files
35 files updated, 0 files merged, 0 files removed, 0 files unresolved
The onnv-gk-tools are the heart of setting up your project gate. For the nfs41-gate, I put them outside of both the gate and the clone:
[nfs4hg@aus1500-home ws]> zfs list | grep onnv-gk
pool/onnv-gk-tools 630K 4.31T 630K /pool/onnv-gk-tools
Again, leverage ZFS for things like this tool set.
Read onnv-gk-tools/README at this point. I may have done things from there and forgotten to mention them here.
You'll want to copy the existing hgrc files over to your gate and clone:
cp gate-hgrc /pool/ws/nfs41-gate/.hg/hgrc
cp gate-closed-hgrc /pool/ws/nfs41-gate/usr/closed/.hg/hgrc
cp clone-hgrc /pool/ws/nfs41-clone/.hg/hgrc
cp clone-closed-hgrc /pool/ws/nfs41-clone/usr/closed/.hg/hgrc
I can't say this enough, from the README:
gate enforcement:
Mercurial only lets you pull if you can read {REPO}/.hg
Mercurial only lets you push if you can write {REPO}/.hg
So {GATE} and {GATE}/usr/closed are owned by onhg/gk
Mode is set to 0770
{CLONE} and {CLONE}/usr/closed are also owned by onhg/gk
But mode is set to 0775
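Since the whole enforcement scheme rests on those two modes, it is worth checking them after setup. The following is a minimal sketch, not part of onnv-gk-tools: the helper names permission_bits and check_modes are invented here, and the mapping of paths to modes is something you would fill in for your own gate and clone.

```python
import os
import stat

# Modes quoted in the README excerpt: the gate is closed to "other",
# while the clone stays readable by everyone.
GATE_MODE = 0o770
CLONE_MODE = 0o775


def permission_bits(path):
    """Return only the permission bits of path (for example 0o770)."""
    return stat.S_IMODE(os.stat(path).st_mode)


def check_modes(expected):
    """Return (path, actual, wanted) for every path whose mode is off.

    expected maps a directory path to the mode it is supposed to have.
    """
    problems = []
    for path, wanted in expected.items():
        actual = permission_bits(path)
        if actual != wanted:
            problems.append((path, actual, wanted))
    return problems
```

For the layout in this post, the mapping would pair /pool/ws/nfs41-gate and /pool/ws/nfs41-gate/usr/closed with GATE_MODE, and the two clone directories with CLONE_MODE. An empty result means the modes match the README.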
So, this should be configurable, but for now you want to comment out the following to avoid spamming a mailing list:
[nfs4hg@aus1500-home onnv-gk-tools]> diff hook/notify.py ~/onnv-gk-tools/hook/notify.py
80c80
< # m.msg["Bcc"] = "onnv-flagdays@onnv.eng"
---
> m.msg["Bcc"] = "onnv-flagdays@onnv.eng"
Note that you want to make sure the '#' is added right where the 'm' was - I hear Python really cares about indentation.
You'll want to copy this file to the restricted account's homedir and rename it as well. You don't want to inadvertently refer to the ON one at any point:
cp on-hg.py ~/nfs4-hg.py
This is the step that configures Mercurial to understand your gate.
I'm not going to step through the changes; I feel they are self-explanatory. Note, though, that later we will see that some of the Python scripts do not make use of parts of this file. For example, GATE_USER could be used in the hook/updateoso.py file.
Hmm, and so far, all of the user accounts and paths needed for these changes exist.
[nfs4hg@aus1500-home onnv-gk-tools]> diff etc/config.py ~/onnv-gk-tools/etc/config.py
87c87
< GATE_NAME = "nfs41"
---
> GATE_NAME = "onnv"
89c89
< GATE_WS = "/pool/ws/%s-gate" % (GATE_NAME)
---
> GATE_WS = "/ws/%s-gate" % (GATE_NAME)
91c91
< CLONE_WS = "/pool/ws/%s-clone" % (GATE_NAME)
---
> CLONE_WS = "/ws/%s-clone" % (GATE_NAME)
94c94
< GATE_DIR = "/pool/ws/nfs41-gate"
---
> GATE_DIR = "/export/onnv-gate"
96c96
< CLONE_DIR = "/pool/ws/nfs41-clone"
---
> CLONE_DIR = "/export/onnv-clone"
99,104c99,104
< GATE_HOST = "aus1500-home"
< GATE_ALTHOST = "aus1500-home"
< GATE_HOST_X = "aus1500-home"
< GATE_HOST_S = "aus1500-home"
< GATE_DOMAIN = "central"
< GATE_MAIL = "aus1500-home.central"
---
> GATE_HOST = "elpaso"
> GATE_ALTHOST = "juarez"
> GATE_HOST_X = "elpaso"
> GATE_HOST_S = "juarez"
> GATE_DOMAIN = "sfbay"
> GATE_MAIL = "onnv.eng"
106,110c106,110
< GATEKEEPER = "th199096"
< ASSTGATEKEEPER = "rmesta"
< TECHLEAD = "th199096"
< ASSTTECHLEAD = "rmesta"
< CTEAMLEAD = "webaker"
---
> GATEKEEPER = "dm120769"
> ASSTGATEKEEPER = "suha"
> TECHLEAD = "jbeck"
> ASSTTECHLEAD = "nickto"
> CTEAMLEAD = "muolla"
112,113c112,113
< ALIAS_GK = "th199096@%s" % (GATE_MAIL)
< ALIAS_GATEKEEPER = "th199096@%s" % (GATE_MAIL)
---
> ALIAS_GK = "gk@%s" % (GATE_MAIL)
> ALIAS_GATEKEEPER = "gatekeeper@%s" % (GATE_MAIL)
115,116c115,116
< GATE_USER = "nfs4hg"
< GATE_GROUP = "mhg"
---
> GATE_USER = "onhg"
> GATE_GROUP = "gk"
118,119c118,119
< SNAPS_DIR = "/pool/ws/snapshot"
< BUILDS_DIR = "/pool/ws/builds"
---
> SNAPS_DIR = "/export/snapshot"
> BUILDS_DIR = "/export/builds"
Now we go back and edit the hgrc files for the various pieces. These modifications tell the gate how to interact with the clone, etc.
I will annotate these changes:
[nfs4hg@aus1500-home .hg]> diff hgrc ~/onnv-gk-tools/gate-hgrc
17c17
< hook = /pool/onnv-gk-tools/hook
---
> hook = /export/onnv-gate/public/python/hook
Okay, we need to tell the gate where our config.py file is and how to use the extensions. The above does that. Note that if we do not make this change, we could impact ON.
20c20
< gatename = nfs41-gate
---
> gatename = onnv-gate
23c23
< wlock = nfs4hg, th199096
---
> wlock = onhg, dm120769, suha
27,28c27,28
< recv = pnfs-core@sun.com
< #logmail = onnv-gate-putback-log@onnv.eng
---
> recv = onnv-gate-notify@onnv.eng
> logmail = onnv-gate-putback-log@onnv.eng
32,33c32,33
< recv = thomas.haynes@sun.com
< rti = False
---
> recv = onnv-putback-diffs@onnv.eng
> rti = True
With a development gate, you bypass the RTI process. So, we should bypass the checking for it.
36,38d35
< [web]
< baseurl = http://aus1500-home.central
<
40c37
< url = http://aus1500-home.central/pool/ws/nfs41-gate
---
> url = http://onnv.sfbay/net/onnv.sfbay
43c40
< temp = /pool/nfs4hg/webrev
---
> temp = /space/webrev
Ah, we will want to create this directory. I understand that none of this directory's parents may contain a ".hg/" or "Codemgr_wsdata/" subdirectory. If even one parent has a subdirectory with either of these names, it will mess things up.
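The parent-directory rule is easy to violate without noticing, so here is a minimal sketch of a check for it. It is an illustration only: the function name workspace_ancestors is invented, and treating the directory itself as one of the places to check is my assumption rather than something the tools document.

```python
from pathlib import Path

# Subdirectory names that mark a Mercurial or TeamWare workspace.
MARKERS = (".hg", "Codemgr_wsdata")


def workspace_ancestors(directory):
    """Return every ancestor of directory (itself included) holding a marker.

    An empty list means the directory is safe to use as the webrev
    temp location under the rule described above.
    """
    start = Path(directory).resolve()
    hits = []
    for candidate in (start, *start.parents):
        if any((candidate / marker).is_dir() for marker in MARKERS):
            hits.append(candidate)
    return hits
```

Running it on /pool/nfs4hg/webrev before the first push would list any offending parent by path.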
45,48c42,49
< #[rti]
< #webrticli = /net/webrti/export/home/bin/webrticli
< #url = http://webrti.sfbay/rti/xml/index.php
< #project = on
---
> # advocate is only set for restricted builds.
> # When set only those listed (separated by commas) are valid RTI advocates.
> # Any others used will cause a rollback from rti.py
> [rti]
> webrticli = /ws/onnv-gate/public/bin/webrticli
> url = http://webrti.sfbay/rti/xml/index.php
> project = on
> #advocate = John (dot) Beck (at) sun (DOT) com
We really, really want to bypass RTI checking and John really doesn't want to be spammed by this. (And this is the only place I changed the source.)
51c52
< comchk = False
---
> comchk = True
We know the comments in a development gate are not going to be valid, so do no checks.
74c75
< #pretxnchangegroup.2 = python:hook.rti.rti
---
> pretxnchangegroup.2 = python:hook.rti.rti
Again, no RTI at all!
82c83
< changegroup.0 = /usr/bin/hg push -R /pool/ws/nfs41-gate /pool/ws/nfs41-clone
---
> changegroup.0 = /usr/bin/hg push -R /export/onnv-gate /export/onnv-clone
91d91
<
Ahh, we should get the above out of etc/config.py, no?
BTW: The two lines I care about most here are:
changegroup.0 = /usr/bin/hg push -R /pool/ws/nfs41-gate /pool/ws/nfs41-clone
changegroup.1 = /usr/bin/hg update
Basically, after a push occurs, first push that change to the clone and then run update. See, I don't want to be doing that manually!
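On the question of pulling these paths out of etc/config.py: a minimal sketch of what that could look like follows. It is not how onnv-gk-tools works today. The variable names are borrowed from the config.py diff, deriving GATE_DIR and CLONE_DIR from GATE_NAME is my choice, and changegroup_hooks and HG are hypothetical names.

```python
# Values as they appear on the nfs41 side of the config.py diff.
GATE_NAME = "nfs41"
GATE_DIR = "/pool/ws/%s-gate" % GATE_NAME
CLONE_DIR = "/pool/ws/%s-clone" % GATE_NAME
HG = "/usr/bin/hg"  # hypothetical setting, not present in config.py


def changegroup_hooks(gate_dir=GATE_DIR, clone_dir=CLONE_DIR, hg=HG):
    """Return the two changegroup hook lines in the order they must run.

    Hook .0 pushes the new changesets from the gate to the clone,
    then hook .1 updates the gate's working copy.
    """
    return [
        "changegroup.0 = %s push -R %s %s" % (hg, gate_dir, clone_dir),
        "changegroup.1 = %s update" % hg,
    ]
```

Generating the hgrc lines from one set of values means a renamed gate needs a single edit instead of a hunt through four hgrc files.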
Also, note that because of the following lines:
[gatehooks]
gatename = nfs41-gate
logdir = public/log
lockdir = public/lock
You will want to create:
cd /pool/ws/nfs41-gate
mkdir -p public/log
mkdir public/lock
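If you would rather not keep the hgrc and the mkdir commands in sync by hand, the directories can be created from the [gatehooks] section itself. This is a minimal sketch under two assumptions: the function name create_gatehook_dirs is made up, and the text handed to it is a simple excerpt that Python's configparser can read (a full Mercurial hgrc may use syntax that configparser does not accept).

```python
import configparser
import os


def create_gatehook_dirs(hgrc_text, gate_root):
    """Create the logdir and lockdir named in [gatehooks] under gate_root.

    Returns the list of directories, in the order logdir, lockdir.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    created = []
    for key in ("logdir", "lockdir"):
        path = os.path.join(gate_root, parser["gatehooks"][key])
        os.makedirs(path, exist_ok=True)  # same effect as mkdir -p
        created.append(path)
    return created
```

Passing the four-line [gatehooks] excerpt and /pool/ws/nfs41-gate gives the same result as the commands above.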
All of the above changes apply, but the only real diff is the following:
81,82c81
< # push to hg.os.o will be done out of cron.
< changegroup.0 = /usr/bin/hg push -R /pool/ws/nfs41-gate/usr/closed /pool/ws/nfs41-clone/usr/closed
---
> changegroup.0 = /usr/bin/hg push -R /export/onnv-gate/usr/closed /export/onnv-clone/usr/closed
91d89
And again, the changes will automatically occur to the clone.
This one is much simpler, mainly because there is not much there:
[nfs4hg@aus1500-home .hg]> cd /pool/ws/nfs41-clone/.hg
[nfs4hg@aus1500-home .hg]> diff hgrc ~/onnv-gk-tools/clone-hgrc
12c12
< default = /pool/ws/nfs41-gate
---
> default = /export/onnv-gate
We tell the clone where the parent is located.
15c15
< hook = /pool/onnv-gk-tools/hook
---
> hook = /export/onnv-gate/public/python/hook
19c19
< gate = file:/pool/ws/nfs41-gate
---
> gate = file:/export/onnv-gate
36d35
Where are the hooks and the gate? All of this should be in etc/config.py.
BTW an important line here is:
# These hooks are run from bghook() in the background
bg-changegroup.0 = python:hook.updateoso.updateoso
When the clone gets updated, then we will push a change out to OpenSolaris!
If you don't have a repository out there, shame on you! Well, just comment out this line.
Exact same diffs as above.
A big difference between the gate and the clone is in the hgrcs. The gate is write only and the clone is read only.
So the clone has to prevent writes before they occur. This line does that:
# This prevents boneheaded gatekeepers and gives a more useful message
# to gatelings who trust our hooks.
prechangegroup.0 = python:hook.cloneincoming.cloneincoming
And I haven't figured out how the gate keeps people from reading. Note, yes I have, see onnv-gk-tools/README. So part of the above may be wrong....
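For readers who have not written a Mercurial hook, here is a minimal sketch of how a prechangegroup hook can refuse a push. It is not the code in hook/cloneincoming.py, which I have not reproduced here. The sketch relies on two things Mercurial does document: in-process hooks are called as hook(ui, repo, hooktype, **kwargs), and a true return value means failure. Everything else is an assumption: that the check compares the incoming url against the gate value from the clone's hgrc, that strings arrive as str (recent Mercurial releases pass bytes), and the wording of the message.

```python
# The 'gate' value from the clone's hgrc shown earlier.
GATE_URL = "file:/pool/ws/nfs41-gate"


def from_gate(url, gate_url=GATE_URL):
    """True when the incoming changegroup originates from the gate."""
    return url == gate_url


def cloneincoming(ui, repo, hooktype, url=None, **kwargs):
    """Hypothetical prechangegroup hook for the clone.

    Returning a true value makes Mercurial abort before any
    changesets are added, so only the gate can feed the clone.
    """
    if from_gate(url):
        return False
    ui.warn("this clone is read-only; push to the gate instead\n")
    return True
```

The design point is that the rejection happens in a "pre" hook, before anything is written, which is what lets the clone stay world-readable at mode 0775 while still refusing direct pushes.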
These appear pretty self-explanatory:
[nfs4hg@aus1500-home ~]> diff nfs4-hg.py onnv-gk-tools/on-hg.py
45c45
< HGLOGIN = "nfs4hg"
---
> HGLOGIN = "onhg"
54,55c54,55
< "/pool/ws/nfs41-gate",
< "/pool/ws/nfs41-gate/usr/closed",
---
> "/export/onnv-gate",
> "/export/onnv-gate/usr/closed",
Okay, we have almost everything done that I remember. At this point, you need to start emailing your developers and asking them to send you their SSH public keys -- see opensolaris.org SSH key help. They need to do this for ON anyway.
Once you get them, then you will add them to the ~/.ssh/authorized_keys of your restricted account. The format of each entry will be:
command="~/nfs4-hg.py 'th199096' ",no-port-forwarding,no-X11-forwarding,no-agent-forwarding [the contents of their id_dsa.pub file]
You will have one per user.
The format needed here is discussed in onnv-gk-tools/README.
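With more than a handful of developers, typing these entries by hand invites quoting mistakes. Here is a minimal sketch that builds one entry in the format shown above. The function name authorized_keys_entry and the username check are my own additions, and the key text in any call is whatever the developer mailed you.

```python
def authorized_keys_entry(username, public_key, wrapper="~/nfs4-hg.py"):
    """Build one authorized_keys line that forces every login through wrapper.

    username is handed to the wrapper as its only argument, so quotes
    and whitespace are rejected rather than escaped.
    """
    if not username or any(c in "'\"" or c.isspace() for c in username):
        raise ValueError("unsafe username: %r" % (username,))
    options = "no-port-forwarding,no-X11-forwarding,no-agent-forwarding"
    return 'command="%s \'%s\' ",%s %s' % (
        wrapper, username, options, public_key.strip())
```

Append the returned line to ~/.ssh/authorized_keys of the restricted account, one line per developer.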
I did all of this a month or so ago. I am reconstructing what I did. I may have missed some steps.
All of the steps reported here are mine. All mention of possible bugs is my opinion.
The only cron job I have running is:
[nfs4hg@aus1500-home ~/onnv-gk-tools]> crontab -l
7 3 * * * /pool/ws/scripts/buildtags.sh /pool/ws/nfs41-clone/developer.sh
This will rebuild the cscope and tags databases in the clone. I could do this in another clone, but I like it occurring in a well known place. I do not want it in the gate.
I haven't provided the details on how to configure an automatic push to OpenSolaris...
A good link for jumping off: How to Use Mercurial (hg) Repositories