Thursday July 03, 2008 Today's dtrace “one liner” digs into why a process is failing in a call to gethostbyname_r(). If the application reported everything that was going on this would not be needed; meanwhile, in the real world, we are lucky enough to have dtrace. To test it out I'm tracing the getent(1) command. In real life you would use the -p flag to dtrace with the process ID of the daemon.
/usr/sbin/dtrace -n 'pid$target::gethostbyname_r:entry {
self->name = arg0;
}
pid$target::gethostbyname_r:return / arg1 == 0 / {
ustack(5);
}
pid$target::gethostbyname_r:return / self->name / {
printf("%s", copyinstr(self->name));
self->name = 0
}' -c "getent hosts xxxxxdredd"
dtrace: description 'pid$target::gethostbyname_r:entry ' matched 3 probes
dtrace: pid 4748 has exited
CPU ID FUNCTION:NAME
2 48759 gethostbyname_r:return
libnsl.so.1`gethostbyname_r+0xc4
getent`dogethost+0x54
getent`main+0x7c
getent`_start+0x108
2 48759 gethostbyname_r:return xxxxxdredd
It would be nice to be able to get the h_errno value as well but so far I've not managed that.
Update:
Jon Haslam kindly explained to me the subtleties of copyin() so that I can get the h_errno value.
/usr/sbin/dtrace -Zn 'pid$target::gethostbyname_r:entry {
self->name = arg0;
self->errno = arg4;
}
pid$target::gethostbyname_r:return / arg1 == 0 / {
ustack(5);
printf("%d %s h_errno %x", pid,
copyinstr(self->name),
*(int *)copyin(self->errno,sizeof(int)));
}' -c "getent hosts xxxxxdredd"
dtrace: description 'pid$target::gethostbyname_r:entry ' matched 2 probes
dtrace: pid 5087 has exited
CPU ID FUNCTION:NAME
2 48764 gethostbyname_r:entry errno: d4220008
2 48765 gethostbyname_r:return
libnsl.so.1`gethostbyname_r+0xc4
getent`dogethost+0x54
getent`main+0x7c
getent`_start+0x108
5087 xxxxxdredd h_errno 1
#
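The h_errno of 1 in the output above can be turned back into a name. As a rough sketch, assuming the usual values from <netdb.h> (HOST_NOT_FOUND is 1, TRY_AGAIN 2, NO_RECOVERY 3, NO_DATA 4), a small shell helper would do it. The name herrno_name is made up for this example, and note that the D script prints the value with %x, which only matches decimal for these small numbers.

```shell
#!/bin/sh
# herrno_name is a hypothetical helper, not part of any Solaris tool.
# It maps the numeric h_errno values from <netdb.h> to their names.
herrno_name() {
    case "$1" in
        1) echo HOST_NOT_FOUND ;;
        2) echo TRY_AGAIN ;;
        3) echo NO_RECOVERY ;;
        4) echo NO_DATA ;;
        *) echo "unknown h_errno $1" ;;
    esac
}

# The dtrace output above ended with "h_errno 1".
herrno_name 1
```

So the lookup of xxxxxdredd failed simply because the host was not found, which is what you would hope for a made-up name.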
Wednesday July 02, 2008 This is not quite a one liner, as I'm reusing the code from a previous post to print out the devices in a human-readable form. Otherwise it is just a one liner, and it was when I typed it in.
The question posed here was: what is the maximum number of commands sent to a LUN at any one time? Clearly this will max out at the throttle for the device. However, since the customer had already tuned the throttle down and the problem had gone away, what was interesting was how many commands their configuration was capable of sending to the LUN:
#!/usr/sbin/dtrace -qCs
#define SD_TO_DEVINFO(un) ((struct dev_info *)((un)->un_sd->sd_dev))
#define DEV_NAME(un) \
stringof(`devnamesp[SD_TO_DEVINFO(un)->devi_major].dn_name) /* ` */
#define DEV_INST(un) (SD_TO_DEVINFO(un)->devi_instance)
fbt:*sd:*sd_start_cmds:entry {
@[DEV_NAME(args[0]),DEV_INST(args[0])] = max(args[0]->un_ncmds_in_driver);
}
END {
printa("%s%d %@d\n", @);
}
This produces a nice list of disk devices and the maximum number of commands that have been sent to each of them at any one time:
# dtrace -qCs /var/tmp/max_sd.d -n 'tick-5sec { exit(0) }'
sd2 1
sd0 70
#
Combine that with the D script from the latency bubble posting earlier and you can drill down on where your IO is waiting.
Tuesday July 01, 2008 Today's dtrace one liner is part of a case investigating why messages are not making it into the messages file. Using the divide and conquer principle, the first question you need to answer is: “Is the process that is supposed to generate the messages calling into syslog?”
$ dtrace -n 'pid$target::syslog:entry { printf("%d %s", arg0, copyinstr(arg1)) }' -p $(pgrep xxxx)
dtrace: description 'pid$target::syslog:entry ' matched 1 probe
CPU ID FUNCTION:NAME
1 43227 syslog:entry 5 %s
^C
That is enough to answer the first question. You can get all flash and, when the format string is “%s”, pull out the string passed in as the following argument, but then it is not a one liner and is answering a different question. However it is neat so here it is:
$ dtrace -qn 'pid$target::syslog:entry { printf("%d %s", arg0, (this->arg1 = copyinstr(arg1))) }
pid$target::syslog:entry / this->arg1 == "%s"/ { printf(" %s\n", copyinstr(arg2)) }' -p $(pgrep xxxx)
How about varargs and vsprintf for dtrace....
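The 5 printed as arg0 in the first output is the syslog priority, which packs a facility and a severity level into one integer. Here is a minimal sketch of unpacking it, assuming the conventional <sys/syslog.h> encoding where the low three bits are the level and the remaining bits are the facility number. The function name decode_priority is invented for this example.

```shell
#!/bin/sh
# decode_priority is a hypothetical helper for reading the arg0 value
# that the syslog:entry probe prints.
decode_priority() {
    pri=$1
    level=$(( $pri & 7 ))      # low three bits: severity level
    facility=$(( $pri >> 3 ))  # remaining bits: facility number
    case "$level" in
        0) name=LOG_EMERG ;;
        1) name=LOG_ALERT ;;
        2) name=LOG_CRIT ;;
        3) name=LOG_ERR ;;
        4) name=LOG_WARNING ;;
        5) name=LOG_NOTICE ;;
        6) name=LOG_INFO ;;
        7) name=LOG_DEBUG ;;
    esac
    echo "facility=$facility level=$name"
}

# The value seen in the trace above.
decode_priority 5
```

A facility of 0 here normally means the caller did not encode one, in which case the default facility set up by openlog() should apply.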
Monday June 30, 2008 Eight riders went out, becoming nine at Weybridge until just before the main climb to Newlands Corner, where we became eight again. After Newlands we were taken on a “loop”, down into Albury and then out via Guildford Road, which takes you over Albury Heath, a wonderful part of the world with views of “typical” English countryside. It does however involve a steep climb to the top and then a treacherous descent back to the main road again.
The GPS mapped it all out and the ride is here: http://www.mapmyride.com/ride/united-kingdom/walton-on-thames/439119349651 however MapMyRide continues to misbehave with Firefox on Solaris so I have not embedded it.
We had breakfast “al fresco” in Peaslake before riding back over Leith Hill. The group split at Westhumble, with the majority taking the flat direct route back and two of us going back via the Polesden Lacey road and the 25% climb that that entails.
Despite this being a hilly ride the stats show just how flat this part of the world is: 429m of climbing over 65 miles. Hardly Ventoux!
Wednesday June 25, 2008 I'm always copying data from home to work and, less often, from work to home. Mostly these are disk images. I always check the md5 sum just out of paranoia. It turns out you can't be paranoid enough! The thing to remember if the checksums don't match is not to copy the file again but to use rsync. It will bring over just the blocks that are corrupt.
: enoexec.eu FSS 43 $; scp thegerhards.com:/tank/tmp/diskimage.fat.bz2 .
diskimage.fat.bz2 100% |*****************************| 1825 MB 11:10:31
: enoexec.eu FSS 44 $; digest -a md5 diskimage.fat.bz2
674f69eec065da2b4d3da4bf45c7ae5f
: enoexec.eu FSS 45 $; ssh thegerhards.com digest -a md5 /tank/tmp/diskimage.fat.bz2
191f26762d5b48e0010a575b54746e80
: enoexec.eu FSS 46 $; ls -l diskimage.fat.bz2
-rw-r----- 1 cg13442 staff 1913779931 Jun 25 08:56 diskimage.fat.bz2
: enoexec.eu FSS 47 $; rsync thegerhards.com:/tank/tmp/diskimage.fat.bz2 diskimage.fat.bz2
: enoexec.eu FSS 48 $; digest -a md5 diskimage.fat.bz2
191f26762d5b48e0010a575b54746e80
: enoexec.eu FSS 49 $;
Since my home directory is now on ZFS and I snapshot every time my card gets inserted into the Sun Ray I can now take a look at what went wrong. Using my zfs_versions script I can get a list of the different versions of the file from all the snapshots:
: enoexec.eu FSS 56 $; digest -a md5 $( zfs_versions diskimage.fat.bz2 | nawk '{ print $NF }')
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-05:51:57/diskimage.fat.bz2) = 0a193e0e80dbf83beabca12de09702a0
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-05:54:44/diskimage.fat.bz2) = 7aa78dba6a7556fe10115aa5fc345bad
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-07:05:34/diskimage.fat.bz2) = c6a77429920f258dfca1dbbd5018a69c
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2) = 674f69eec065da2b4d3da4bf45c7ae5f
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2) = 191f26762d5b48e0010a575b54746e80
: enoexec.eu FSS 57 $;
So the last two files in the list represent the corrupted file and the good file:
: enoexec.eu FSS 57 $; cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-2>
cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2 /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2 | head -10
84262913 0 360
84262914 0 14
84262915 0 237
84262916 0 25
84262917 0 342
84262918 0 304
84262919 0 41
84262920 0 12
84262921 0 372
84262922 0 20
: enoexec.eu FSS 58 $;
and there appear to be blocks of zeros.
: enoexec.eu FSS 58 $; cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-2>
cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2 /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2 | nawk '$2 != 0 { print $0 } $2 == 0 { count++ } END { printf("%x\n", count ) }'
23d8c
: enoexec.eu FSS 58 $;
or at least 0x23d8c bytes were zero that should not have been. Need to see if I can reproduce this.
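The zero count above can be wrapped in a small function. This is only a sketch: count_zeroed is a made-up name, it uses plain awk rather than nawk, and it relies on cmp -l printing the byte offset followed by the two differing byte values in octal, as in the output above. Like the original it prints the count in hex.

```shell
#!/bin/sh
# count_zeroed CORRUPT GOOD
# Hypothetical helper: counts the bytes that differ between the two files
# and are zero in the first (corrupt) file. The count is printed in hex.
count_zeroed() {
    cmp -l "$1" "$2" | awk '$2 == 0 { count++ } END { printf("%x\n", count) }'
}
```

It would be called with the corrupt file first, for example `count_zeroed corrupt.bz2 good.bz2`, where both file names are placeholders.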
Anyway the moral is always check the md5 digest and if it is wrong use rsync to correct it.
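That moral can be sketched as a pair of shell functions. The names md5_of and same_md5 are invented for this example, md5_of falls back to md5sum on systems that lack the Solaris digest(1) command, and the host and path in the usage comment are placeholders.

```shell
#!/bin/sh
# md5_of FILE: print just the md5 digest of FILE.
# Assumes either Solaris digest(1) or GNU md5sum is installed.
md5_of() {
    if command -v digest >/dev/null 2>&1; then
        digest -a md5 "$1"
    else
        md5sum "$1" | awk '{ print $1 }'
    fi
}

# same_md5 FILE EXPECTED: succeed only when FILE has the expected digest.
same_md5() {
    [ "$(md5_of "$1")" = "$2" ]
}

# Typical use (remotehost and the paths are placeholders):
#   want=$(ssh remotehost digest -a md5 /path/to/file)
#   same_md5 file "$want" || rsync remotehost:/path/to/file file
```

One thing to watch: by default rsync skips files whose size and modification time already match, so if the corrupt copy kept the original timestamp you may need to add -c to force a checksum comparison.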
Tuesday June 24, 2008 Our Sun Ray upgrade strategy hiccuped and now has both the SPARC systems running the same release, which when it is nv92 is nothing to complain about:
: enoexec.eu FSS 1 $; uname -a
SunOS enoexec 5.11 snv_92 sun4v sparc SUNW,SPARC-Enterprise-T5220
: enoexec.eu FSS 2 $;
The reason for the hiccup was twofold. Once we had established that the T5220 was a perfect Sun Ray server, and had managed to find and file some really nasty bugs that were only found because we were using it, a far-sighted director agreed to fund one long term. This left the old Sun Fire system looking like a very large bit of tin, burning lots of power and taking up a lot of space while only providing one Sun Ray server. So that has now been replaced with a V890:
: estale.eu FSS 1 $; uname -a
SunOS estale 5.11 snv_92 sun4u sparc SUNW,Sun-Fire-V890
: estale.eu FSS 2 $;
Since this was fresh hardware it was freshly installed and was to be the build 92 server while the T5220 served build 91 and would at some point serve build 94. That was until we diagnosed that we were hitting a bug on the T5220 that made it stall, sometimes for minutes, and that is fixed in build 92. So we have both systems running build 92.
Saturday June 21, 2008 At last the rest of the bits of “gaim” that disappeared from Solaris when it moved to be “pidgin” have returned in Nevada build 92. I'm talking about “purple-remote” which is the program that replaces “gaim-remote” and thus allows me once again to set my away message using “utaction” so when I disconnect from my Sun Ray session my IM status is automatically set as well.
If you take the script that I wrote last time and do a global edit changing “gaim-remote” to be “purple-remote” it will work. Something I realise now but did not then is that you only need one utaction command to handle both connection and disconnection, so this will do it:
utaction -d "purple-remote 'setstatus?status=away&message=Away from Sun Ray'" -c "${HOME}/bin/sh/ut-where"
Friday June 20, 2008 After doing my second ZFS to ZFS live upgrade on a laptop I realise I will be starting to test grub's ability to handle lots of different boot targets in its boot menu:

Already I can see that grub has a scrolling feature I had never seen before!
Wednesday June 18, 2008 Interesting change when installing snv_91 over the net from earlier releases. There is a very considerable delay with the “spinning bar” running here:
Sun Fire V440, No Keyboard
Copyright 1998-2004 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.16.1, 16384 MB memory installed, Serial #55495765.
Ethernet address 0:3:ba:4e:cc:55, Host ID: 834ecc55.
Rebooting with command: boot net
Boot device: /pci@1c,600000/network@2 File and args:
/pci@1c,600000/network@2: 100 Mbps full duplex link up
Timeout waiting for ARP/RARP packet
4000 /pci@1c,600000/network@2: 100 Mbps full duplex link up
Previously you would get the SunOS version message quite quickly.
SunOS Release 5.11 Version snv_91 64-bit
Now it takes many, more than five, minutes (with the 100Mbps link) to load over NFS. So be patient!
Monday June 16, 2008 Not wanting to break with tradition, this weekend, being the weekend of Father's Day, meant “Camp Dads”. A smaller number of campers but a no less enjoyable weekend. Seven Dads, thirteen children and what turned out to be an overly pessimistic weather forecast. The smaller group worked really well as the children were more likely to play as one group.
Apologies to the man on the beach who was hit by the ball (one of the children seemed to have a knack of hitting or throwing the ball at people by accident). Thanks to the family who decided not to pitch their tent in the middle of the “rounders field”.
We followed the now traditional format: on Saturday, the beach and then back to the camp site for takeaway curry; then on Sunday, strike camp and head into the forest to find a stream and have a picnic before we all returned home very tired.
Tuesday June 10, 2008 My first live upgrade from ZFS to ZFS was as boring as you could wish for.
# luactivate zfs91
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <zfs90>
Generating boot-sign for ABE <zfs91>
Generating partition and slice information for ABE <zfs91>
Boot menu exists.
Generating direct boot menu entries for PBE.
Generating xVM menu entries for PBE.
Generating direct boot menu entries for ABE.
Generating xVM menu entries for ABE.
No more bootadm entries. Deletion of bootadm entries is complete.
GRUB menu default setting is unaffected
Done eliding bootadm entries.
**********************************************************************
The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.
**********************************************************************
In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:
1. Boot from Solaris failsafe or boot in single user mode from the Solaris
Install CD or Network.
2. Mount the Parent boot environment root slice to some directory (like
/mnt). You can use the following command to mount:
mount -Fzfs /dev/dsk/c0d0s0 /mnt
3. Run <luactivate> utility with out any arguments from the Parent boot
environment root slice, as shown below:
/mnt/sbin/luactivate
4. luactivate, activates the previous working boot environment and
indicates the result.
5. Exit Single User mode and reboot the machine.
**********************************************************************
Modifying boot archive service
Activation of boot environment <zfs91> successful.
# init 6
#
See, all very dull. After it rebooted:
: pearson FSS 8 $; ssh sigma-wired
Last login: Tue Jun 10 12:51:59 2008 from pearson.thegerh
Sun Microsystems Inc. SunOS 5.11 snv_91 January 2008
: sigma TS 1 $; su - kroot
Password:
# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
ufs90                      yes      no     no        yes    -
zfs90                      yes      no     no        yes    -
zfs91                      yes      yes    yes       no     -
#
Although I'm not sure I like this:
# zfs list -r tank/ROOT
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
tank/ROOT                            7.90G  8.81G    18K  /export/ROOT
tank/ROOT@zfs90                        17K      -    18K  -
tank/ROOT/zfs90                      4.94M  8.81G  5.37G  /.alt.tmp.b-uK.mnt/
tank/ROOT/zfs91-notyet               7.89G  8.81G  5.39G  /
tank/ROOT/zfs91-notyet@zfs90         70.5M      -  5.37G  -
tank/ROOT/zfs91-notyet@zfs91-notyet  63.7M      -  5.37G  -
#
I have got used to renaming my existing BE to be nvXX-notyet and then upgrading that. So with ZFS I created a BE called zfs91-notyet, upgraded that and then renamed it back. It seems that renaming a BE does not rename the underlying file systems. Easy to work around, but is it a bug?
Monday June 09, 2008 On Saturday I rode “La Ventoux, Beaumes de Venise” which involved riding 170km with more than 3000m of climbing. Now why would you want to do this? Here is why:
This is the second and shorter of the two “big” descents of the day, dropping from Chalet Raynard down to Bedoin. By the time I was descending this I was most certainly in a “saving energy” mode but on the upside this descent did not start in the clouds.
Before the ride there was much talk of how hard climbing Ventoux was going to be. Ventoux has a formidable reputation not least because Tom Simpson died climbing it in the 1967 Tour. It is a strange climb for at least two reasons.
The road was built not for horse and cart but for motor vehicles. So the roads are wide and of extremely good quality, but steep. You can see the quality of the road from the video. Most of that descent was at over 40mph, with a top speed of 48mph somewhere. All this on a bike with 100psi tyres, yet unlike some of the videos in England the film is not vibrating too much.
It goes to the top of the mountain. Most cols go to a pass which is the lowest point in a mountain range.
Then the area is renowned for its strong winds. We were buffeted the whole day by a strong wind from the north east.
Every bike we saw, of which we saw plenty, had a triple chain set and those with only double chain sets mostly had “compact” chain sets. The small rings had 34 to 36 teeth and there were plenty of 27 tooth rear sprockets in evidence. I therefore got increasingly worried that my 39 tooth small ring and 25 tooth big sprocket were not going to give a low enough bottom gear. Needless to say the riders I was with spotted my concern and played on it (cyclists are like that, it's not personal).
On the day, after an initial panic when one of the riders I was with split a tyre just before the start, we were off at about 8:40 and rode to the base of the main climb at Bedoin. I was still with one of my friends at the bottom of the climb, but only briefly, as I clicked into my bottom gear and I assume he clicked into his, which was lower. As so often happens to me, after the first 1km I thought that as long as it did not get much steeper I could keep doing this for quite a while. Getting to Chalet Raynard and above the trees, the wind started to play its part, which initially allowed me to slip up a few cogs as I was blown up the hill. However on the next bend it was first gear again for the hill and the headwind, with the added fun of going into the clouds, which completely obscured the view of everything.
We had driven up here on the Thursday, when it had been equally cloudy, to amongst other things pay our respects at the Memorial to Tom Simpson, so I had some idea that when I reached the Memorial there was about 1km to go. Since Cafe Raynard I had been contemplating whether to stop to put on my waterproof jacket for the descent or whether to just tough it out. Going over the top I decided to do the sensible thing and put on the jacket. After all, I had dragged it all the way up so why not use it? This turned out to be a very wise decision.
The start of the descent was quite horrible. The lack of visibility, the damp road, the wind, the cold and the other riders streaming past all made for an unpleasant and quite scary ride. After a while however I was below the cloud and started to feel happier streaming down the wide smooth tarmac. The pangs of hunger though were beginning but the idea of eating while doing 40+mph being buffeted by crosswinds down a road that had tight bends and that I did not know did not seem wise.
On reaching the bottom I could feel that the food situation was more desperate, and so ate the banana that I had had stuffed down the front of my jersey. Almost immediately after eating that I came to a food stop where I got some cake and more drink.
The next section of the ride was, according to the profile, flat. However the profile was dominated by Ventoux and so failed to show that in fact there were a number of short climbs and descents that were sapping my remaining strength. All the while my GPS was forming a double torture by telling me I was less than half way round and that the elevation meant I had over 1000m of uphill before I got back to Cafe Raynard.
Just as I thought it could not get worse the rider who I had last seen at the bottom of the climb called out my name. Now this is not a race but it is competitive. I felt sure that if he had the legs to catch me then he would fly past me and then be able to spin his lower gear up the next climb while I struggled to try and stay in contact. We ended up in a group of six or so riders cycling up the river valley, with me sitting in second place wondering when the inevitable would happen and I would be spat out the back. When my friend went to the front the pace increased, the other riders all rode past me and I was spat out the back. Then the gap stopped growing with me about 50 yards back.
I decided I may as well make one final effort to bridge the gap as then at least I would get some shelter and to my surprise I managed that quite easily. As the road turned upwards more severely I found myself passing the other riders until as the climb for real started I was off the front and feeling good.
The second climb of Ventoux was very, very much easier than the first. The road was much less steep, so when the wind was in my face I was doing 10mph and when it was behind I was doing 18mph. There were some other British riders with “Elite Cycling” shirts on, so in the last 2km I used one as a target to catch. Taking the long view this was the wrong thing to do, but it did have a certain pleasure when I got onto the big ring and hit 28mph uphill in the last km of the climb (yes I had a tail wind, but I can dream I'm a cycling god).
Then that descent.
The final section was into the wind and again the profile showed as flat but in fact contained three significant (ie bigger than anything we get in Surrey) climbs which resulted in me crawling up them with the constant fear I was going to be caught again.
The final five km however were both downhill and with the wind behind me so I was able to fly down the road at over 30mph waved through the junctions by the marshals as at all the other junctions.
I managed to finish with a time of 8 hours 4 minutes 13 seconds for the 170km earning me a Silver certificate.
My GPS disagrees about the distance, as does MapMyRide. The GPS claimed 105 miles; MapMyRide claims 99.67 miles, but it also shows that some of the corners were cut. Cutting those corners for real would involve a significant fall!
Brilliant ride, well organised.
Monday June 02, 2008 Today we were apparently going on a short flat 50 mile ride as a final preparation for our trip to Ventoux next week. The flat bit was true but we ended up doing 85 miles, which I don't describe as "short".
A really nice route though along some new lanes and some lanes we don't often frequent.
I'm still trying to find a good site for uploading my ride GPS data to. mapmyride.com seems to have issues now displaying ride data for long rides to Solaris hosts and so far my requests for help have fallen on deaf ears. So I'm now experimenting with http://www.gpsvisualizer.com which appears to work and also allows me to host the data here. However it lacks some of the features of mapmyride.com, such as distance markers and simple elevation, so I'm still looking for a better option.
Friday May 30, 2008 I had reason to want to be able to decode scsi_pkt structures in dtrace again this week and it struck me that that is really what scsi.d does. So I went about modifying it to be able to translate the structures in the more general case. The result is that it is now moderately simple to get it to decode a scsi_pkt structure in any routine that can access them.
If that is the boat you are in, and clearly I am, then download the latest scsi.d script (version 1.13 or later) and then look for the comments “FRAMEWORK” to show you where you need to add your code.
I've also added a few more SCSI commands to the commands that scsi.d will understand. These are mostly from the set of commands used by tape devices.
As an example I have left in the script the probe I was interested in, which was there to find out why we are seeing ssd reporting illegal requests. Which command does the target not like?
01054.352580740 scsi_vhci0:ILL 0x1d SEND_DIAGNOSTIC address 00:00, lba 0x100100, len 0x000010, control 0x00 timeout 300 CDBP 6002a2681c0 sched(0) cdb(6) 1d1001001000
Then it is a simple matter to explain why this is not a problem. Get the monitoring software that is generating these commands to not talk to the targets that don't understand them.
Tuesday May 27, 2008 It took a bit of work but I managed to persuade my old laptop to live upgrade to Nevada build 90 with ZFS root. First I upgraded to build 90 on UFS and then created a BE on ZFS. The reason for the two step approach was to reduce the risk a bit. Bear in mind this is all new in build 90 and I am not an expert on the inner workings of live upgrade. So there are no guarantees.
The upgrade failed at the last minute with this error:
ERROR: File </boot/grub/menu.lst> not found in top level dataset for BE <zfs90>
ERROR: Failed to copy file </boot/grub/menu.lst> from top level dataset to BE <zfs90>
ERROR: Unable to delete GRUB menu entry for boot environment <zfs90>.
ERROR: Cannot make file systems for boot environment <zfs90>.
This bug has already been filed (6707013 LU fail to migrate root file system from UFS to ZFS)
However lustatus said all was well so I tried to activate it:
# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
ufs90                      yes      yes    yes       no     -
zfs90                      yes      no     no        yes    -
# luactivate zfs90
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <ufs90>
ERROR: No such file or directory: cannot stat </etc/lu/ICF.2>
ERROR: cannot use </etc/lu/ICF.2> as an icf file
ERROR: Unable to mount the boot environment <zfs90>.
#
No joy. Can I mount it?
# lumount -n zfs90
ERROR: No such file or directory: cannot open </etc/lu/ICF.2> mode <r>
ERROR: individual boot environment configuration file does not exist - the specified boot environment is not configured properly
ERROR: cannot access local configuration file for boot environment <zfs90>
ERROR: cannot determine file system configuration for boot environment <zfs90>
ERROR: No such file or directory: error unmounting <tank/ROOT/zfs90>
ERROR: cannot mount boot environment by name <zfs90>
#
With nothing to lose I copied the ICF file for the UFS BE and edited it to look like what I suspected one for a ZFS BE would look like. I got lucky, as I was right!
# ls /etc/lu/ICF.1
/etc/lu/ICF.1
# cat /etc/lu/ICF.1
ufs90:/:/dev/dsk/c0d0s7:ufs:19567170
# cp /etc/lu/ICF.1 /etc/lu/ICF.2
# vi /etc/lu/ICF.2
# cat /etc/lu/ICF.2
zfs90:/:tank/ROOT/zfs90:zfs:0
# lumount -n zfs90
/.alt.zfs90
# df
/ (/dev/dsk/c0d0s7 ): 1019832 blocks 740833 files
/devices (/devices ): 0 blocks 0 files
/dev (/dev ): 0 blocks 0 files
/system/contract (ctfs ): 0 blocks 2147483616 files
/proc (proc ): 0 blocks 9776 files
/etc/mnttab (mnttab ): 0 blocks 0 files
/etc/svc/volatile (swap ): 1099144 blocks 150523 files
/system/object (objfs ): 0 blocks 2147483395 files
/etc/dfs/sharetab (sharefs ): 0 blocks 2147483646 files
/dev/fd (fd ): 0 blocks 0 files
/tmp (swap ): 1099144 blocks 150523 files
/var/run (swap ): 1099144 blocks 150523 files
/tank (tank ):24284511 blocks 24284511 files
/tank/ROOT (tank/ROOT ):24284511 blocks 24284511 files
/lib/libc.so.1 (/usr/lib/libc/libc_hwcap1.so.1): 1019832 blocks 740833 files
/.alt.zfs90 (tank/ROOT/zfs90 ):24284511 blocks 24284511 files
/.alt.zfs90/var/run(swap ): 1099144 blocks 150523 files
/.alt.zfs90/tmp (swap ): 1099144 blocks 150523 files
# luumount zfs90
# luactivate zfs90
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <ufs90>
diff: /.alt.tmp.b-svc.mnt/etc/lu/synclist: No such file or directory
Generating boot-sign for ABE <zfs90>
ERROR: File </etc/bootsign> not found in top level dataset for BE <zfs90>
Generating partition and slice information for ABE <zfs90>
Boot menu exists.
Generating direct boot menu entries for PBE.
Generating xVM menu entries for PBE.
Generating direct boot menu entries for ABE.
Generating xVM menu entries for ABE.
No more bootadm entries. Deletion of bootadm entries is complete.
GRUB menu default setting is unaffected
Done eliding bootadm entries.
**********************************************************************
The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.
**********************************************************************
In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:
1. Boot from Solaris failsafe or boot in single user mode from the Solaris
Install CD or Network.
2. Mount the Parent boot environment root slice to some directory (like
/mnt). You can use the following command to mount:
mount -Fufs /dev/dsk/c0d0s7 /mnt
3. Run <luactivate> utility with out any arguments from the Parent boot
environment root slice, as shown below:
/mnt/sbin/luactivate
4. luactivate, activates the previous working boot environment and
indicates the result.
5. Exit Single User mode and reboot the machine.
**********************************************************************
Modifying boot archive service
Activation of boot environment <zfs90> successful.
#
Fixing boot sign
# file /etc/bootsign
/etc/bootsign: ascii text
# cat /etc/bootsign
BE_ufs86
BE_ufs90
# vi /etc/bootsign
# lumount -n zfs90 /a
/a
# cat /a/etc/bootsign
cat: cannot open /a/etc/bootsign: No such file or directory
# cat /a/etc/bootsign
cat: cannot open /a/etc/bootsign: No such file or directory
# cp /etc/bootsign /a/etc
# vi /a/etc/bootsign
# cat /a/etc/bootsign
BE_zfs90
#
# luumount /a
# luactivate ufs90
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <ufs90>
Activating the current boot environment <ufs90> for next reboot.
The current boot environment <ufs90> has been activated for the next reboot.
# luactivate zfs90
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE <ufs90>
diff: /.alt.tmp.b-hNc.mnt/etc/lu/synclist: No such file or directory
Generating boot-sign for ABE <zfs90>
Generating partition and slice information for ABE <zfs90>
Boot menu exists.
Generating direct boot menu entries for PBE.
Generating xVM menu entries for PBE.
Generating direct boot menu entries for ABE.
Generating xVM menu entries for ABE.
No more bootadm entries. Deletion of bootadm entries is complete.
GRUB menu default setting is unaffected
Done eliding bootadm entries.
**********************************************************************
The target boot environment has been activated. It will be used when you
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You
MUST USE either the init or the shutdown command when you reboot. If you
do not use either init or shutdown, the system will not boot using the
target BE.
**********************************************************************
In case of a failure while booting to the target BE, the following process
needs to be followed to fallback to the currently working boot environment:
1. Boot from Solaris failsafe or boot in single user mode from the Solaris
Install CD or Network.
2. Mount the Parent boot environment root slice to some directory (like
/mnt). You can use the following command to mount:
mount -Fufs /dev/dsk/c0d0s7 /mnt
3. Run <luactivate> utility with out any arguments from the Parent boot
environment root slice, as shown below:
/mnt/sbin/luactivate
4. luactivate, activates the previous working boot environment and
indicates the result.
5. Exit Single User mode and reboot the machine.
**********************************************************************
Modifying boot archive service
Activation of boot environment <zfs90> successful.
# lumount -n zfs90 /a
/a
# cat /a/etc/bootsign
BE_zfs90
# luumount /a
# init 6
The system now booted off the ZFS pool. Once up I just had to see if I could create a second ZFS BE as a clone of the first and, if so, how fast it was.
# df /
/ (tank/ROOT/zfs90 ):23834562 blocks 23834562 files
# lustatus
Boot Environment           Is       Active Active    Can    Copy
Name                       Complete Now    On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
ufs90                      yes      no     no        yes    -
zfs90                      yes      yes    yes       no     -
# time lucreate -p tank -n zfs90.2
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment <zfs90> file systems with the file system(s) you specified for the new boot environment. Determining which file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment <zfs90.2>.
Source boot environment is <zfs90>.
Creating boot environment <zfs90.2>.
Cloning file systems from boot environment <zfs90> to create boot environment <zfs90.2>.
Creating snapshot for <tank/ROOT/zfs90> on <tank/ROOT/zfs90@zfs90.2>.
Creating clone for <tank/ROOT/zfs90@zfs90.2> on <tank/ROOT/zfs90.2>.
Setting canmount=noauto for </> in zone <global> on <tank/ROOT/zfs90.2>.
No entry for BE <zfs90.2> in GRUB menu
Population of boot environment <zfs90.2> successful.
Creation of boot environment <zfs90.2> successful.
real 0m38.40s
user 0m6.89s
sys 0m11.59s
#
38 seconds to create a BE, something that would take over an hour with UFS.
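The speed comes from the snapshot and clone steps reported in the lucreate log: a ZFS clone shares blocks with its snapshot, so nothing is copied up front. As a rough sketch, the function below only prints the ZFS commands that correspond to those three log lines. The function name be_clone_cmds is made up for illustration, and lucreate itself does considerably more (BE database, GRUB menu, configuration files), so this is not a replacement for it.

```shell
#!/bin/sh
# Sketch only: print (do not run) the ZFS commands that match the
# "Creating snapshot", "Creating clone" and "Setting canmount" lines
# in the lucreate log. be_clone_cmds is a hypothetical helper name.
be_clone_cmds() {
    src=$1              # source BE dataset, e.g. tank/ROOT/zfs90
    new=$2              # new BE name, e.g. zfs90.2
    parent=${src%/*}    # strip the last path component: tank/ROOT

    echo "zfs snapshot ${src}@${new}"
    echo "zfs clone ${src}@${new} ${parent}/${new}"
    echo "zfs set canmount=noauto ${parent}/${new}"
}

# Dataset and BE names taken from the log.
be_clone_cmds tank/ROOT/zfs90 zfs90.2
```

Because the function only echoes, it is safe to run anywhere to see which datasets would be involved.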
I'm not brave (or foolish) enough to do the home server yet, so that is on nv90 with UFS. When
the bug is fixed I'll give it a go.
Friday May 23, 2008 I love getting emails like this:
Date: Fri, 23 May 2008 10:47:56 +0100
From: XXX YYY <XXX.YYY@Sun.COM>
Subject: Guess what guys...?
To: Chris Gerhard <Chris.Gerhard@Sun.COM>, .......

... We ran out of cycle lockers on Monday! ALL 20 of them!
Clearly the solution is more lockers so as to not suppress demand.
Wednesday May 21, 2008 One problem with the automounter is that when you use the /net mount points to mount a server, if the admin on that server adds a share then your client won't see that share until the automounter times out the mount. This obviously requires that the mounts are unused, which for a large NFS server may never happen.
So given an NFS server called sa64-zfs-gmp03.eu which is sharing a directory /newpool/cjg, on a client you can do:
# ls /net/sa64-zfs-gmp03.eu/newpool
cjg
# ls /net/sa64-zfs-gmp03.eu/newpool/cjg
SPImage         ipmiLog         ppcenv          sel.bin         tmp
SPValueAdd      mcCpu0Core0Log  processLog      summaryLog
evLog           mcCpu1Core0Log  prsLog          swLog
hwLog           mcCpu2Core0Log  pstore          tdulog.tar
# cd /net/sa64-zfs-gmp03.eu/newpool/cjg
# ls
SPImage         ipmiLog         ppcenv          sel.bin         tmp
SPValueAdd      mcCpu0Core0Log  processLog      summaryLog
evLog           mcCpu1Core0Log  prsLog          swLog
hwLog           mcCpu2Core0Log  pstore          tdulog.tar
However if at this point on the server you create and share a new file system:
# zfs create -o sharenfs=rw newpool/cjg2
# share
-@newpool/cjg   /newpool/cjg   rw   ""
-@newpool/cjg2  /newpool/cjg2   rw   ""
# echo foo > /newpool/cjg2/file
#
You can't now directly access it on the client:
# ls /net/sa64-zfs-gmp03.eu/newpool/cjg2
/net/sa64-zfs-gmp03.eu/newpool/cjg2: No such file or directory
#
Now we all know you can work around this by using aliases for the server or even different capitalization:
# ls /net/SA64-zfs-gmp03.eu/newpool/cjg2
file
#
however lots of users just won't buy that and I don't blame them.
With the advent of mirror mounts in NFSv4 you can do a lot better. There is an RFE (4107375) for the automounter to do this for you, which looks like it would be simple on a client that can do mirror mounts, but until that is done here is a workaround. Create a file “/etc/auto_mirror” that contains this line:
* &:/
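In an automount map the “*” key matches any name looked up under the mount point and “&” in the location is replaced by that name, so each host name under the map's mount point resolves to the root of that host. The small function below is only an illustration of that substitution, not anything the automounter runs; the name expand_wildcard is made up, and it assumes the key is a plain host name with no characters special to sed.

```shell
#!/bin/sh
# Sketch only: mimic how the location of a wildcard map entry is
# expanded for one lookup key. expand_wildcard is a hypothetical name.
# Usage: expand_wildcard KEY LOCATION
expand_wildcard() {
    key=$1          # the name looked up, e.g. a host name
    location=$2     # the location field of the map entry, e.g. "&:/"

    # Replace every "&" in the location with the key.
    # Assumes the key contains no "|", "&" or "\" (true for host names).
    printf '%s\n' "$location" | sed "s|&|${key}|g"
}

# Host name taken from the examples in this post.
expand_wildcard sa64-zfs-gmp03.eu '&:/'
```

For the host used in this post this prints sa64-zfs-gmp03.eu:/, which is the server's root; mirror mounts then take care of the file systems below it as they are traversed.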
Then add this line to auto_master:
/mirror auto_mirror -nosuid,nobrowse,vers=4
or add a new key to an existing automount table:
: s4u-nv-gmp03.eu TS 50 $; nismatch mirror auto_share
mirror / -fstype=autofs,nosuid,nobrowse auto_mirror.org_dir.cte.sun.com.
: s4u-nv-gmp03.eu TS 51 $;
Now if you do the same test, this time replacing the “/net” path with the “/mirror” path, you get:
# ls /mirror/sa64-zfs-gmp03.eu/newpool/
cjg
# ls /mirror/sa64-zfs-gmp03.eu/newpool/cjg
SPImage         ipmiLog         ppcenv          sel.bin         tmp
SPValueAdd      mcCpu0Core0Log  processLog      summaryLog
evLog           mcCpu1Core0Log  prsLog          swLog
hwLog           mcCpu2Core0Log  pstore          tdulog.tar
# (cd /mirror/sa64-zfs-gmp03.eu/newpool/cjg ; sleep 1000000) &
[1] 10455
# ls /mirror/sa64-zfs-gmp03.eu/newpool/cjg2
/mirror/sa64-zfs-gmp03.eu/newpool/cjg2: No such file or directory
Here I created the new file system on the server and put the file in.
# ls /mirror/sa64-zfs-gmp03.eu/newpool/cjg2
file
#
If you are an entirely NFSv4 shop then you could change the “/net” mount point to use this.
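As a sketch of what that change might look like, the filter below rewrites the “/net” line of an auto_master file read on standard input so that it points at the auto_mirror map with the same options used for “/mirror” above. The function name rewrite_net_entry is made up, and I am assuming the existing entry starts with “/net” followed by white space; try it on a copy of the file, not the live one.

```shell
#!/bin/sh
# Sketch only: point the /net entry of an auto_master file at the
# auto_mirror map. Reads the file on stdin, writes the result to stdout.
# rewrite_net_entry is a hypothetical helper name.
rewrite_net_entry() {
    # Only lines beginning with "/net" followed by white space are
    # replaced; every other line passes through unchanged.
    sed 's|^/net[[:space:]].*|/net auto_mirror -nosuid,nobrowse,vers=4|'
}

# Example on a single line that stands in for an existing /net entry.
printf '/net\t-hosts\t-nosuid,nobrowse\n' | rewrite_net_entry
```

On a real system you would run something like rewrite_net_entry < /etc/auto_master > /tmp/auto_master.new, compare the two files, and only then install the new one and have the automounter reread its maps.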