Monday September 07, 2009
What do you do if you manage to delete or corrupt /etc/name_to_major? Assuming you don't have a backup, a ZFS snapshot or an alternative boot environment (in which case you are probably in the wrong job), you would appear to be in trouble.
The first thing is not to panic. Do not reboot the system. If you do, it won't boot and your day will have just got a whole lot worse. The data needed to rebuild /etc/name_to_major is in the running kernel, so it can be rebuilt from that. If your system is an x86 system, it is also in the boot archive.
However, if you have no boot archive, or have overwritten it with the bad name_to_major, this script will extract it from the kernel, albeit slowly:
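#!/bin/ksh
# For each major number, ask the running kernel (via mdb -k) for the
# driver bound to it and print a "name major" line. This relies on ksh
# running the final "read" of a pipeline in the current shell.
i=0
while (( i < 1000 ))
do
	print "0t$i::major2name" | mdb -k | read x && echo $x $i
	let i=i+1
done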
Redirect that into a file¹, then move the remains of your /etc/name_to_major out of the way and copy the file into place.
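Before copying the regenerated file into place it may be worth a quick sanity check. This is not part of the original procedure; it is a hedged sketch that assumes the script's output was redirected to a file and that a POSIX awk (nawk on Solaris) is available. The hypothetical function check_name_to_major rejects any line that is not a driver name followed by a decimal major number, and any driver name or major number that appears twice.

```shell
# Hypothetical sanity check, not part of the original procedure.
# check_name_to_major FILE: succeed only if every line of FILE is
# "name major" with a decimal major, and no name or major repeats.
# Problems are reported on stderr.
check_name_to_major() {
    awk '
        NF != 2 || $2 !~ /^[0-9]+$/ { print "bad line " NR ": " $0; bad = 1; next }
        seen_name[$1]++  { print "duplicate name " $1; bad = 1 }
        seen_major[$2]++ { print "duplicate major " $2; bad = 1 }
        END { if (NR == 0) { print "empty file"; bad = 1 }; exit bad }
    ' "$1" >&2
}
```

For example, with the output saved in a file such as /tmp/name_to_major, running check_name_to_major /tmp/name_to_major && cp /tmp/name_to_major /etc/name_to_major only copies the file if it passes.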
Next time make sure you have a back up or snapshot or alternative boot environment!
¹ You will see lots of errors of the form “mdb: failed to convert major number to name”; these are to be expected. They can be limited to just one by adding “|| break” to the mdb line, but that assumes you have no holes in the major number listings, which you may have if you have removed a device, so it is best not to risk that.
Thursday August 27, 2009
Someone has posted a script to start a remote xterm on BigAdmin which exposes a number of issues. I thought it would be better if Google stood some chance of finding a better answer, or at least an answer that does not rely on inherently insecure settings.
Remote X applications should be started using ssh -X so that the X traffic is encrypted and, if you add -C, compressed, which can be a significant performance boost. So a script to do this could be handy, although to be honest knowing the ssh options, or having them set as the default in your .ssh/config, is just as easy:
: exdev.eu FSS 31 $; egrep '^(Compress|ForwardX)' ~/.ssh/config
ForwardX11 yes
Compression yes
: exdev.eu FSS 32 $; ssh -f pearson /usr/X11/bin/xterm
: exdev.eu FSS 33 $;
or more usefully to start graphical tools:
: exdev.eu FSS 33 $; ssh -f pearson pfexec /usr/sadm/admin/bin/dhcpmgr
: exdev.eu FSS 34 $;
However, if you really want a script to do it, here is one that will, with no need to mess with your .ssh/config:
#!/bin/ksh
# Run the application this script is named after (minus the leading "r")
# on a remote host, with X11 traffic forwarded and compressed over ssh.
REMOTE_PATH=${REMOTE_PATH:-${PATH}}
APP=${0##*/}
if (( $# < 1 ))
then
	print "USAGE: ${APP} host [args]" >&2
	exit 1
fi
host=$1
shift
exec /usr/bin/ssh -o ClearAllForwardings=yes -C -Xfn "$host" \
	PATH=${REMOTE_PATH} pfexec ${APP#r} "$@"

If you save this into a file called “rxterm”, then running “rxterm remotehost” will start an xterm on the system remotehost, assuming you can ssh to that system.
More entertainingly, you can save it as “rdhcpmgr” and it will start the dhcpmgr program on a remote system and securely display it on your current display (assuming your PATH includes /usr/sadm/admin/bin and your profile allows you access to that application). You can use it to start any application by simply naming it after the application in question with a preceding “r”.
Tuesday August 04, 2009
Many databases get backed up by simply stopping the database, copying all the data files and then restarting the database. This is fine for things that don't require 24 hour access. However, if you are concerned about the time it takes to take the backup, then don't do this:
stop_database
cp /data/file1.db .
gzip file1.db
cp /data/file2.db .
gzip file2.db
start_database
Now, there are many ways to improve this, using ZFS and snapshots being one of the best, but if you don't want to go there then at the very least stop doing the “cp”. It is completely pointless. The above should just be:
stop_database
gzip < /data/file1.db > file1.db.gz
gzip < /data/file2.db > file2.db.gz
start_database
You can make it faster still by backgrounding those gzips, if the system has spare capacity while the backup is running, but that is another point. Just stopping those extra copies will make things faster, as they are completely unnecessary.
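As a minimal sketch of the backgrounding idea, assuming a POSIX shell and that the system really does have spare CPU, the hypothetical function below starts one gzip per data file and then waits for each of them, so the database is only restarted once every compressed copy is complete. The function name is illustrative; stop_database and start_database are the placeholders from the example above.

```shell
# Hypothetical helper: compress each named file into the current
# directory as <basename>.gz, one background gzip per file, and
# return non-zero if any of them failed.
backup_parallel() {
    pids=
    for f in "$@"; do
        gzip < "$f" > "${f##*/}.gz" &
        pids="$pids $!"
    done
    rc=0
    for p in $pids; do
        wait "$p" || rc=1     # collect each job's exit status
    done
    return $rc
}
```

Used as stop_database; backup_parallel /data/file1.db /data/file2.db; start_database, it leaves file1.db.gz and file2.db.gz in the current directory. Waiting on each PID, rather than a bare wait, is what lets a failed gzip be noticed before the database comes back up.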
Monday April 07, 2008
I previously wrote about modifying an underlying mirror. So if you have booted from CDROM (yes, I know they are all DVDs now, but at least I've stopped saying “tape”) or the network, here is how to get at your metadevices on Solaris 9¹ and above.
First, get a copy of the /kernel/drv/md.conf file. Since mounting a file system in this case will result in rolling the log, even for a read-only mount, doing that actually breaks my rule, which is why it is wise to keep a copy of the md.conf file somewhere safe or, failing that, on the USB pen drive that you have dropped behind the sofa. It will also be in the backup of your root file system:
# ufsrestore xf cg13442@1.2.3.4:/backup/root.dump kernel/drv/md.conf
Warning: ./kernel: File exists
Warning: ./kernel/drv: File exists
You have not read any volumes yet.
Unless you know which volume your file(s) are on you should start with the last volume and work towards the first.
Specify next volume #: 1
set owner/mode for '.'? [yn] n
Directories already exist, set modes anyway? [yn] n
#
If, like me at home, you have backed up your root file system into a ZFS pool, you get a quick demonstration that ZFS gets this right: to get at md.conf you just import the pool. You have to use an alternate root, as the root file system is read-only and so the import can't create /tank:
# zpool import -R /tmp tank
# ufsrestore xf /tmp/tank/backup/root kernel/drv/md.conf
Warning: ./kernel: File exists
Warning: ./kernel/drv: File exists
You have not read any volumes yet.
Unless you know which volume your file(s) are on you should start with the last volume and work towards the first.
Specify next volume #: 1
set owner/mode for '.'? [yn] n
Directories already exist, set modes anyway? [yn] n
#
Now run update_drv(1M) to load the new md.conf and you are away.
# update_drv md
devfsadm: mkdir failed for /dev 0x1ed: Read-only file system
That is it. You can now access your metadevices:
# metastat
d10: Mirror
Submirror 0: d11
State: Needs maintenance
Submirror 1: d12
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 70078473 blocks (33 GB)
d11: Submirror of d10
State: Needs maintenance
Invoke: metasync d10
Size: 70078473 blocks (33 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Okay Yes
d12: Submirror of d10
State: Needs maintenance
Invoke: metasync d10
Size: 70078473 blocks (33 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Okay Yes
d20: Mirror
Submirror 0: d21
State: Needs maintenance
Submirror 1: d22
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 1022706 blocks (499 MB)
d21: Submirror of d20
State: Needs maintenance
Invoke: metasync d20
Size: 1022706 blocks (499 MB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 26001 Yes Okay Yes
d22: Submirror of d20
State: Needs maintenance
Invoke: metasync d20
Size: 1022706 blocks (499 MB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 26001 Yes Okay Yes
Device Relocation Information:
Device Reloc Device ID
c1t1d0 Yes id1,sd@SFUJITSU_MAP3367N_SUN36G_00N024DA____
c1t0d0 Yes id1,sd@SFUJITSU_MAP3367N_SUN36G_00N022FA____
#
There is a document in the service database formerly known as SunSolve, #202794 (Previously Published As 75210), which claims you need to unload the md driver on Solaris 10. You don't. I am updating that document.
¹ At the time of publishing I had not verified this on Solaris 9, but I think it should work. I have clearly verified it on 10! When I have verified it I will update this post. Update: I can verify this works on Solaris 9.
Sunday April 06, 2008
I was recently asked what the home server serves. So here is the list:
NAS server. NFS and CIFS (via SAMBA). There is a single Windows system in the house which is increasingly not switched on. NFS for the two laptops that frequent the network. All supported via ZFS on two 400GB drives with literally thousands of snapshots (44170). Space is beginning to get short thanks to the 10 megapixel SLR camera, so in the not too distant future a disk upgrade will be required.
Sun Ray server. There are (currently) three Sun Rays. One acts as a photo frame and has no keyboard or mouse. The other two provide real interactive use. I can foresee a situation where we have two more Sun Rays.
Email server. SMTP and IMAP via exim and imapd respectively. Clearly this implies SpamAssassin and an antivirus scanner, ClamAV.
SlimServer. I've just run up SlimServer to get better access to internet radio stations. Having a radio player that I can hook up to the hi-fi that is not DAB, i.e. crap¹, would be good. I feel a Squeezebox coming soon.
Before the CPU upgrade the system would occasionally struggle to cope, and did so every time I ran up VirtualBox, even when using the Fair Share Scheduler. Since the upgrade it has not had any problems with all of us using it.
¹ It is nice to see that I am not alone in realising DAB is crap.
Friday April 04, 2008
One of the great benefits of running Sun Rays at home is having the sessions always there. Just plug in the card and you get your session as if you were never away. However, that also allows you to leave an application chewing CPU cycles while you are away. So, to keep the interactive experience as good as possible, I employ the same techniques described in the “Using Solaris Resource Manager With Sun Ray” BluePrint. For a long while I've wondered why IT don't do this. The keepers of our Sun Ray server do and it works a treat, which is a good thing when you share a Sun Ray server with Tim.
Instead of setting the number of shares to a specific value, I use a multiplier, so that those active on a Sun Ray get 10 times the number of shares that they would by default. While this works well, it still leaves a significant load on the system from certain applications, specifically Flash animations left running endlessly, playing the games that were being played when the user's card was removed. The fair share scheduler does its thing to make CPU allocation fair, but the memory use of those otherwise idle firefox sessions is significant.
So I've taken a leaf out of the BOFH's book and apply some special sanctions to those processes. Alas, I may not get a job with the BOFH, as my sanctions are simply to pstop(1) the copies of firefox associated with the user and DISPLAY when they detach, and then prun(1) them when the user reconnects. I wondered about using memory resource caps to limit the memory, but that would leave the system's rcapd(1M) battling the memory usage of firefox processes which are not displaying anything anyway. In the unlikely event that any of the users are using their firefox sessions to simulate nuclear fission or crack SSL, and so would rather they kept running, I'm sure they will get back to me.
So the script I have for doing this is slightly more complex than the one from the BluePrint, since it has to err on the side of caution when stopping users' firefox sessions. To do that it uses pargs(1) to make sure that the firefox sessions really are for this display. In practice I am the only person who might remote display a firefox session from here, and even that is unlikely, but it is the principle. The impact on the system of not trying to run all the disconnected firefox sessions is amazing.
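The script itself is not shown in this post. As a hedged sketch of just the pargs(1) check described above, the hypothetical functions below read the environment printed by pargs -e and only pstop(1) a firefox process whose DISPLAY exactly matches the one being detached. The assumed output format (one envp[N]: NAME=value line per variable), the function names and the use of pgrep(1) to find the processes are my assumptions, not details of the original script.

```shell
# Hypothetical sketch, not the author's script.

# display_is WANTED: read "pargs -e PID" output on stdin and succeed only
# if the environment sets DISPLAY to exactly WANTED.
# Assumes each variable is printed as "envp[N]: NAME=value".
display_is() {
    wanted=$1
    while IFS= read -r line; do
        case $line in
        envp\[*\]:\ DISPLAY=*)
            [ "${line#*: DISPLAY=}" = "$wanted" ] && return 0
            ;;
        esac
    done
    return 1
}

# stop_firefox_for_display USER DISPLAY: pstop each of USER's processes
# whose name matches "firefox" (pgrep does an unanchored match, so
# firefox-bin is included) and whose DISPLAY matches.
stop_firefox_for_display() {
    for pid in $(pgrep -u "$1" firefox); do
        pargs -e "$pid" 2>/dev/null | display_is "$2" && pstop "$pid"
    done
}
```

On reconnect, the same loop with prun(1) in place of pstop(1) would set them running again. Requiring an exact match means a session on :12.0 or :1 is left alone when :12 detaches.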
Thursday April 03, 2008
At long last my home directory in the office has caught up with my home directory at home and the one on my laptop, and now lives on ZFS. Even better, the admins have delegated snapshot privileges for my home directory to me. So now I have a script that snapshots my home directory every time I insert my smart card:
#!/bin/ksh -p
# Create a snapshot named after the current date and time by making a
# directory under .zfs/snapshot; this also works from an NFS client.
now=$(date +%F-%T)
exec mkdir "$HOME/.zfs/snapshot/user_snap_$now"
This is then called using utaction:
utaction -c ~/bin/sh/snap
This is in turn started automatically via the session magic that GNOME does (Preferences->Sessions->Start Up Programs).
You will notice that I use mkdir to create the snapshot. This is great, as it allows me to run the script on an NFS client, but it does prevent me from doing a recursive snapshot, which I would like if I had other file systems.
Update. I just realised that my nautilus script is now useful at work. Cool.
Thursday March 27, 2008
I bit the bullet and bought a new CPU for the home server. It now has an AMD Athlon 64 X2 5000+ Socket AM2 2.6GHz Energy Efficient L2 1MB (2x512KB):
: pearson FSS 2 $; /usr/sbin/psrinfo -v
Status of virtual processor 0 as of: 03/27/2008 08:00:38
on-line since 03/27/2008 07:47:52.
The i386 processor operates at 2600 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 03/27/2008 08:00:38
on-line since 03/27/2008 07:48:00.
The i386 processor operates at 2600 MHz,
and has an i387 compatible floating point processor.
: pearson FSS 3 $;
So far so good. Obviously PowerNow no longer works, so this is running at full power all the time, which is less than ideal, but the performance should be, and so far is, considerably better than the single 2.2GHz CPU it replaces.
With the exception of PowerNow which is not supported on this Dual Core CPU, Solaris works flawlessly as expected.
Tuesday March 25, 2008
One of my users had a bit of a hissy fit today when she plugged her USB thumb drive into the Sun Ray and it did nothing. That is, it did nothing visible. Behind the scenes the drive had been mounted somewhere, but there was no realistic way she could know this.
So I needed a way to get the file browser to open when the drive is inserted. A quick Google finds “"USB Drive" daemon for Sun Ray sessions”, which looks like the answer. The problem I have with this is that it polls to see if there is something mounted. Given my users never log out, this would mean it running, on average, every second. Also, the 5 second delay just does not take into account the attention span of a teenager.
There has to be a better way.
My solution is to use DTrace to see when the file system has been mounted and then run nautilus on that directory.
The great thing about Solaris 10 and later is that I can give the script just the privilege that allows it to run DTrace, without handing that access out to the world. The script can then give that privilege away again.
So I came up with this script. Save it; mine is in /usr/local, which in turn is a symbolic link to /tank/fs/local. Then add an entry to /etc/security/exec_attr, substituting the correct absolute path (i.e. one with no symbolic links in it) in the line:
Basic Solaris User:solaris:cmd:::/tank/fs/local/bin/utmountd:privs=dtrace_kernel
This gives the script just enough privileges to allow it to work. It then drops the extra privilege so that when it runs nautilus it has no extra privileges.
Then you just have to arrange for users to run the script when they login using:
pfexec /usr/local/bin/utmountd
I have done this by creating a file called /etc/dt/config/Xsession.d/utmountd that contains these lines:
pfexec /usr/local/bin/utmountd &
trap "kill $!" EXIT
I leave making this work for users of CDE as an exercise for the reader.
Tuesday March 18, 2008
Following on from “When to run fsck” and “When to run quotacheck”, here is another:
When should you modify the individual submirrors that make up a mirrored volume?
Answer: Never.
With the logical volume manager in Solaris you can build a mirror from two submirrors:
# metastat d0
d0: Mirror
Submirror 0: d10
State: Okay
Submirror 1: d11
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 20482875 blocks (9.8 GB)
d10: Submirror of d0
State: Okay
Size: 20482875 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1d0s0 0 No Okay Yes
d11: Submirror of d0
State: Okay
Size: 20482875 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c5d0s0 0 No Okay Yes
Device Relocation Information:
Device Reloc Device ID
c1d0 Yes id1,cmdk@AST3320620AS=____________3QF09GL1
c5d0 Yes id1,cmdk@AST3320620AS=____________3QF0A1QD
#
So here we have the mirror “d0” made up of the submirrors “d10” and “d11”. Each of these devices can be addressed in the file system as /dev/md/rdsk/d0, /dev/md/rdsk/d10 and /dev/md/rdsk/d11 respectively. The block devices are also available if you so desire. While being able to address the underlying devices that make up a mirror is interesting and potentially useful, it is only useful if you really know what you are doing.
Reading from the submirrors is OK. Writing, and that includes just mounting the file system, is not. So if the device is idle you can do:
# cmp /dev/md/rdsk/d10 /dev/md/rdsk/d11
#
If that returns 0¹, it gives you a feeling of confidence, although if you are this paranoid (and I am), ZFS is a much better bet.
For example, if the mirror contains a file system, then mounting one side of the mirror and making modifications is a really, really bad idea, even if the mirror is unmounted. Once you have made such a modification you would have to make sure the other side of the mirror had exactly the same change propagated to it at the block level. Realistically, the only way to achieve that is for you to detach the other submirror and then reattach it to allow it to resync. If you really know what you are doing there are tricks you could use, but I suspect those who really know what they are doing would not get into this mess in the first place.
¹ If it does not, then you have to look at how the mirror was constructed before you start to worry. If you did “metainit d0 -m d10 d11” or have grown the metadevice, then the submirrors will never have been brought into sync, so only the blocks that have been written to since the operation will compare correctly. Hence this is nothing to worry about. See, I told you: you really do have to know what you are doing.
Tuesday March 11, 2008
After messing around with zones for a few minutes it became clear that it would be really useful if there was a zcp command that worked just like scp(1) but used zlogin as the transport rather than ssh. This is for those cases when you are root and don't want to mess with ssh authorizations, since you know you can zlogin without a password anyway.
Specifically I wanted to be able to do:
# zcp /etc/resolv.conf bookable-129-156-208-37.uk:/etc
Well, it turns out that this is really easy to do. The trick is to let scp(1) do the heavy lifting for you and use zlogin(1) as your transport. So I knocked together this script. You need to install it on your path, called “zcp”, and then make a hard link to it in the same directory called “zsh”. For example:
# /usr/sfw/bin/wget --quiet http://blogs.sun.com/chrisg/resource/zcp.sh
# cp zcp.sh /usr/local/bin/zcp
# ln /usr/local/bin/zcp /usr/local/bin/zsh
# chmod 755 /usr/local/bin/zsh
Now the glorious simplicity of zcp; I'll even throw in recursive copy for free:
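The downloaded script is not reproduced here. As a rough illustration of the idea only, here is a hypothetical sketch of the two halves: zcp runs scp(1) with -S pointing at the zsh link, and zsh, which scp calls in place of ssh, skips the ssh-style options it is given, takes the next argument as the zone name and hands the remote command to zlogin(1). The function names, the /usr/local/bin path and the list of options assumed to take a separate argument are my assumptions, and the sketch also assumes zlogin runs the command string through a shell in the zone, as ssh would.

```shell
# Hypothetical sketch of a zlogin(1) transport for scp(1); NOT the
# author's zcp.sh. Function names and paths are illustrative only.

# zcp: let scp do the copying, but have it run the "zsh" link instead
# of ssh (scp's -S option names the transport program).
zcp() {
    scp -S /usr/local/bin/zsh "$@"
}

# zsh_parse_args: scp calls the transport roughly as
#     zsh [ssh options] [--] host command
# Skip the options, then set ZONE_NAME and REMOTE_CMD.
# Assumes -l, -p, -o, -i, -F and -c are the options that may take a
# separate argument; anything else starting with "-" is skipped alone.
zsh_parse_args() {
    ZONE_NAME= REMOTE_CMD=
    while [ $# -gt 0 ]; do
        case $1 in
        --)         shift; break ;;
        -[lpoiFc])  [ $# -ge 2 ] || return 1; shift 2 ;;
        -*)         shift ;;
        *)          break ;;
        esac
    done
    [ $# -ge 2 ] || return 1     # need a zone and a command
    ZONE_NAME=$1
    shift
    REMOTE_CMD=$*
}

# zsh_main: what the "zsh" link would run in place of ssh.
zsh_main() {
    zsh_parse_args "$@" || { echo "zsh: cannot parse: $*" >&2; return 1; }
    exec /usr/sbin/zlogin "$ZONE_NAME" "$REMOTE_CMD"
}
```

In a real script the zsh entry point would end with zsh_main "$@"; keeping the pieces as functions here simply makes the argument parsing easy to check on its own.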
# zcp -r /etc/inet bookable-129-156-208-37.uk:/tmp
ipqosconf.1.sample   100% |*****************************|  2503       00:00
config.sample        100% |*****************************|  3204       00:00
wanboot.conf.sample  100% |*****************************|  3312       00:00
hosts                100% |*****************************|   286       00:00
ipnodes              100% |*****************************|   286       00:00
netmasks             100% |*****************************|   384       00:00
networks             100% |*****************************|   372       00:00
inetd.conf           100% |*****************************|  1519       00:00
sock2path            100% |*****************************|   566       00:00
protocols            100% |*****************************|  1901       00:00
services             100% |*****************************|  4201       00:00
mipagent.conf-sample 100% |*****************************|  6274       00:00
mipagent.conf.fa-sam 100% |*****************************|  6232       00:00
mipagent.conf.ha-sam 100% |*****************************|  5378       00:00
ntp.client           100% |*****************************|   291       00:02
ntp.server           100% |*****************************|  2809       00:00
slp.conf.example     100% |*****************************|  5750       00:00
ntp.conf             100% |*****************************|   155       00:00
ntp.keys             100% |*****************************|   253       00:00
inetd.conf.orig      100% |*****************************|  6961       00:00
ntp.drift            100% |*****************************|     6       00:00
ipsecalgs            100% |*****************************|   920       00:00
ike.preshared        100% |*****************************|   308       00:00
ipseckeys.sample     100% |*****************************|   510       00:00
datemsk.ndpd         100% |*****************************|    22       00:00
ipsecinit.sample     100% |*****************************|  2380       00:00
ipaddrsel.conf       100% |*****************************|   545       00:00
inetd.conf.preupgrad 100% |*****************************|  6563       00:00
hosts.premerge       100% |*****************************|   112       00:00
ipnodes.premerge     100% |*****************************|    61       00:00
hosts.postmerge      100% |*****************************|   286       00:00
ipqosconf.2.sample   100% |*****************************|  3115       00:00
ipqosconf.3.sample   100% |*****************************|  1097       00:00
#
I'll file an RFE for this to go into Solaris and update this entry when I have the number.
Update: The Bug ID is 6673792. The script now also supports zsync and zdist, although neither of those has been tested yet.
Wednesday February 27, 2008
Following on from the “latency bubbles in your IO” posting, I have been asked two questions about it privately:
How can you map those long numbers in the output into readable entries, e.g. sd0?
How can I confirm that disksort has been turned off?
The first one just requires another glob of D:
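#!/usr/sbin/dtrace -Cs
/*
 * -C runs the script through the C preprocessor, which the #define
 * macros below need.
 */
#pragma D option quiet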
#define SD_TO_DEVINFO(un) ((struct dev_info *)((un)->un_sd->sd_dev))
#define DEV_NAME(un) \
stringof(`devnamesp[SD_TO_DEVINFO(un)->devi_major].dn_name) /* ` */
#define DEV_INST(un) (SD_TO_DEVINFO(un)->devi_instance)
/* Record the time each buf is handed to the sd or ssd driver. */
fbt:ssd:ssdstrategy:entry,
fbt:sd:sdstrategy:entry
{
bstart[(struct buf *)arg0] = timestamp;
}
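/*
 * On completion, follow scsi_pkt -> buf -> sd_xbuf -> sd_lun, one
 * step per clause so each predicate guards the next dereference.
 */
fbt:ssd:ssdintr:entry,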
fbt:sd:sdintr:entry
/ arg0 != 0 /
{
this->buf = (struct buf *)((struct scsi_pkt *)arg0)->pkt_private;
}
fbt:ssd:ssdintr:entry,
fbt:sd:sdintr:entry
/ this->buf /
{
this->priv = (struct sd_xbuf *) this->buf->b_private;
}
fbt:ssd:ssdintr:entry,
fbt:sd:sdintr:entry
/ this->priv /
{
this->un = this->priv->xb_un;
}
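/*
 * Aggregate the latency in milliseconds per driver name and instance;
 * the single-step lquantize() separates IOs that took a minute or more.
 */
fbt:ssd:ssdintr:entry,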
fbt:sd:sdintr:entry
/ this->buf && bstart[this->buf] && this->un /
{
@l[DEV_NAME(this->un), DEV_INST(this->un)] =
lquantize((timestamp - bstart[this->buf])/1000000, 0,
60000, 60000);
@q[DEV_NAME(this->un), DEV_INST(this->un)] =
quantize((timestamp - bstart[this->buf])/1000000);
bstart[this->buf] = 0;
}
The second requires a little bit of mdb. Yes, you could also get the same from DTrace, but mdb gives the immediate answer, first for all the disks that use the sd driver and then for instance 1:
# echo '*sd_state::walk softstate | ::print -at "struct sd_lun" un_f_disksort_disabled' | mdb -k
300000ad46b unsigned un_f_disksort_disabled = 0
60000e23f2b unsigned un_f_disksort_disabled = 0
# echo '*sd_state::softstate 1 | ::print -at "struct sd_lun" un_f_disksort_disabled' | mdb -k
300000ad46b unsigned un_f_disksort_disabled = 0
Saturday February 09, 2008
At last I have handed over the cron changes to support different timezones to Darren, who is sponsoring the effort. I've learned a lot so far in the process of trying to do this work from “outside” of Sun. Mostly that the time required to do even a very small project like this is very great, and there are times when you can't just put it down if you are busy. This makes it very difficult when doing this in your own “spare” time and can lead to some spectacularly late nights. The other problems were around keeping a build system running at home. The sometimes long gaps between working on this resulted in considerable effort to keep up with the various flag days. I also had some tangles with Mercurial that did not help.
The ARC case was quite painless, even if there were elements of Bike Shed Syndrome in it, with real dangers of even greater feature creep. Having actually experienced ARCs internally, I was probably better prepared for this than a real external engineer.
I got some really great feedback during the code reviews, which has resulted in a better end result.
Now I'm just sitting back and waiting.
Wednesday December 27, 2006
Not quite as often as seeing someone run fsck on a live UFS file system and then regretting it, but often enough, someone will run quotacheck on a live file system and be surprised by the results. As usual, the clue is in the manual page for quotacheck:
quotacheck expects each file system to be checked to have a
quota file named quotas in the root directory. If none is
present, quotacheck will not check the file system.
quotacheck accesses the character special device in calcu-
lating the actual disk usage for each user. Thus, the file
systems that are checked should be quiescent while quota-
check is running.
The first paragraph implies that the file system must be mounted (and it must). The second, that it must be inactive.
So when can you run quotacheck?
In single user mode: mount the file system and then run it. If you are using UFS logging, you should never need to run it if you manage your users correctly, that is to say, if you create a user's quota before they can create any file in the file system. If you want to add quotas retrospectively, then you have to drop to single user, run quotacheck, then boot multi user.
Once you have quotas enabled and the system is up and running the kernel will keep track of the quotas so you don't need to check them and like the fsck case if you do check them you will just introduce a corruption.
Suddenly the ZFS model of a quota for a file system and a file system per user seems like a much better way.
Thursday July 13, 2006
I have added printing of the name of the executable and the process ID that initiates an IO, so that it is easier to see who is causing all those SCSI commands to be sent.
Then, for those who just have to have the raw bits to be happy, I have updated scsi.d to dump out the raw CDB as well.
00000.627329400 fp5:-> 0x2a WRITE(10) address 00:00, lba 0x0001bfe9, len 0x000001, control 0x00 timeout 60 CDBP 600d7881d1c diskomizer64mpis(23849) cdb(10) 2a000001bfe900000100
00000.788444600 fp5:-> 0x2a WRITE(10) address 00:00, lba 0x00de6380, len 0x000010, control 0x00 timeout 60 CDBP 600a4282abc diskomizer64mpis(23847) cdb(10) 2a0000de638000001000
You can find the script here: http://blogs.sun.com/roller/resources/chrisg/scsi.d
Friday February 10, 2006
After some feedback about the format of the output from my DTrace script for looking at SCSI IO, I have now added a timestamp, which helps when sorting the output. The output is now cleaner and hopefully clearer, though it does not fit on an 80 column screen.
00000.844267200 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 60031134488 SDB 60031134518
00000.844354400 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08
00000.848251440 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 6001dd1ba50 SDB 6001dd1bae0
00000.848310720 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 3001da270f8 SDB 3001da27188
00000.850371280 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08, reason 0x0 (COMPLETED) state 0x5f Time 6084us
00000.851151040 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143a76e, len 0x000002, control 0x00 timeout 60 CDB 60031134488 SDB 60031134518, reason 0x0 (COMPLETED) state 0x5f Time 6927us
00000.853292800 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 3001da270f8 SDB 3001da27188, reason 0x0 (COMPLETED) state 0x5f Time 5014us
00000.854442400 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143ddd0, len 0x000002, control 0x00 timeout 60 CDB 6001dd1ba50 SDB 6001dd1bae0, reason 0x0 (COMPLETED) state 0x5f Time 6226us
00002.839392160 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 3001da263c0 SDB 3001da26450
00002.839482480 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 60002cb4538 SDB 60002cb45c8
00002.849052160 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 60002cb4538 SDB 60002cb45c8, reason 0x0 (COMPLETED) state 0x5f Time 9630us
00002.850171840 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143e0b0, len 0x000004, control 0x00 timeout 60 CDB 3001da263c0 SDB 3001da26450, reason 0x0 (COMPLETED) state 0x5f Time 10824us
00003.840019440 isp1:-> 0x2a (WRITE(10)) address 06:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08
00003.840110160 isp1:-> 0x2a (WRITE(10)) address 00:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 30014c0c780 SDB 30014c0c810
00003.846265280 isp1:<- 0x2a (WRITE(10)) address 00:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 30014c0c780 SDB 30014c0c810, reason 0x0 (COMPLETED) state 0x5f Time 6205us
00003.847439680 isp1:<- 0x2a (WRITE(10)) address 06:00, lba 0x0143e200, len 0x000004, control 0x00 timeout 60 CDB 3000cd59e78 SDB 3000cd59f08, reason 0x0 (COMPLETED) state 0x5f Time 7470us
Lots of “fun” games can be played with this. For example, the above shows that this system has target 0 and target 6 forming a mirror, making isp1 a single point of failure. Although my favourite is this one:
While running
# dd if=/dev/rdsk/c0t8d0s2 of=/dev/null oseek=1024 iseek=$(( 16#1fffff )) count=2
2+0 records in
2+0 records out
#
I get the following trace:
00001.971470332 qus1:-> 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a94f0 SDB 300016a9520
00001.972324082 qus1:<- 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a94f0 SDB 300016a9520, reason 0x0 (COMPLETED) state 0x17 Time 937us
00001.972433832 qus1:-> 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a9d90 SDB 300016a9dc0
00001.973217082 qus1:<- 0x00 (TEST UNIT READY) address 08:00, lba 0x00000000, len 0x000000, control 0x00 timeout 60 CDB 300016a9d90 SDB 300016a9dc0, reason 0x0 (COMPLETED) state 0x17 Time 826us
00001.973324748 qus1:-> 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000300, len 0x000024, control 0x00 timeout 60 CDB 300016a9380 SDB 300016a93b0
00001.976352165 qus1:<- 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000300, len 0x000024, control 0x00 timeout 60 CDB 300016a9380 SDB 300016a93b0, reason 0x0 (COMPLETED) state 0x5f Time 3070us
00001.976443415 qus1:-> 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000400, len 0x000024, control 0x00 timeout 60 CDB 300016a9ab0 SDB 300016a9ae0
00001.979359665 qus1:<- 0x1a (MODE SENSE(6)) address 08:00, lba 0x00000400, len 0x000024, control 0x00 timeout 60 CDB 300016a9ab0 SDB 300016a9ae0, reason 0x0 (COMPLETED) state 0x5f Time 2959us
00001.979453248 qus1:-> 0x08 ( READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a9c20 SDB 300016a9c50
00001.979814748 qus1:<- 0x08 ( READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a9c20 SDB 300016a9c50, reason 0x0 (COMPLETED) state 0x5f Time 403us
00001.979898415 qus1:-> 0x08 ( READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a90a0 SDB 300016a90d0
00001.980151165 qus1:<- 0x08 ( READ(6)) address 08:00, lba 0x00000000, len 0x000001, control 0x00 timeout 60 CDB 300016a90a0 SDB 300016a90d0, reason 0x0 (COMPLETED) state 0x5f Time 294us
00001.980507332 qus1:-> 0x08 ( READ(6)) address 08:00, lba 0x001fffff, len 0x000001, control 0x00 timeout 60 CDB 300016a9660 SDB 300016a9690
00001.993267665 qus1:<- 0x08 ( READ(6)) address 08:00, lba 0x001fffff, len 0x000001, control 0x00 timeout 60 CDB 300016a9660 SDB 300016a9690, reason 0x0 (COMPLETED) state 0x5f Time 12804us
00001.993382498 qus1:-> 0x28 ( READ(10)) address 08:00, lba 0x00200000, len 0x000001, control 0x00 timeout 60 CDB 300016a9940 SDB 300016a9970
00001.999256915 qus1:<- 0x28 ( READ(10)) address 08:00, lba 0x00200000, len 0x000001, control 0x00 timeout 60 CDB 300016a9940 SDB 300016a9970, reason 0x0 (COMPLETED) state 0x5f Time 5921us
I like it as you see the transition from READ(6) to READ(10) as it moves from LBA 0x1fffff to 0x200000. Did I mention needing to get out more?
You can get the script here. Still to do is correct decoding of CDBs bigger than 10 bytes, which is not a problem for my current systems, and more detailed decoding of CDBs that are not reads and writes.
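The switch happens because a READ(6) CDB only has room for a 21-bit logical block address (so 0x1fffff is the highest block it can reach) and an 8-bit transfer length. The hypothetical shell function below is a simplified sketch of that decision, not the sd driver's actual logic: it ignores READ(16) and any other reason a driver might choose a larger CDB, and it treats only lengths 1 to 255 as fitting READ(6), even though the SCSI standard lets a length of 0 mean 256 blocks.

```shell
# Hypothetical illustration, not sd's code: pick the smallest READ CDB
# whose fields can hold the request. READ(6) has a 21-bit LBA field
# (max 0x1fffff) and an 8-bit length; this sketch only accepts 1-255.
read_cdb_for() {
    lba=$(( $1 ))
    len=$(( $2 ))
    if [ "$lba" -le $(( 0x1fffff )) ] && [ "$len" -ge 1 ] && [ "$len" -le 255 ]; then
        echo "READ(6)"
    else
        echo "READ(10)"
    fi
}
```

With the values from the trace above, read_cdb_for 0x1fffff 1 gives READ(6) and read_cdb_for 0x200000 1 gives READ(10).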
Tags: dtrace scsi opensolaris solaris
Friday April 01, 2005
Not when the file system is mounted!
I've been banging my head against this one off and on for a few weeks. I got an email from an engineer who was talking to a customer (who are always right) saying that when they ran fsck on a live file system it would report errors:
# fsck /
** /dev/vx/rdsk/rootvol
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
UNREF DIRECTORY I=5522736 OWNER=root MODE=40755
SIZE=512 MTIME=Mar 31 13:07 2005
CLEAR? y
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
67265 files, 1771351 used, 68625795 free (14451 frags, 8576418 blocks, 0.0% fragmentation)
***** FILE SYSTEM WAS MODIFIED *****
I kept telling them that running fsck on a live file system can, and probably will, generate these “errors”. The kernel's in-memory copy of the file system is correct and eventually it will bring the on-disk copy back in line. However, by answering yes they have now corrupted the on-disk copy of the file system and, to make things worse, the kernel does not know this, so it may not run fsck when the system boots. The warnings sections of the fsck and fsck_ufs manual pages give you a hint that this is a bad thing to do.
The reason they were running fsck was to check the consistency of the file system prior to adding a patch. The right way to do that would be to run pkgchk.
There are times when it is safe to run fsck on a live file system, but they are rare and involve lockfs. Before you try it, make sure you really understand what you are doing; my bet is that if you do know, you won't really want to.
I believe the message is now understood by all involved, but I'm trying to make sure by adding it to the blogosphere.
Except where otherwise noted, this site is
licensed under a Creative Commons License 2.0
This is a personal weblog, I do not speak for my employer.