Alan Hargreaves' Weblog
The ramblings of an Australian SaND TSC* Principal Field Technologist
* Solaris and Network Domain Technology Support Centre - The group I work forTags
(update 1) acoustic bind birthday blues bugs cec cec2007 cec2008 china cmt contention cringley debugging dogs dtrace earthquake encumbered-binaries extra flash funny google guitar halloween huron install kids linux liveupgrade locking mdb music mysql newyear niagra openjava opensolaris oracle patches patents percussion performance redhat secondlife security solaris sru sun support sxcr t2 t2000 timeslider ufs upgrade virtualbox windows youtube zfs
Wednesday Sep 29, 2004
What a week
I've been pretty quiet on the blogging front in the last week. The main reason for this is that I'm up to my eyes in a rather hot performace call involving large machines.
We had a bit of good news this morning with a major improvement by backporting a fix in the dispatcher code from Solaris 10 back to Solaris 8, but we still have a way to go. I'm not mentioning the actual fix here as we have only an interim patch1 and have a bit of process to follow before we can get it back in to Solaris 8 and 9 as a proper patch. For example we have an awful lot of cross calls that we have not yet been able to find the source of.
The standard tools like lockstat, mpstat, vmstat and a few others like statit have been invaluable, but gawd it'd be nice to be able to use dtrace on this one. Finding the cross call source would certainly be simpler, and we could also have a closer look at some of teh code paths and where time is being spent.
Anyway, it's taken a few late nights and all weekend, but I am hopeful that the light at the end of the tunnel is not an oncoming train!
1. kind of like a binary fix but with some of the nice attributes of patches like easy backout and being able to make it incompatible with having another patch put over the top of it until it has been backed out.
Posted at 09:45PM Sep 29, 2004 by Alan Hargreaves in Solaris | Comments[1]
Wednesday Sep 22, 2004
Eric's Comments on Open Solaris
Eric Schrock has written a very well considered blog entry on what oeople have been saying about Open Solaris and addresses some misconceptions. It's definitely worth a read.
Posted at 06:47PM Sep 22, 2004 by Alan Hargreaves in Solaris | Comments[1]
Friday Sep 17, 2004
Notebooks (and other uniprocessors) get power-off in Solaris 10
A putback notification I've been waiting for for quite a while now happened yesterday.
The uppc driver (uniprocessor) now has ACPI power-off functionality. It will appear in build 69.
I just run an update onto my Dell Inspiron 8500 and it works fabulously.
I can now just type
# init 5
just before getting off the train, pack the notebook up into it's back and be assured that it will actually power off without me having to hit the power button.
Thanks Andrew Gabriel for doing the work on it.
Posted at 03:44PM Sep 17, 2004 by Alan Hargreaves in Solaris Express | Comments[3]
Wednesday Sep 15, 2004
ZFS feature article
The feature article on www.sun.com this week is ZFS
The article features some great background as well as descriptions of what exactly we are doing with ZFS and more than a few quotes from Jeff Bonwick.
A recommended read.
Posted at 12:03PM Sep 15, 2004 by Alan Hargreaves in Solaris Express | Comments[1]
Tuesday Sep 14, 2004
reading and writing long dtrace command lines
If you read through my last dtrace adventure, you may have noticed more than a few multi-line dtrace commands.
Like most engineers, I will look for the easiest way to accomplish what I need to get done. Actually writing D-scripts into a file is only something that I'll do if things start getting a bit complex.
When I'm playing dtrace, I tend to use "ksh -o vi" as my shell.
As my scripts commands start heading toward that right margin, I use the edit option within ksh and actually lay out the command across multiple lines so I can read what I'm doing. For those that haven't tried this, it's just a matter of hitting <ESC>v while typing.
It's also a good way of evolving the commands as you go, which is the basis of a lot of dtrace-instigated investigative work.
It certainly helps me make for dtrace commands that are a little more complex than you would normally type on the command line, but not quite comlex enough to make the decision to start writing scripts.
Posted at 09:48AM Sep 14, 2004 by Alan Hargreaves in Solaris Express | Comments[2]
Friday Sep 10, 2004
dtracing on the train - gnome-vfs-daemon
There was some discussion internally recently about using gnome-vfs-daemon as a process to start demonstrating dtrace with. As a result I thought it would be interesting to have a look at it on my notebook while coming to work on the train (about a 90 minute trip).
OK, first off, let's have a look at the system calls. We could do this with
$ /usr/sbin/dtrace -n 'syscall:::entry/execname=="gnome-vfs-daemon"'
But that really generates a lot of output. One of the things that I do notice though is the following sequence repeating itself every three seconds.
0 397 open64:entry 0 397 open64:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 103 ioctl:entry 0 309 llseek:entry 0 17 close:entry 0 317 pollsys:entry
This peaked my interest. How about an aggregation over a minute?
$ /usr/sbin/dtrace -q -n '
syscall:::entry/execname=="gnome-vfs-daemon"/{
@p[probefunc] = count();
}
tick-1m {
printf("\nCount Syscall\n----- -------\n");
printa(" %@3d %s\n",@p); exit(0);
}'
Count Syscall
----- -------
20 pollsys
20 close
20 llseek
40 open64
360 ioctl
So, for each pollsys() and close(), it looks like I am seeing two open64()s and 18 ioctl()s, and from earlier we know that we see two open64()s followed by the ioctl()s. Let's have a closer look at these system calls. Also, while we are at it, let's try and find out what the ioctl is. We now move to scripting files.
#!/usr/sbin/dtrace -s
syscall::ioctl:entry
/execname == "gnome-vfs-daemon"/
{
self->interest = timestamp;
self->fd = arg0;
self->cmd1 = (char)(arg1>>8);
self->cmd2 = arg1&0xff
}
syscall::ioctl:return
/self->interest/
{
printf("ioctl(%d, '%c'<<8|%d, buf) took %u us",
self->fd,
self->cmd1,
self->cmd2,
(timestamp - self->interest)/1000
);
ustack();
self->interest = 0;
}
syscall::open64:entry
/execname == "gnome-vfs-daemon"/
{
self->file = copyinstr(arg0);
self->interest = 1;
}
syscall::open64:return
/self->interest/
{
printf("%s returns fd %d\n", self->file, (int)arg1);
ustack();
self->interest = 0;
}
An excerpt of the output shows
CPU ID FUNCTION:NAME
0 398 open64:return /etc/fstab returns fd -1
libc.so.1`__open64+0x15
libc.so.1`_endopen+0xa8
libc.so.1`fopen64+0x26
libgnomevfs-2.so.0.600.0`_gnome_vfs_get_unix_mount_table+0x3e
libglib-2.0.so.0.400.1`g_source_callback_funcs
libgnomevfs-2.so.0.600.0`poll_fstab
0 398 open64:return /etc/mnttab returns fd 44
libc.so.1`__open64+0x15
libc.so.1`_endopen+0xa8
libc.so.1`fopen64+0x26
libgnomevfs-2.so.0.600.0`_gnome_vfs_get_current_unix_mounts+0x37
libglib-2.0.so.0.400.1`g_source_callback_funcs
libgnomevfs-2.so.0.600.0`poll_mtab
0 104 ioctl:return ioctl(44, 'm'<<8|7, buf) took 232 us
libc.so.1`ioctl+0x15
libgnomevfs-2.so.0.600.0`_gnome_vfs_get_current_unix_mounts+0x55
0 104 ioctl:return ioctl(44, 'm'<<8|7, buf) took 4 us
libc.so.1`ioctl+0x15
libgnomevfs-2.so.0.600.0`_gnome_vfs_get_current_unix_mounts+0xfe
OK, so we first attempt to open /etc/fstab, which obviously does not exist in solaris, then we open /etc/mnttab, which does. We then go and do lots of ioctl()s on it.
The 'm'<<8|7 (as well as being MTIOCGUARANTEEDORDER in sys/mtio.h) is apparantly MNTIOC_GETMNTENT (common/sys/mntio.h).
This has only gone back into Solaris 10 in the last few months. It's generally called by getmntent(3C).
So basically, it looks like gnome-vfs-daemon is reading the mount table of the system every three seconds. This certainly seems like a lot of overkill. Indeed on the notebook I ran it on, if we try the following script
#!/usr/sbin/dtrace -s
syscall::ioctl:entry
/execname == "gnome-vfs-daemon"/
{
self->interest = timestamp;
}
syscall::ioctl:return
/self->interest/
{
@ioctl[probefunc] = quantize((timestamp - self->interest)/1000);
self->timestamp = 0;
}
syscall::open64:entry
{
self->interest = timestamp;
self->file = copyinstr(arg0);
}
syscall::open64:return {
@open64[self->file] = quantize((timestamp - self->interest)/1000);
self->timestamp = 0;
}
tick-1min {
printa (@ioctl);
printa (@open64);
exit(0)
}
We can look at just how much time is being spent (we could always actually sum the numbers, but I like looking at graphs).
ioctl
value ------------- Distribution ------------- count
1 | 0
2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 340
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 | 0
256 |@@ 20
512 | 0
/etc/fstab
value ------------- Distribution ------------- count
8 | 0
16 |@@@@@@@@@@@@@@@@@@@@ 10
32 |@@@@@@@@@@@@@@@@@@@@ 10
64 | 0
/etc/mnttab
value ------------- Distribution ------------- count
32 | 0
64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 20
128 | 0
So, worst case on this system is 340*2 + 10*16 + 10*32 + 20*64 = 2440 microseconds every minute, or 122 microseconds every iteration. This might not initially sound like an awful lot to worry about. However, in this iteration, I am only seeing 81 mount points on my system and I only have one gnome-vfs-daemon running. The truly scary concept is when we start looking at this issue in the light of large sunray servers, say with 250-300 entries on the mount table and around 80 users (instances of gnome-vfs-daemon).
Fortunately there is an answer. If we lookat the man page on mnttab(4) we see
NOTES
The snapshot of the mnttab information is taken any time a
read(2) is performed at offset 0 (the beginning) of the
mnttab file. The file modification time returned by stat(2)
for the mnttab file is the time of the last change to
mounted file system information. A poll(2) system call
requesting a POLLRDBAND event can be used to block and wait
for the system's mounted file system information to be dif-
ferent from the most recent snapshot since the mnttab file
was opened.
So we actually only need to stat the file to see if the mount table has changed. This is a much more efficient way of doing things.
After logging a bug on this, the engineer who took it determined that the code to do the stat() was actually in there, but due to some faulty logic, it would never get executed. This is now logged into the gnome bugzilla as 152114 and 152115 and with luck should be fixed shortly.
Posted at 06:15PM Sep 10, 2004 by Alan Hargreaves in Solaris Express |
Saturday Sep 04, 2004
Lucy Logic
Well, looks like I get to work the weekend that is Fathers day here in Oz. This morning (Saturday), Lucy wanders into the bedroom all excited because she worked out the following sequence.
- Fathers Day is not a School Day
- Today is not a School Day
- Therefore it must be Fathers Day!
How can you argue that kind of logic from a five year old!
Anyway, she gave me a small statue of a wizard and this cute little Genie Wish envelope from which I could draw two wishes. The envelope contained three slips of paper. Tonight she will read me a story and for the next two nights she will go to bed when she is told. :)
Jake got me this wonderful "Welcome to the Internet" coffee mug with a tile coaster. The coaster has a picture of a mouse on it. "Don't try to us eit daddy, it's not a real one".
I love my kids.
Posted at 07:23PM Sep 04, 2004 by Alan Hargreaves in Kids | Comments[3]

