Alan Hargreaves' Weblog

The ramblings of an Australian SaND TSC* Principal Field Technologist

* Solaris and Network Domain Technology Support Centre - The group I work for

Tags

(update 1) acoustic bind birthday blues bugs cec cec2007 cec2008 china cmt contention cringley debugging dogs dtrace earthquake encumbered-binaries extra flash funny google guitar halloween huron install kids linux liveupgrade locking mdb music mysql newyear niagra openjava opensolaris oracle patches patents percussion performance redhat secondlife security solaris sru sun support sxcr t2 t2000 timeslider ufs upgrade virtualbox windows youtube zfs
pageicon Tuesday Aug 31, 2004

A plea to help support help you!

This week, I am the duty engineer for Kernel calls within the PTS Organisation during Asia-Pacific coverage hours.

Yesterday I received a callout on an escalation that had gone around the world a number of times and been in existence since early July. The guts of the issue (as we had been told) was that the customer was setting the quotas of new usrs and the amount of disc used was being initialised to a rather large non-zero value.

There had been an awful lot of work done to try to determine where the values were initialised both in edquota and in the kernel ioctl() - which is why the Kernel group was involved.

Anyway, the news that came to light yesterday was that the customer was not using edquota. They were using some code that they had written themselves. The actual code itself looked fine. What was wrong was an assumption that they had made. That is that variables allocated on the stack are zero-filled.

They are not.

They contain the data that happened to be on the stack at the time the function was called. So, the bad value that was being written into the quota ioctl() was the result of non-initialised data.

The point of this plea is simple.

Once we had all of the relevant data, this call was solved within hours. If this information had been passed to us when the call had been opened, it would have been closed just as quickly - at the beginning of July, rather than the end of August.

It is crititcal when you log support calls with us to give us the full picture; in this call we had been working under the assumption that the problem was being exhibited using edquota, and a lot of man-weeks were effectively wasted and the customer did not see a solution until almost two months after the call had been placed.

We are currently pushing a process called Sun Global Resolution towards our customer facing folk. This process is based around the Kepner-Traego Analytical Troubleshooting process. Part of the beginning of this process is to really define the expected behaviour and deviation, along with all of the concerns and background information. If you start being asked a whole lot of questions that you don't immediately see the relevance to your problem, hang in there. This information really does help us get to the bottom of problems faster.

Wow, Bryan and Adam have been busy

Just noticed two rather large dtrace putbacks into build 67 today.

Bryan has just completed a whole lot of small fixes that include the gas/as problem with the fbt provider that was discussed on the bigadmin forum and some interaction issues with kmdb.

Adam has just put back the plockstat provider that he has been talking about.

Go guys!

Jake's soccer

Finally got some photos scanned. I'll just put two up here.

Jake playing soccer Individual photo

pageicon Thursday Aug 26, 2004

Kids update

Jake played his last game of soccor for the year last week and we got it on video. They ended up in a 2-2 draw against a team that beat them pretty soudnly earlier in the year. Jake gave us a bit of a scare near the end when we thought he had mistakenly kicked the ball the wrong way while an opponent was running up on it. As it turns out (and the video bears this out) he meant to put the ball over the backline as to do anything else would have risked losing it and possibly the game. He was awarded player of the match for "being tricky" and thinking on his feet.

Jake also got a literacy award from Kindergarten yesterday. For improvement in his writing and for writing a sentence in a straight line. Well done Jake!

Lucy has been off sick for a few days (as has her mum) with a pretty awful cough that they are both on antibiotics for. They are sounding a lot better today.

Dtrace should be used just like any other tool

I believe that we will have succeeded with dtrace when all kinds of folks, from sysadmins to developers start usign dtrace as a normal part of their troubleshooting toolkit.

As a simple example, today I noticed my workstation running with a load average of just above 1. Not unusual, except I really wan't having the machine do anything at the time. Unfortunately, I haven't the output from the commands as I addressed the problem this morning and it's way out of my scroll buffer now. I'm actually pulling the commands back out of my shell history.

Maybe we've got something fork/execing a lot (that won't show in prstat). OK, let's pull out dtrace and have a look at what's going on ...

# dtrace -n 'proc::exec_common:exec-success {trace(arg0);}'

No, that was pretty quiet. OK, let's look at what is getting scheduled.

# dtrace -n 'sched::resume:on-cpu {printf("%s: %d\n", execname, pid);}'

OK, this is scrolling past relatively quickly, so it appears that we've got a bit of contect switching going on and pretty much always got something ready to run. This probe will print a line for every time we put something onto proc. How about using an aggregation to tidy things up a bit?

# dtrace -n 'sched::resume:on-cpu {@[execname,pid] = count();}'

Hmmmm, much nicer, I can see that sched and one particular instance of mozilla (I have to run a few in different profiles) is finding itself put onto proc much more often than anything else. The mozilla in question is running with pid 20051. Running pargs 20051 tells me a bit more about which instance it is. It's the one that I use generally and do my email with. At this point I could probably have killed off and restarted the mozilla and the problem would have gone away, but I was interested in what it was doing.

OK, so what are the system calls?

# dtrace -n 'syscall:::entry /pid == 20051/ { @[probefunc] = count();}

This showed me that we were spread pretty equally around an ioctl on fd3, a read on fd5 and a write on fd6. Running pfiles 20051 tells us that these file descriptors are pipes, and very little else.

If we were interested we could also aggregate on the user level stacks to each of the calls, like this.

# dtrace -n 'syscall::ioctl:entry /pid == 20051/ { @[ustack()] = count();}'
# dtrace -n 'syscall::read:entry /pid == 20051/ { @[ustack()] = count();}'
# dtrace -n 'syscall::write:entry /pid == 20051/ { @[ustack()] = count();}'

One interesting there here was that for each of the above probes, there was only one user stack. I suppose that shouldn't be surprising given that they are always calling with the same file descriptor.

It was also interesting that the other instance I had running at the time was not doing anything even vaguely similar. There are two main differences between them.

  • My general instance also does my email
  • My non-general instance only talks to our escalation tracking system, so it's only really one website

At this point I killed and restarted the mozilla, ran some of the commands before and after starting my mail. The load on the box was much lighter, but mail didn't make a difference. I can only conclude that a site I must have visited left something playing up in mozilla. Oh well, won't be the last time.

The point of me blogging this is to show just how easy it is to start getting to the bottom of problems that would be otherwise very difficult to observe. That's the whole idea of dtrace. It's a tool to let you observe the system and it can be used just as easily as you would use the others that I interspersed here (eg pfile, prstat and pargs).

As I have been saying to Jacob before every soccer match, "Get in and have a go". It's the only way to really start to appreciate the usefulness.

pageicon Monday Aug 23, 2004

Kprobes in Linux vs Dtrace

An article on osnews points at an article at IBM about Kprobes on Linux

While this is probably a step in the right direction I still have some concerns. I would encourage the author to look at adding in some more protection. i.e. Always practice safe probing.

  1. I don't see any checking for NULL Pointer dereferences for the printk's. If this is the case, then a poorly written kprobe can still take out a production box. In fact any bad piece of code could take it out.
  2. It stills rather clunky to get simple probes inserted. Looking through the article shows a lot of work required to get the probes in. The equivalent probe in dtrace would be
    #!/usr/sbin/dtrace -s
    
    syscall::fork1:entry, syscall::forkall:entry, syscall::vfork:entry {
        printf("\n\tpid=%d kthread=0x%llx\n", pid, (long long)curthread);
        printf("\tt_state=0x%x cpu=%d\n",
            curthread->t_state, curthread->t_cpu->cpu_id);
        printf("\n\tCaller program is \"%s\"\n\n", execname);
        printf("\tUser Space stack\n");
        ustack();
        printf("\n\tKernel Space Stack\n");
        stack(10);
    }
    
    
    Which gives us the following results
    # ./fork.d
    dtrace: script './fork.d' matched 3 probes
    CPU     ID                    FUNCTION:NAME
      2    207                      vfork:entry 
            pid=1443 kthread=0x300056a3c60
            t_state=0x4 cpu=2
    
            Caller program is "csh"
    
            User Space stack
    
                  libc.so.1`vfork+0x20
                  csh`execute+0xcbc
                  csh`process+0x360
                  csh`main+0xe94
                  csh`_start+0x108
    
            Kernel Space Stack
    
                  unix`syscall_trap32+0xcc
    
    
    Alternately, with the knowledge that in Solaris each of these three system calls call cfork() (which you could also determine with dtrace), we could simply do
    #!/usr/sbin/dtrace -s
    
    fbt::cfork:entry {
        printf("\n\tpid=%d kthread=0x%llx\n", pid, (long long)curthread);
        printf("\tt_state=0x%x cpu=%d\n",
            curthread->t_state, curthread->t_cpu->cpu_id);
        printf("\n\tCaller program is \"%s\"\n\n", execname);
        printf("\tUser Space stack\n");
        ustack();
        printf("\n\tKernel Space Stack\n");
        stack(10);
    }
    
    Which would give us exactly the same output as the calls to cfork() are done with tail recursion. On an x86 box it would look something like:
    # ./fork.d
    dtrace: script './fork.d' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0   3882                      cfork:entry 
            pid=669 kthread=0xffffffffd5ae6000
            t_state=0x4 cpu=0
    
            Caller program is "csh"
    
            User Space stack
    
                  libc.so.1`vfork+0x45
                  csh`execute+0x12f
                  csh`process+0x24b
                  csh`main+0xa25
                  80580ea
    
            Kernel Space Stack
    
                  unix`sys_call+0xda
    
    

Now there are also a couple of other nice things to consider here.

  1. No need to register and unregister the probe. If I'm not running the dtrace script, then the probe does not exist.
  2. If I want to change the query, I just edit the script.

  3. This one is actually a pretty basic probe. I can get much more complex information with very little effort, and as I have already stated, it's just a matter of modifying the script and the probe does not exist unless I am running the script.

But the most important thing to remember is that we have protection against the probe taking out the system. That means that we have no hesitation in running dtrace probes on production boxes, where outage time is measured in thousands of dollars per second (yes we have such customers).


Update

Dan Price made a suggestion which tidies the script up even more, meaning that even if we change the way that we do fork(), the script will remain working. This gives us stability with kernel releases as well. To see the new script, look at the comments for this entry.

pageicon Friday Aug 20, 2004

Kimberley Brown starts blogging

Kimberley Brown, one of the authors of Panic! has just started blogging.

While this book was written around the time of Solaris 2.4, it still holds much relevance today. Indeed all of the PTS Kernel engineers in this office have a copy of the book. This is also one of the two books that I have listed in the On#Sun article I wrote about crashdump analysis.

There is a survey running at the Panic! site for if you want to see a second edition of the book, that would be a companion to the original, rather than a rewrite that would talk about some of the more current stuff. I encourage you to visit it and vote.

I've added Kimberley's blog to my list of bloggers on the right hand side of this page.

On a side note, I'd also love to see Solaris Internals revamped for the release of Solaris 10, as there is certainly a lot more in there; and with us open sourcing Solaris, it would be good to see the same kind of depth of explanation of how things work for folks to use as a reference. Are you listening Richard and Jim? ;-)

pageicon Tuesday Aug 17, 2004

Unlink slow on some systems

A trackback comment from Mark in Soft Updates in Solaris is amazingly timely. Very recently I had a call about the converse problem. That is, a customer was seeing rm and the like run very slowly on some systems and very quickly on others.

I have explained what is going on in SRDB 77645 - "UFS[TM]: Deleting Files and Directories is Slow", which was published less than a week ago.

What it comes down to is that when ufs logging is turned on, the process of cleaning out the inode is placed into the transaction queue to be handled by a dedicated kernel thread and the unlink() system call immediately returns. See the SRDB for more detail.

pageicon Sunday Aug 15, 2004

Claire Giordano starts blogging

Claire is part of the driving force behind open sourcing Solaris. I look forward to seeing what she has to say in this public forum. I can only agree with the "going through uni" sentiments.

I started at Newcastle doing a Computer Engineering degree. After a year of that I decided that this wasn't really what I wanted and I changed to a B.Math, where at the time, the only CompSci related stuff was going on. During my time doing that I started working for them adminning Edition 7 unix boxes. Anyway, as soon as they formed the B.CompSc degree, I was in there and that's what I ended up with.

A big part of my education was source code access. We had the full V7 source, and did a lot of hacking to it. Gone are the days of a 64k kernel ;). While working there, I was one of the five folks listed on the SunOS 4 source licence and that also allowed a lot of learning and customising of software for the uni. For example, we hacked a page charging system into lpd and worked up a method of distributing it to other Sun boxes on campus in such a way as to not breach the terms of the license.

Yes, I think using tonic to get back to our roots of having our source used for teaching would be a damn good thing.

Anyway I've added Claire to my list of bookmarks on this blog.

New stuff in Solaris Express

Wow, there are even a few things in there that I had not been aware of, like

  • VTS for x86,
  • DHCP for Logical interfaces

Two of the items that jump out at me as being very useful are:


KMDB

This is one that I had been waiting for and when I saw the putback happen, there was an immediate congrats sent to Matt Simmons. The great thing was that Matt was working in New Zealand at this point in time so I saw it during my working day!

Kmdb brings much of the functionality of mdb to the kernel debugger (replacing kadb).

One of the really great things about it is that you don't need to have booted with it in order to use it. 'mdb -K' option will load the module and drop you directly into the debugger on the console. I have lost count of the number of times that I have wished that a customer had booted with kadb so that when a rare problem crops up, we can actually do something useful. Well, now we'll be able to do something about it. Again, great work Matt.


IPsec and NAT-Traversal

I was waiting for this one it now means that we can do a completely Sun on Sun vpn connection on my notebook back into work, rather than relying on third parties who have decided that Solaris on x86 is not something they want to do for their vpn support.

Solaris Express 8/04

Solaris Express 08/04 which is based on build 63 should be available for download by August 17 (note this is August 17 in the US), if the announcements that I have seen are anything to go by.

I'll talk about some of these in more detail in later blogs, but the new features for the release include:

  • DHCP Event Scripting
  • DHCP for Logical Interfaces
  • x86: Sun Validation Test Suite 6.0
  • Kernel Modular Debugger
  • Java 2 Platform, Standard Edition 5
  • Zones-Wide Resource Control
  • Stream Control Transmission Protocol
  • Zebra Multi-Protocol Routing Suite
  • IPsec and NAT-Traversal
  • Enhancement to the nfsmapid Daemon
  • sendmail Version 8.13
  • Per-thread Mode Enhancement
  • perzone Audit Policy
  • OpenSSL and OpenSSL PKCS#11 Engine
  • x86: Fibre Channel Connectivity for Storage Devices
  • BIND 9
  • Samba
  • Flex 2.5.4a
  • SIP Proxy Server
  • libusb 0.1.8

Remember that you will need the three software cd images to install Solaris Express.

As usual, all docs are on docs.sun.com without password protection.

pageicon Tuesday Aug 10, 2004

Getting the mute and volume keys working in x86

Around April I logged a bug to try and get this stuff working. I had managed a very kludgy binary patching process to get it working with Xsun, but had also decided to switch to Xorg (speed reasons mainly), so I was out in the cold again.

I'm running a modified version of the kb8042 driver which returns the right keycode/scancode mapping for them. The next trick was to see how how the Xorg xserver reacted to this. Not good. We get a whole lot of complaints in /var/log/Xorg.log.0 about KEY_UNUSED whenever I press or release one.

Hmmmm, ...

Time to hit the Xorg source code. It looks like there is a map into which I have to add in the keysymbol numbers. OK, I have patched those into the binary with mdb. The errors have stopped, but now whenever I try to map those keys to their functions in the gnome keyboard shortcuts, pressing the key undefines the function (just like if I had pressed backspace).

Gawd it's frustrating to be soooooo close. Maybe something will occur to me overnight.

pageicon Sunday Aug 08, 2004

leading underscore in /etc/system variables

While working an escalation the other week I needed to increase the default size of a kernel stack for a customer in order to provide relief1 while a new bug was fixed. The default size of a kernel stack is governed by _defaultstksz. Unfortunately the way that the lexical analyzer for /etc/system is written, we can't have a leading underscore in a variable name in /etc/system. Normally we can work around this with the following addition to /etc/system:

    set lwp_default_stksize=0x4000

Unfortunately, the problem that we were seeing was using a thread that was created specifically using _defaultstksz, so this workaround didn't help.

Well, the only way that we could provide relief was to actually modify the variable in the kernel binary (in this case /platform/i86pc/kernel/unix as it was a v20z) with a bit of mdb.

    # echo _defaultstksz/W0x4000 | mdb -w /platform/i86pc/kernel/unix

As you can guess, this is not something that we feel particularly comfortable about.

Apart from the fact that we are really patching a kernel binary, the next time that the customer installs a patch, the workaround is silently gone (we're actually working on something for this, more as I can talk about it).

Well, it appears that I am not the first to encounter this problem. Bug 4025675 complains about precisely this issue, and apparantly it was addressed in build 50 of Solaris 10. The actual fix itself was pretty simple, so I endeavoured to backport the fix to Solaris 8 and 9.

The actual backport and builds for the various systems did not take too long. What did take some time was the verification on all combinations of (Solaris 8, Solaris 9) vs (SPARC32, SPARC64, x86).

I finished my final test yesterday and submitted my request to integrate to get this into the kernel updates for the aforementioned systems. The request has been granted and now I'm just waiting on the gates to open for the patch builds.

With luck this will get integrated shortly, and after the kernel patches go through their thorough workout before release, we shoudl be seeing it for Solaris 8 and 9.


1. The way that we work in PTS is that the focus is on getting the customer running with some kind of workaround until the process of producing a patch (which includes an awful lot of testing) can be completed.

pageicon Monday Aug 02, 2004

Dtrace, Solaris, Open Sourcing ...

Adam Leventhal has just got back from OSCon and has made some interesting blog entries in the last few days.

Linux, Solaris and Open Source is definitely worth a read. He discusses a conversation he had with Greg Kroah-Hartman about Linux kernel development; goes on to talk a little about Open Solaris and the Solaris Community and some of the feedback he is getting (good looking feedback too!).

Adam is now officially a celebrity :-) In DTrace Spotting it appears that he was recognised in an airport as one of the DTrace developers and a conversation struck up.

Other folks to talk about OSCon include Eric Schrock here and Jim Grisanzio who also includes some photos. Isn't it nice to finally be able to associate faces with some of the names that we have been seeing around?

Jake's Soccer and my weekend

I took Jake to soccer again this weekend. It was a ground I had never been to and I ended up going down the wrong road just before we got there, but we did make it. They took on Doyalson/Wyee.

There was a little confusion from the sidelines as both sides had a Jacob and a Jessie. The highlights... Sam did a magnificent header, well planned and directed during the first half. I think they scored one in the first half and one in the second. The second goal was a great team effort involving some good passing near the goals. Jake got to make a few throw ins. One in particular was very well done as he threw it down the line to exactly where his player was running to. They actually scored another goal, but the refereee must have been unsighted. It looked like the ball bounced off the back of the net back to the goalkeeper, and the ref only saw the ball in the keepers hands. Oh well, they still won 2-0

I took both Jake and Lucy to Sam's 6th birthday party at McDonalds later that afternoon and they had a great time there.

That evening the for of us joined our across-the-road neighbors for a really nice BBQ and drinks (well the adults had the drinks).

I've also got to add, that the Newcastle Knights Rugby League club finally got out of their eight week slump and put in a great win against the South Sydney Rabbitohs. That made Sunday night TV much more enjoyable. They are only one point out of the top eight now, so they could still be in the running for the finals series.