Ghost Busting
Hunting down the Ghosts in our machines. Chris Beal's Weblog

20090610 Wednesday June 10, 2009

Comparing dtrace output using meld

Comparing dtrace and other debug logs using meld

meld is a powerful OpenSource graphical "diff" viewer. It is available from the OpenSolaris IPS repositories, so it can be installed from the Package Manager in OpenSolaris or simply by typing

   $ pfexec pkg install SUNWmeld

It is very clever at identifying the real changes within files and highlighting where the differences start and end.


This example uses the output of two dtrace runs tracing network receive activity in a development version of the nge driver, once when it works and once when it doesn't, to track down a bug in that version.

First off the dtrace script is

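   $ cat rx.d
   #!/usr/sbin/dtrace -Fs
   /* Start tracing this thread when it enters nge_receive(). */
   fbt::nge_receive:entry
   {
       self->trace=1;
   }
   
   /* Stop tracing when nge_receive() returns. */
   fbt::nge_receive:return
   /self->trace==1/
   {
       self->trace=0;
   }
   
   /* While tracing, print the first argument of every function entered. */
   fbt:::entry
   /self->trace==1/
   {
       printf("%x",arg0);
   }
   
   /* While tracing, print arg0 on every return. For fbt return probes arg0 is
      the offset of the return instruction; the return value is in arg1. */
   fbt:::return
   /self->trace == 1/
   {
       printf("%x",arg0);
   }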

This simply traces every function entered and returned from while a thread is inside nge_receive(); the -F option indents the output by call depth.


So I ran it twice, once when the network was working (saved as rx.works.out) and once when it wasn't (saved as rx.out), and simply ran meld over the two files.

   $ meld rx.out rx.works.out

This brings up a GUI, as seen here (rx.out on the left, rx.works.out on the right):

It's worth loading full size. On the right (the working run) there is a large area of probes that fired but have no counterpart on the left (the failing run). That implies a lot of code ran in the working case that is missing from the failing case.
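
The same observation can be cross-checked without the GUI. What follows is a minimal shell sketch, not part of the original investigation. It assumes each probe line in the trace has the flowindent form of a CPU id, an arrow (-> for entry, <- for return), the function name and then the printed argument; trace_funcs and only_in_working are names made up for this illustration.

```shell
# List the distinct function names that appear in a flowindent trace.
# Assumed line layout: <cpu> <arrow> <function> [printed argument]
trace_funcs() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "->" || $i == "<-") { print $(i + 1); break }
    }' "$1" | sort -u
}

# Print the functions seen in the working trace ($2) but never in the
# failing trace ($1). comm -13 keeps only lines unique to its second input.
only_in_working() {
    tmp_fail=$(mktemp) && tmp_work=$(mktemp) || return 1
    trace_funcs "$1" > "$tmp_fail"
    trace_funcs "$2" > "$tmp_work"
    comm -13 "$tmp_fail" "$tmp_work"
    rm -f "$tmp_fail" "$tmp_work"
}
```

Running only_in_working rx.out rx.works.out would then list each function that appears in the working trace but never in the failing one.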

This is a picture of the source code of nge_receive():



You can see it essentially does two things

   o Calls nge_recv_ring()
   o If that succeeds calls mac_rx()

Looking at the meld screenshot you can see the big green area starts at mac_rx. So in the failing case nge_receive() doesn't call mac_rx() (that'd explain why it fails to receive a packet).

Why doesn't it? Well, it implies that nge_recv_ring() returns NULL. nge_recv_ring() is supposed to return an mblk, and it doesn't. Why is that? Looking into the blue and red highlighted area in the meld window, we see another area in the working case that is missing in the failing case. Hey presto, this bit is the call to allocb(). allocb() is used to allocate an mblk.
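
A quick way to confirm this from the raw files is to count how often a function was entered in each run. This is only a sketch: it assumes every entry line in the trace has the flowindent form of a CPU id, the -> arrow, the function name and then the printed argument, and count_entries is a hypothetical helper rather than anything from the original session.

```shell
# Count how many times a function ($2) was entered in a flowindent trace ($1).
# Assumed entry line layout: <cpu> -> <function> [printed argument]
count_entries() {
    awk -v fn="$2" '{
        for (i = 1; i < NF; i++)
            if ($i == "->" && $(i + 1) == fn) n++
    } END { print n + 0 }' "$1"
}
```

Comparing count_entries rx.out allocb with count_entries rx.works.out allocb should show fewer allocb() entries in the failing trace if the reading of the meld window is right.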

So we know that in the failing case the nge_recv_ring() function fails to allocate an mblk. Now I just need to work out why.

I found this a powerful way of viewing complex data and quickly homing in on differences.
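
A possible refinement, which was not used for the comparison above: if the two runs happened on different CPUs or printed different pointer values, those columns can add noise to the diff. The sketch below strips them so that only the call flow is compared. It assumes each line starts with a numeric CPU id followed by the arrow and function name, and normalize_trace is a made-up name.

```shell
# Reduce a flowindent trace to its call flow: drop the leading CPU column and
# anything printed after the function name, keeping indentation and arrows.
# Assumed line layout: <cpu> <arrow> <function> [printed argument]
normalize_trace() {
    sed -e 's/^ *[0-9][0-9]* //' \
        -e 's/\(-> [^ ]*\).*$/\1/' \
        -e 's/\(<- [^ ]*\).*$/\1/' "$1"
}
```

The output of normalize_trace for each of the two trace files could be redirected to a new file and the resulting pair handed to meld in the same way as before.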

Posted by cwb ( Jun 10 2009, 05:07:12 PM BST )

20090601 Monday June 01, 2009

It's Finally Here

I've decided I really need to get back to writing a blog occasionally, and what better day to choose than June 1 2009. Why? Well today we release OpenSolaris 2009.06, the latest OpenSource release of our operating system Solaris.

I know this all sounds a bit marketing, but actually there are some really good reasons for running OpenSolaris on your own machine.

First off, it is the most secure OS I know of. No need for virus protection.

Second, it just works (mostly). I've just got a new MacBook Pro, and since I always find it easier to do development work on Solaris than on any other platform, I like to run OpenSolaris on it. It installed pretty much seamlessly (I just had to change the EFI disk label using the macOS fdisk utility as described here). The only thing that doesn't work out of the box is the Wi-Fi, which is a pain. It's a Broadcom chipset, so I've got hold of a PCI3/4 Atheros card, which works well.

Third, all the development tools I need (and indeed that anyone developing for or on Solaris needs) are available within the standard repositories. I found this page, which describes how I set up my laptop as a build machine.

From a day-to-day computing perspective it does everything I need. Mail, web and chat are all included, OpenOffice is in the repositories as a free (and simple) download, and there is a new media player, Elisa (also in the repo), though unfortunately you have to buy the codecs for many common video formats.

So the next question is, is it any different from 2008.11? Well, it's hard for me to say as I've been upgrading every few weeks to the latest development builds (by using the opensolaris.com/dev repository). But I did install it fresh inside a VirtualBox VM and was impressed with the speed of the install. The auto installer is now more complete and can install SPARC machines (necessary for a good proportion of our customers). There are networking improvements, but generally the speed and usability are what you'll notice.

Oh, and Fast Reboot. It makes it much quicker to shut down or reboot a machine.

Today I'm attending CommunityOne (or C1 as we call it), where much more will be discussed about OpenSolaris and all our other OpenSource development efforts. I'll try to remember to write a blog about it (though don't hold your breath on recent form :-)




Posted by cwb ( Jun 01 2009, 03:39:54 PM BST )

