Angelo's Soapbox

Tuesday Apr 04, 2006

DTrace detective!

Recently I got a request from a coworker. The email subject line said, "Solaris has a mind of its own". Basically, something or someone was creating a link from /etc/motd to a file in /var/spool on his customer's system. The customer removes the link and replaces /etc/motd with a blank file several times a day, and the link keeps reappearing. So the question was: can I help find the culprit?

These days DTrace seems to be the answer to every question on Solaris 10. At least that is the first thing that comes to my mind. My Acer Ferrari running the latest version of Solaris Express comes in handy as a test bed for my detective work. For starters, I think the symlink(2) system call is called when someone creates a symbolic link on the system. To verify that, I ran the following one-liner in one window while I created a link in another window.
dtrace -n syscall::symlink:entry
dtrace: description 'syscall::symlink:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  45294                    symlink:entry

OK, this looks like it is working. Now let's see if we can track down the file names that are being linked. The man page for symlink(2) says that arg0 and arg1 hold the names of the files being linked: arg0 is the target the link points to, and arg1 is the name of the link itself. So I figured that the following script should do the trick.
dtrace -qn syscall::symlink:entry'{printf("linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));}'
Well, let's try it. Run the above script and then create a link (ln -s xxx yyy).
bash-3.00# dtrace -qn syscall::symlink:entry'{printf("linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));}'
linking xxx to yyy
That looks great. Just a few points. The -q option asks DTrace to be quiet: it prints only what you ask for and not what it thinks you want. copyinstr is needed because the file name lives in user land and DTrace runs in the kernel address space. So we now have the basic building block for our script. For starters, let's make the script look a little prettier. I'm not going to send a command line to the customer. They deserve better. So here is a script.
#!/usr/sbin/dtrace -qs
syscall::symlink:entry
{
        printf("Someone is linking %s to %s\n",copyinstr(arg0),copyinstr(arg1));
}


I called the script wholink.d. Put it into a file and give it permission to execute (chmod a+x wholink.d).
bash-3.00# ./wholink.d
Someone is linking xxx to yyy
OK, we now have a pretty-looking script. Let's make it more useful. First we need to narrow the script down so that it only pinpoints links to /etc/motd. Predicates help us achieve this.
#!/usr/sbin/dtrace -qs
syscall::symlink:entry
/copyinstr(arg1) == "/etc/motd"/
{
        printf("Someone is linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));
}

Looks like this works. I tried it with the command ln -s /tmp/xxx /etc/motd
bash-3.00# ./wholink.d
Someone is linking /tmp/xxx to /etc/motd
Just one point to note. If I change my link command to (cd /etc ; ln -s /tmp/xxx motd), my script would not catch the link, because symlink(2) then sees only the relative name motd. So one small mod is needed. I used basename to look at just the file name and weed out the directory name. Yes, there could be some false positives when someone links a motd file in other locations. In this case I feel a few extra false positives would not be too much of an issue. Of course there are ways to narrow this down to only /etc/motd. Let me know if you are interested. Or maybe I'll blog about that one a little later. So for now I'm changing the script as follows...
#!/usr/sbin/dtrace -qs

syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("Someone is linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));
}
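For the curious, one possible way to narrow the basename predicate down to /etc/motd is to also look at the working directory of the process. This is only a sketch: it assumes the built-in cwd variable is available on the release in use, and it compares path strings exactly, so names such as ./motd or ../etc/motd would still slip past it.

```dtrace
#!/usr/sbin/dtrace -qs
/*
 * Sketch only. Assumes the built-in "cwd" variable is available.
 * Matches the absolute name /etc/motd, or the relative name motd
 * when the calling process is sitting in /etc. Paths are compared
 * as plain strings; nothing is resolved or normalized.
 */
syscall::symlink:entry
/copyinstr(arg1) == "/etc/motd" ||
    (cwd == "/etc" && copyinstr(arg1) == "motd")/
{
        printf("Someone is linking %s to %s\n", copyinstr(arg0), copyinstr(arg1));
}
```

With this variation, both ln -s /tmp/xxx /etc/motd and (cd /etc ; ln -s /tmp/xxx motd) should be reported, while a motd link created in some other directory should be ignored.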

Now that we can catch the culprit in the act, let's set a trap to find out more about the process that is doing this link. Let's collect the pid, the execname, and the time when the link happened. Here is the script that prints these details.
#!/usr/sbin/dtrace -qs
syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
}
Running the script produces the following output.
bash-3.00# ./wholink.d
Caught the culprit
                Time     2006 Apr  4 13:48:46
          Process id     11007
  Name of Executable     ln

OK, this gave some info, but I don't think any of it is very helpful. I'd like to know who called ln. So I would like to run ptree and see the process tree for that process. Well, we are in luck. DTrace allows us to run any Unix command using the system() action. So I modified the script. Please note that DTrace is meant to run on a live production system. In order to avoid accidental modification of the system, the ability to run system() is disabled by default. You need to enable it using the -w option. So here is the modified script...
#!/usr/sbin/dtrace -wqs
syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
        system("ptree %d",pid);
}

When I tried running this script, for some reason ptree did not show any output. This is no good. It baffled me for a few minutes until I realized that the ln process was completing before DTrace could get a chance to start the ptree command. Huh! You slimy culprit! Fear not, brave detective. DTrace can help. Introducing the stop() action. It allows you to stop the process that fired the probe. We then do the ptree. I'd like to let the "ln" process continue once we have collected the info, using prun. This is to avoid leaving any other process that is doing the link hanging. We have a law here against cruel and unusual punishment, you know.
#!/usr/sbin/dtrace -wqs
syscall::symlink:entry
/basename(copyinstr(arg1))=="motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
        stop();
        system("ptree %d",pid);
        system("prun %d",pid);
}
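If ptree is not an option, the parent can also be read directly from DTrace. The sketch below is an illustration rather than what was sent to the customer: it reuses the same probe and predicate and prints the built-in ppid and uid variables. It uses no destructive actions, so -w is not needed.

```dtrace
#!/usr/sbin/dtrace -qs
/*
 * Sketch only: same probe and predicate as wholink.d, but it prints
 * the parent process id and user id instead of running ptree.
 */
syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("%20s\t %-10d\n", "Process id", pid);
        printf("%20s\t %-10d\n", "Parent process id", ppid);
        printf("%20s\t %-10d\n", "User id", uid);
}
```

The parent may itself be short-lived, so the ptree approach with stop() still gives the fuller picture.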
Now the question that you have been waiting for: "Who is the culprit?" Sorry folks, we take our customers' info very seriously and we protect the innocent (or the guilty)!!! Just for fun, I added a small variation of this script to see if we can outrun the culprit and keep /etc/motd safe. Here is the script. Try it! Let me know if you can figure out how this works.
bash-3.00# cat stoplink.d
#!/usr/sbin/dtrace -wqs
syscall::symlink:entry
/copyinstr(arg1)=="/etc/motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
        copyoutstr("/tmp/motd",arg1,9);
        stop();
        system("ptree %d",pid);
        system("prun %d",pid);
        system("rm /tmp/motd");
}

Friday Nov 04, 2005

Solaris 10 DTrace Code Camps, Webcast & Hands on labs

In the last few months I've been presenting DTrace to many developers and system admins around the world. It's been great to talk about this very cool technology. The feedback from the developer community has been wonderful. Code Camps are an excellent format for presenting DTrace. 30 to 100 developers get together. Each of them has a Solaris 10 desktop. We spend the whole day talking about DTrace and Zones. The first hour is dedicated to an introduction to Solaris 10. We then have 4 hours of DTrace. Developers get to ask questions and I get to write D scripts live to show them how to answer such questions. Then the rest of the day is spent talking about Zones and SMF. Developers sometimes bring in their applications on their Solaris 10 laptops. These lucky ones get a taste of how DTrace can be used to better understand their application. Somewhere between all this we get a good lunch at Sun's expense, snacks, and of course lots of coffee. If you did not get a chance to attend one of these camps, you can at least look at the presentation.

Please provide feedback on these slides. There is always room for improvement. By the way, I've been busy with other forms of presentation as well. If you are new to DTrace you may find my DTrace webcast very useful. Actually, over 2500 people have viewed it already. Again, your feedback is very welcome. If you are more of the "lemme try for myself" type of person, you may find the Hands on Labs more to your liking. Just download the zip file, unzip it, and point your browser to the index.html file. There are step-by-step instructions with lots of screenshots to help you walk through the DTrace basics. All you need is a Solaris 10 system. If you do not have one, just download it. Solaris 10 runs on a boatload of hardware. Chances are your system is on the supported list.
