Angelo's Soapbox

Tuesday Apr 04, 2006

DTrace detective!

Recently I got a request from a coworker. The email subject line said, "Solaris has a mind of its own". Basically, it looked like something or someone was creating a link from /etc/motd to a file in /var/spool on his customer's system. The customer removes the link and replaces /etc/motd with a blank file several times a day, and the link keeps reappearing. So the question was: can I help find the culprit?

These days DTrace seems to be the answer for every question on Solaris 10. At least that is the first thing that comes to my mind. My Acer Ferrari running the latest version of Solaris Express comes in handy as a test bed for my detective work. For starters, I think the symlink(2) system call is called when someone creates a symbolic link in the system. To verify that, I ran the following one-liner in one window while I created a link in another window.
dtrace -n syscall::symlink:entry
dtrace: description 'syscall::symlink:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  45294                    symlink:entry

OK, this looks like it is working. Now let's see if we can track down the file names that are being linked. The man page for symlink(2) says that arg0 is the file the link points to and arg1 is the name of the link being created. So I figured that the following script should do the trick.
dtrace -qn syscall::symlink:entry'{printf("linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));}'
Well, let's try it. Run the above script and then create a link (ln -s xxx yyy).
bash-3.00# dtrace -qn syscall::symlink:entry'{printf("linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));}'
linking xxx to yyy
That looks great. Just a few points. The -q option asks DTrace to be quiet, telling it to print only what you want and not what it thinks you want. copyinstr is needed because the file name is in userland and DTrace runs in the kernel address space. So we now have the basic building block for our script. For starters, let's make the script look a little prettier. I'm not going to send a command line to the customer. They deserve better. So here is a script.
#!/usr/sbin/dtrace -qs
syscall::symlink:entry
{
        printf("Someone is linking %s to %s\n",copyinstr(arg0),copyinstr(arg1));
}


Put it into a file and give it permission to execute (chmod a+x ). I called the script wholink.d.
bash-3.00# ./wholink.d
Someone is linking xxx to yyy
OK, we now have a pretty-looking script. Let's make it more useful. First we need to narrow down the script to pinpoint only links to /etc/motd. Predicates help us achieve this.
#!/usr/sbin/dtrace -qs
syscall::symlink:entry
/copyinstr(arg1) == "/etc/motd"/
{
        printf("Someone is linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));
}

Looks like this works. I tried it with the command ln -s /tmp/xxx /etc/motd.
bash-3.00# ./wholink.d
Someone is linking /tmp/xxx to /etc/motd
Just one point to note. If I change my link command to (cd /etc ; ln -s /tmp/xxx motd), my script would not catch the link, so one small mod is needed. I used basename to look at just the file name and weed out the directory name. Yes, there could be some false positives when someone links a motd file in other locations. In this case I feel a few extra false positives would not be too much of an issue. Of course there are ways to narrow this down to only /etc/motd. Let me know if you are interested. Or maybe I'll blog about that one a little later. So for now, changing the script as follows...
#!/usr/sbin/dtrace -qs

syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("Someone is linking %s to %s\n",copyinstr(arg0), copyinstr(arg1));
}
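
One possible way to cut down the false positives mentioned above, offered as a rough sketch rather than a tested recipe: DTrace's built-in cwd variable holds the current working directory of the process that fired the probe, so the predicate can accept either the absolute name or a bare motd created from inside /etc. This assumes the culprit does not use some other relative spelling such as ../etc/motd, and that the kernel can resolve the working directory name at all.

```dtrace
#!/usr/sbin/dtrace -qs

/*
 * Sketch only: match /etc/motd given as an absolute path, or a bare
 * "motd" created by a process whose working directory is /etc.
 * Other relative spellings (../etc/motd, ./motd) are NOT handled.
 */
syscall::symlink:entry
/copyinstr(arg1) == "/etc/motd" ||
    (cwd == "/etc" && copyinstr(arg1) == "motd")/
{
        printf("Someone is linking %s to %s\n", copyinstr(arg0), copyinstr(arg1));
}
```

Checking cwd keeps the test cheap, at the cost of missing the less common relative forms.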

Now that we can catch the culprit in the act, let's set a trap to find out more about the process that is doing this link. Let's collect the pid, execname, and time when the link happened. Here is the script that prints these details.
#!/usr/sbin/dtrace -qs
syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
}
Running the script produces the following output.
bash-3.00# ./wholink.d
Caught the culprit
                Time     2006 Apr  4 13:48:46
          Process id     11007
  Name of Executable     ln

OK, this gave some info, but I don't think any of it is very helpful. I'd like to know who called ln. So I would like to run ptree and see the process tree for that given process. Well, we are in luck. DTrace allows us to run any Unix command using the system() action. So I modified the script. Please note that DTrace is meant to run on a live production system. In order to avoid accidental modification to the system, the ability to run system() is disabled by default. You need to enable it using the -w option. So here is the modified script...
#!/usr/sbin/dtrace -wqs
syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
        system("ptree %d",pid);
}

When I tried running this script, for some reason ptree did not show any output. This is no good. It baffled me for a few minutes until I realized that the ln process was completing before DTrace could get a chance to start the ptree command. Huh! You slimy culprit! Fear not, brave detective. DTrace can help. Introducing the stop() action. It allows you to stop the process. We then do the ptree. I'd like to let the "ln" process continue once we have collected the info, so prun restarts it afterwards. This keeps any process that is doing the link from hanging. We have a law here against cruel and unusual punishment, you know.
#!/usr/sbin/dtrace -wqs
syscall::symlink:entry
/basename(copyinstr(arg1))=="motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
        stop();
        system("ptree %d",pid);
        system("prun %d",pid);
}
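
If stopping processes on a production box is not an option, a partial answer to "who called ln" may be available without any destructive action: ppid and uid are built-in DTrace variables holding the parent process id and real user id of the process that fired the probe. A minimal sketch, assuming the parent is still around when you look it up with ps or ptree afterwards:

```dtrace
#!/usr/sbin/dtrace -qs

/* Sketch only: no -w needed, since nothing here is a destructive action. */
syscall::symlink:entry
/basename(copyinstr(arg1)) == "motd"/
{
        printf("%20s\t %-20Y\n", "Time", walltimestamp);
        printf("%20s\t %-10d\n", "Process id", pid);
        printf("%20s\t %-10d\n", "Parent process id", ppid);
        printf("%20s\t %-10d\n", "User id", uid);
        printf("%20s\t %-20s\n", "Name of Executable", execname);
}
```

This shows only one level of ancestry, so the stop()/ptree version remains the way to see the whole tree.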
Now the question that you have been waiting for: "Who is the culprit?" Sorry folks, we take our customers' info very seriously and we protect the innocent (or the guilty)!!! Just for fun I added a small variation of this script to see if we can outrun the culprit and keep /etc/motd safe. Here is the script. Try it! Let me know if you can figure out how this works.
bash-3.00# cat stoplink.d
#!/usr/sbin/dtrace -wqs
syscall::symlink:entry
/copyinstr(arg1)=="/etc/motd"/
{
        printf("Caught the culprit\n");
        printf("%20s\t %-20Y\n", "Time",walltimestamp);
        printf("%20s\t %-10d\n", "Process id",pid);
        printf("%20s\t %-20s\n", "Name of Executable" ,execname);
        copyoutstr("/tmp/motd",arg1,9);
        stop();
        system("ptree %d",pid);
        system("prun %d",pid);
        system("rm /tmp/motd");
}
