Reflections on OS integration
Eric Schrock's Weblog

Friday Jun 18, 2004

One of the most visible features that I have integrated into Solaris 10 is the ability to store pathnames with each open file [1]. This allows new avenues of observability that were previously inaccessible. First off, we simply have the files as symbolic links in /proc/<pid>/path:

        $ ls -l /proc/`pgrep Firebird`/path | cut -b 55-
        0 -> /devices/pseudo/mm@0:null
        1 -> /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
        10 -> /usr/local/MozillaFirebird/chrome/comm.jar
        11 -> /usr/local/MozillaFirebird/chrome/en-US.jar
        12 -> /usr/local/MozillaFirebird/chrome/embed-sample.jar
        13 -> /usr/local/MozillaFirebird/chrome/pipnss.jar
        14 -> /usr/local/MozillaFirebird/chrome/pippki.jar
        15 -> /usr/local/MozillaFirebird/chrome/US.jar
        16 -> /usr/local/MozillaFirebird/chrome/en-unix.jar
        17 -> /usr/local/MozillaFirebird/chrome/classic.jar
        18 -> /usr/local/MozillaFirebird/chrome/toolkit.jar
        19 -> /usr/local/MozillaFirebird/chrome/browser.jar
        2 -> /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
        20
        21
        22 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_MAP_
        23 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_001_
        24 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_002_
        25 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/Cache/_CACHE_003_
        26
        27 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/formhistory.dat
        28 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/history.dat
        29 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/cert8.db
        3
        30 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/key3.db
        4 -> /var/run/name_service_door
        5 -> /home/eschrock/.phoenix/default/7pkwqbju.slt/XUL.mfasl
        6
        7
        8
        9
        a.out -> /usr/local/MozillaFirebird/MozillaFirebird-bin
        cwd -> /home/eschrock
        root -> /
        ufs.102.0.11082 -> /usr/lib/iconv/646%UTF-16LE.so
        ufs.102.0.11521 -> /usr/lib/iconv/UTF-16LE%646.so
        [ ... output elided ... ]
        $
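
The same listing can be read programmatically. Below is a minimal Python sketch, not part of Solaris: the function name read_path_links is hypothetical, and it assumes that an entry with no recorded path (such as 20 or 3 above) makes readlink fail, which is reported here as None.

```python
import os


def read_path_links(path_dir):
    """Map each entry in a /proc/<pid>/path-style directory to its target.

    Entries that are not readable symbolic links (assumed here to be how
    a descriptor with no recorded path shows up) map to None.
    """
    targets = {}
    for name in sorted(os.listdir(path_dir)):
        try:
            targets[name] = os.readlink(os.path.join(path_dir, name))
        except OSError:
            # No path information for this entry (or not a symlink at all).
            targets[name] = None
    return targets
```

Calling read_path_links("/proc/<pid>/path") with a real process ID would return a dictionary keyed by names such as "0", "cwd", and "a.out".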

As usual, Mozilla Firebird has lots of interesting stuff open. You may notice that some of the file descriptors have no path information. This is most likely because they refer to a socket or FIFO (there is a small chance they refer to a file that has since been moved). The pfiles(1) command has been modified to use this information, so you can now see the path with the rest of the goodies:

        $ pfiles `pgrep Firebird`
        286670: /usr/local/MozillaFirebird/MozillaFirebird-bin
          Current rlimit: 512 file descriptors
           0: S_IFCHR mode:0666 dev:200,0 ino:6815752 uid:0 gid:3 rdev:13,2
              O_RDONLY|O_LARGEFILE
              /devices/pseudo/mm@0:null
           1: S_IFREG mode:0644 dev:210,1281 ino:346 uid:138660 gid:10 size:4164
              O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
              /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
           2: S_IFREG mode:0644 dev:210,1281 ino:346 uid:138660 gid:10 size:4164
              O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
              /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0
           3: S_IFIFO mode:0666 dev:209,0 ino:9 uid:0 gid:1 size:0
              O_RDWR|O_NONBLOCK FD_CLOEXEC
           4: S_IFDOOR mode:0444 dev:209,0 ino:52 uid:0 gid:0 size:0
              O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[100253]
              /var/run/name_service_door
           5: S_IFREG mode:0644 dev:210,1281 ino:744 uid:138660 gid:10 size:747398
              O_RDONLY|O_LARGEFILE
              /home/eschrock/.phoenix/default/7pkwqbju.slt/XUL.mfasl
           6: S_IFIFO mode:0000 dev:203,0 ino:119094 uid:138660 gid:10 size:0
              O_RDWR|O_NONBLOCK
           7: S_IFIFO mode:0000 dev:203,0 ino:119094 uid:138660 gid:10 size:0
              O_RDWR|O_NONBLOCK
        [ ... output elided ... ]
        $ 

This should be enough to get most savvy sysadmins drooling. But wait, there's more! This feature allowed the new DTrace io provider (integrated into build 60, aka Beta 5, aka SX 07/04) to get path name information for arbitrary files in the system. This allows you to do neat stuff like:

        # cat iohog.d
        #!/usr/sbin/dtrace -s
        
        io:::start
        {
                @[execname, args[2]->fi_pathname] = sum(args[0]->b_bcount);
        }
        # ./iohog.d
        ^C
        
          sched           /home/eschrock/.dt/sessionlogs/machine_DISPLAY=:0      4096
          xlp             /var/adm/utmpx                                         4096
          fsflush         /export/iso/solaris_4.iso                              73728
          sched           <none>                                                 82432
          cp              <none>                                                 114688
          fsflush         <none>                                                 177152
          cp              /export/iso/solaris_4.iso                              238936064
          cp              /export/iso/solaris_1.iso                              239910912
        #
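
For readers new to DTrace aggregations, the script above keeps one running sum of b_bcount per (execname, pathname) pair. A rough Python equivalent of that bookkeeping is sketched below. The function aggregate_io and its events argument are hypothetical stand-ins for the stream of io:::start firings, and the ascending sort simply mirrors the order of the output shown above.

```python
from collections import defaultdict


def aggregate_io(events):
    """Sum bytes per (execname, pathname) key, like the D aggregation.

    `events` is an iterable of (execname, pathname, byte_count) tuples,
    a hypothetical stand-in for individual io:::start probe firings.
    Returns (key, total) pairs sorted by ascending total.
    """
    totals = defaultdict(int)
    for execname, pathname, nbytes in events:
        totals[(execname, pathname)] += nbytes
    return sorted(totals.items(), key=lambda item: item[1])
```

Feeding it two made-up 8192-byte events for ("cp", "/export/iso/solaris_1.iso") and one 4096-byte event for ("fsflush", "<none>") yields the fsflush entry first with 4096, then the cp entry with 16384.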

For years we've had the iostat(1M) utility. It's great to know that someone is hammering away on sd0, but that's not really the question you want answered. What you really want to know is who is hammering away on your disks. With the DTrace io provider, we've taken it one step further by giving you the means to answer why someone is hammering away on your disks. All of a sudden one of the most opaque problems is now completely transparent. So head on over and check it out (while the io provider is not available in Solaris Express quite yet, the documentation for it is available on the DTrace page).


[1] For the curious: Solaris implements a Virtual File System (VFS) layer, which includes the notion of a vnode to represent an arbitrary file. The filesystem-dependent part is stored in a format private to the filesystem implementation (think of it in terms of inheritance if it helps). To illustrate with crude ASCII art:

 USERLAND        KERNEL VFS                         KERNEL FS

   fd ----+----> file_t -----+----> vnode_t ------> inode_t /
          |                  |                      prnode_t /
   fd ----+                  |                      etc
                             |
   fd ---------> file_t -----+

We store a (char *) pointer at the end of the vnode_t when we go to look up the file, and now we have path information for all the open files in the kernel (even those implicitly mapped into process address space, without an associated file_t). There are some subtleties with hard links and moving files around, but it works perfectly 99% of the time, which is all we can hope for in this case.
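
To make the idea concrete, here is a toy Python model of the diagram. It is not the Solaris implementation: VNode, File, ToyKernel, and path_of are hypothetical names, and the check in path_of (trust the cached name only if it still resolves to the same vnode) is an assumption chosen to show how a moved file can end up with no path.

```python
class VNode:
    """Toy stand-in for vnode_t, the filesystem-independent file object."""

    def __init__(self):
        self.path = None  # models the (char *) pointer; unset until a lookup


class File:
    """Toy stand-in for file_t, one open-file entry pointing at a vnode."""

    def __init__(self, vnode):
        self.vnode = vnode


class ToyKernel:
    def __init__(self):
        self.vnodes = {}  # current name -> VNode, standing in for the filesystem

    def create(self, path):
        self.vnodes[path] = VNode()

    def lookup(self, path):
        vnode = self.vnodes[path]
        vnode.path = path  # remember the name used for this lookup
        return vnode

    def open(self, path):
        return File(self.lookup(path))

    def rename(self, old, new):
        # The file moves, but the path cached on the vnode is left untouched.
        self.vnodes[new] = self.vnodes.pop(old)


def path_of(kernel, file):
    """Return the cached path only if it still names the same vnode."""
    cached = file.vnode.path
    if cached is not None and kernel.vnodes.get(cached) is file.vnode:
        return cached
    return None
```

In this model, opening the same name twice gives two File objects sharing one VNode, as in the diagram. After a rename, path_of returns None for the old handle until some later lookup of the new name refreshes the cached path.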