Eddies in the Space-Time Continuum
Peter Harvey's blog
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

Wednesday Feb 28, 2007
Process exit status explored: SMF and what does shell's $? represent?

Introduction

All UNIX shells I've seen allow access to the exit value of the last command through the $? special parameter. The ksh(1) man page neatly states:

    The value of a simple-command is its exit status if it terminates
    normally. If it terminates abnormally due to receipt of a signal,
    the value is the signal number plus 128.  See signal.h(3HEAD) for
    a list of signal values. Obviously, normal exit status values 129
    to 255 cannot be distinguished from abnormal exit caused by
    receiving signal numbers 1 to 127.
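
As a rough illustration of the rule quoted above, here is a minimal C sketch. The helper names shell_exit_value() and run_child() are made up for this example and are not taken from any shell's source; the sketch assumes a POSIX system with the standard <sys/wait.h> macros.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/*
 * Map a wait status to the value a ksh-style shell would show in $?:
 * the exit status for a normal exit, or signal number + 128 for death
 * by signal. Returns -1 for anything else (stopped, continued).
 * Hypothetical helper, not shell source code.
 */
int
shell_exit_value(int status)
{
    if (WIFEXITED(status))
        return (WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return (WTERMSIG(status) + 128);
    return (-1);
}

/*
 * Fork a child that either exits with exit_code (when sig is 0) or
 * sends itself sig, then return the shell-style value for it.
 */
int
run_child(int exit_code, int sig)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return (-1);
    if (pid == 0) {
        if (sig != 0) {
            /* Make sure the signal is not ignored, then raise it. */
            (void) signal(sig, SIG_DFL);
            (void) kill(getpid(), sig);
        }
        /* Reached only if no fatal signal was delivered. */
        _exit(exit_code);
    }
    if (waitpid(pid, &status, 0) == -1)
        return (-1);
    return (shell_exit_value(status));
}
```

On a system where SIGTERM is 15, run_child(143, 0) and run_child(0, SIGTERM) should both give 143, which is the overlap the man page warns about.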

How is exit status set? And how much can we derive from it? I asked because I was trying to understand SMF's handling of process exit status.

SMF digression

I was looking at the SMF method failure messages. There are three method failed error paths in usr/src/cmd/svc/startd/method.c method_run():

    signal    "%s: Method \"%s\" failed due to signal %s.\n"
    exit()    "%s: Method \"%s\" failed with exit status %d.\n"
    other     "%s: Method \"%s\" failed with exit status %d.\n"

What confused me was under what conditions the third error path can ever be taken. Note the ambiguity: the last two error messages are identical.

Exit status

Exit status is documented in the wait(3C) man page. This is what we have:

      In the  following,  status  is  the  object  pointed  to  by
      stat_loc:

        o  If the child process terminated due to an _exit() call,
           the  low  order 8 bits of status will be 0 and the high
           order 8 bits will contain the low order 7 bits  of  the
           argument  that the child process passed to _exit(); see
           exit(2).

        o  If the child process terminated due to  a  signal,  the
           high order 8 bits of status will be 0 and the low order
           7 bits will contain the number of the signal that caused
           the  termination.  In  addition, if  WCOREFLG is set, a
           "core   image"   will   have   been    produced;    see
           signal.h(3HEAD) and wait.h(3HEAD).

In other words:

If the lower 8 bits are zero, the process called exit() or fell off the end of main(), and the exit status is in the upper 8 bits.

If the lower 8 bits are non-zero and the upper 8 bits are zero, the process took a signal. If the top bit (WCOREFLG) of the lower 8 bits is set, it also dumped core. The signal taken is in the lower 7 bits.
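
The two documented cases can be sketched in C as plain bit tests. decode_status() and struct decoded are hypothetical names for this example, and the core flag is assumed to be bit 0200 of the low byte, as described above.

```c
/* Decoded view of a wait status, covering only the documented cases. */
struct decoded {
    int exited;     /* 1 if the low 8 bits are zero */
    int signaled;   /* 1 if low 8 bits non-zero and high 8 bits zero */
    int value;      /* exit status or signal number */
    int core;       /* 1 if the core flag (0200) is set */
};

struct decoded
decode_status(int status)
{
    struct decoded d = { 0, 0, 0, 0 };
    int lo = status & 0xFF;
    int hi = (status >> 8) & 0xFF;

    if (lo == 0) {
        /* exit() or return from main(): status is in the high byte */
        d.exited = 1;
        d.value = hi;
    } else if (hi == 0) {
        /* killed by a signal: number in low 7 bits, core flag on top */
        d.signaled = 1;
        d.value = lo & 0x7F;
        d.core = (lo & 0x80) != 0;
    }
    /* Otherwise both bytes are non-zero: not covered by the extract. */
    return (d);
}
```

A status such as 0x127F falls through both tests and comes back with every field zero, which is the undocumented case.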

What this doesn't document is the case where both the lower 8 bits and the upper 8 bits are non-zero. The status is built by wstat() in the kernel:

/*
 * convert code/data pair into old style wait status
 */
int
wstat(int code, int data)
{
    int stat = (data & 0377);

    switch (code) {
    case CLD_EXITED:
        stat <<= 8;
        break;
    case CLD_DUMPED:
        stat |= WCOREFLG;
        break;
    case CLD_KILLED:
        break;
    case CLD_TRAPPED:
    case CLD_STOPPED:
        stat <<= 8;
        stat |= WSTOPFLG;
        break;
    case CLD_CONTINUED:
        stat = WCONTFLG;
        break;
    default:
        cmn_err(CE_PANIC, "wstat: bad code");
        /* NOTREACHED */
    }
    return (stat);
}

If both the lower 8 bits and the upper 8 bits are non-zero, it looks like a CLD_STOPPED or CLD_TRAPPED, where the data is shifted into the upper 8 bits and WSTOPFLG is ORed into the lower 8. Reading the source further, the upper 8 bits will be the signal that caused it.
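
To see the resulting bit patterns, here is a user-space model of the wstat() logic. It is only a sketch: all MODEL_* names are hypothetical, and the flag values are assumed to match the traditional <sys/wait.h> definitions (0200, 0177 and 0177777), which may not hold on every system.

```c
/* Assumed flag values; check <sys/wait.h> on your own system. */
#define MODEL_WCOREFLG  0200        /* core dump bit in the low byte */
#define MODEL_WSTOPFLG  0177        /* "stopped" marker in the low byte */
#define MODEL_WCONTFLG  0177777     /* "continued" marker */

/* Hypothetical stand-ins for the CLD_* codes. */
enum model_code {
    MODEL_EXITED, MODEL_KILLED, MODEL_DUMPED,
    MODEL_TRAPPED, MODEL_STOPPED, MODEL_CONTINUED
};

/* Build an old-style wait status from a code/data pair. */
int
model_wstat(enum model_code code, int data)
{
    int stat = (data & 0377);

    switch (code) {
    case MODEL_EXITED:
        stat <<= 8;                 /* exit status in the high byte */
        break;
    case MODEL_DUMPED:
        stat |= MODEL_WCOREFLG;     /* signal plus core flag */
        break;
    case MODEL_KILLED:
        break;                      /* signal in the low byte */
    case MODEL_TRAPPED:
    case MODEL_STOPPED:
        stat <<= 8;                 /* signal in the high byte */
        stat |= MODEL_WSTOPFLG;
        break;
    case MODEL_CONTINUED:
        stat = MODEL_WCONTFLG;
        break;
    default:
        return (-1);    /* the kernel panics here; the model just fails */
    }
    return (stat);
}
```

With these assumed values, model_wstat(MODEL_STOPPED, 24) gives 0x187F: both bytes are non-zero, with the signal number in the upper one.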

Bear in mind also that each application interprets the status returned from wait(3C) in its own way. For ksh(1) this is as documented above, but what of SMF?

SMF handling of exit status

The SMF code looks like this (with comments from me):

   if (!WIFEXITED(ret_status)) {

       WIFEXITED tests for ((int)((stat)&0xFF) == 0)

       We didn't exit cleanly, so let's find out why

     if (WIFSIGNALED(ret_status)) {

       WIFSIGNALED does this test:
       ((int)((stat)&0xFF) > 0 && (int)((stat)&0xFF00) == 0)

       We already know the first test is non-zero. 0xFF is a plain
       int constant, so the AND is done in int and yields a value
       from 0 to 255; no unsigned conversion is involved. Given
       that the value is non-zero, the "> 0" test is always true
       here. In other words, it's something of a no-op.

       The second test is more interesting as it relates to the
       overloading of the exit status. If the upper 8 bits are
       zero then it's a simple signal.

       Log: "%s: Method \"%s\" failed due to signal %s.\n"

       Signal is derived using WTERMSIG(), see
       wait.h(3HEAD). Currently that's:

       #define	WTERMSIG(stat)		((int)((stat)&0x7F))

     } else {

       We can only reach this clause if we have something more
       complex than a signal, like the child stopping.

       Log: "%s: Method \"%s\" failed with exit status %d.\n"

       The value logged is WEXITSTATUS(ret_status), which is:
       ((int)(((stat)>>8)&0xFF))

       As noted above, I believe the value is the signal that
       caused the CLD_STOPPED or CLD_TRAPPED.

     }
     ** Jump out **
   }

   Normal exit
   *exit_code = WEXITSTATUS(ret_status);
   if (*exit_code != 0) {

     Log: "%s: Method \"%s\" failed with exit status %d.\n"

   }
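
The branch structure above can be tried out with a small C sketch. The macros are copied from the definitions quoted in the walkthrough, renamed with an X_ prefix so they don't clash with the host's <sys/wait.h>; classify() and enum path are hypothetical names, not SMF code, and the status 0x187F assumes WSTOPFLG is 0177.

```c
/* Macro bodies as quoted in the walkthrough, renamed to avoid clashes. */
#define X_WIFEXITED(stat)   ((int)((stat) & 0xFF) == 0)
#define X_WIFSIGNALED(stat) \
    ((int)((stat) & 0xFF) > 0 && (int)((stat) & 0xFF00) == 0)
#define X_WTERMSIG(stat)    ((int)((stat) & 0x7F))
#define X_WEXITSTATUS(stat) ((int)(((stat) >> 8) & 0xFF))

/* Which of the paths a given status would take. */
enum path { PATH_OK, PATH_SIGNAL, PATH_EXIT, PATH_OTHER };

/*
 * Follow the same tests as the walkthrough and report the path taken.
 * *value receives the number that would appear in the log message.
 */
enum path
classify(int ret_status, int *value)
{
    if (!X_WIFEXITED(ret_status)) {
        if (X_WIFSIGNALED(ret_status)) {
            *value = X_WTERMSIG(ret_status);
            return (PATH_SIGNAL);       /* "failed due to signal" */
        }
        *value = X_WEXITSTATUS(ret_status);
        return (PATH_OTHER);            /* "failed with exit status" */
    }
    *value = X_WEXITSTATUS(ret_status);
    return (*value != 0 ? PATH_EXIT : PATH_OK);
}
```

Under these assumptions, 0x1800 (exit with status 24) and 0x187F (stopped by signal 24) take different paths but would both be logged as "failed with exit status 24".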

Conclusion

There's an ambiguity in the SMF method-failed code: the log message doesn't distinguish between a child that called exit() with a non-zero status and a child that was stopped or trapped.

As far as shells are concerned, check each shell's man page to see how it handles the wait status and what value it presents in $?.


Posted at 04:16PM Feb 28, 2007 by Peter Harvey in Solaris