Eddies in the Space-Time Continuum
Peter Harvey's blog
Licence
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.

Friday Dec 08, 2006
SIGBUS versus SIGSEGV - according to siginfo.h(3head)

Having asked a number of colleagues I failed to find a consistent answer to the question of the differences between SIGBUS and SIGSEGV. According to the Solaris signal(3head) man page we have:

     Name             Value   Default     Event
     ...
     SIGBUS           10      Core        Bus Error
     SIGSEGV          11      Core        Segmentation Fault

So I dug a bit further and found that siginfo_t can tell you more about the origins of the signal, in particular we have, from the siginfo.h(3head) man page:

     Signal         Code                 Reason
     _________________________________________________________________________
     ...
     _________________________________________________________________________
     SIGSEGV        SEGV_MAPERR          address not mapped to object
                    SEGV_ACCERR          invalid permissions for mapped object
     _________________________________________________________________________
     SIGBUS         BUS_ADRALN           invalid address alignment
                    BUS_ADRERR           non-existent physical address
                    BUS_OBJERR           object specific hardware error
     _________________________________________________________________________

Obviously this may be open to interpretation but that clarifies a few things for me.

For the technically inclined, take a look at the OpenSolaris source code for the trap() function, which handles various types of trap, including page faults. For example, there's a section that decides whether to deliver SIGBUS or SIGSEGV:

	case T_WIN_OVERFLOW + T_USER:	/* window overflow in ??? */
	case T_WIN_UNDERFLOW + T_USER:	/* window underflow in ??? */
	case T_SYS_RTT_PAGE + T_USER:	/* window underflow in user_rtt */
	case T_INSTR_MMU_MISS + T_USER:	/* user instruction mmu miss */
	case T_DATA_MMU_MISS + T_USER:	/* user data mmu miss */
	case T_DATA_PROT + T_USER:	/* user data protection fault */
		switch (type) {
        ...
		/*
		 * In the case where both pagefault and grow fail,
		 * set the code to the value provided by pagefault.
		 */
		(void) instr_size(rp, &addr, rw);
		bzero(&siginfo, sizeof (siginfo));
		siginfo.si_addr = addr;
		if (FC_CODE(res) == FC_OBJERR) {
			siginfo.si_errno = FC_ERRNO(res);
			if (siginfo.si_errno != EINTR) {
				siginfo.si_signo = SIGBUS;
				siginfo.si_code = BUS_OBJERR;
				fault = FLTACCESS;
			}
		} else { /* FC_NOMAP || FC_PROT */
			siginfo.si_signo = SIGSEGV;
			siginfo.si_code = (res == FC_NOMAP) ?
				SEGV_MAPERR : SEGV_ACCERR;
			fault = FLTBOUNDS;
		}

I was digging around this following a discussion regarding bug 6466257 (mmap file writing fails on nfs3 client with EMC nas device) and the signals delivered when accessing memory mapped with mmap(2). The man page suggests that either SIGBUS or SIGSEGV can be delivered for a number of error conditions but doesn't seem sure which. The answer is, "it depends".

So my conclusion is sadly another question - can an application developer infer anything from a SIGBUS versus a SIGSEGV? The answer, I believe, is yes - but quite often the result is the same which is to fix the code :-)


Posted at 03:42PM Dec 08, 2006 by Peter Harvey in Solaris  |  Comments[3]

Comments:

Based on experience from many years ago (and not having access to source code and the time) I learned that SIGSEGV tended to mean that you either dereferenced a NULL pointer, which would mean you were trying to access a non-existent segment, or that you generated and were trying to use an address which was pointing into the "text" segment. In contrast, SIGBUS basically meant that you were trying to use an address which was illegal (i.e. outside the ability of the machine to address).

Posted by Kent Wilson on December 08, 2006 at 06:59 PM GMT #

On personal opinion, I find this very helpful. Guys, I have also posted some more relevant info further on this, not sure if you find it useful: http://www.bidmaxhost.com/forum/

Posted by alex on March 18, 2007 at 06:07 AM GMT #

On a recent SGR course I was teaching I was asked by one of the participants about this entry. IIRC he was questioning whether a SIGBUS could ever be generated by a programming problem.

The short answer is yes.

Check this code which has a mis-aligned pointer dereference:

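#include <stdio.h>

int main(int argc, char **argv)
{
        int testvar = 0x12345678;
        int *testvarp;

        testvarp = &testvar;
        /* casts make the arguments match the printf conversions */
        printf("testvarp was %lx\n", (unsigned long)testvarp);
        printf("testvar is %x\n", (unsigned int)*testvarp);

        /* deliberately mis-align the pointer by one byte */
        testvarp = (int *)(((char *)testvarp) + 1);
        printf("testvarp is %lx\n", (unsigned long)testvarp);
        printf("testvar is %x\n", (unsigned int)*testvarp);

        return(0);
}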

Compiling for SPARC v8 and for v9 gives different results:

$ cc -o sigbus-demo sigbus-demo.c                       
$ ./sigbus-demo
testvarp was ffbfebb4
testvar is 12345678
testvarp is ffbfebb5
testvar is 34567800
$ cc -xarch=v9 -o sigbus-demo sigbus-demo.c 
$ ./sigbus-demo                            
testvarp was ffffffff7fffe98c
testvar is 12345678
testvarp is ffffffff7fffe98d
zsh: bus error (core dumped)  ./sigbus-demo
$ 

So, a programming problem can cause a SIGBUS. How this all works in practice is an exercise I'm happy (for now) to leave to the reader.

Posted by Peter Harvey on March 30, 2007 at 12:58 PM BST #
