Frank Hofmann's Weblog Frank Hofmann's Weblog

Monday Oct 08, 2007

to answer comments ...

huh, that was the first blog of mine attracting comments actually ... so I'll try to answer them as well as I may.

  • How about stack growth direction ?

    Well, for one thing, the method I presented isn't portable because stack_inbounds() is a Solaris-specific interface. So if you'd like to use that, you can reasonably assume that the stack grows downward, as the program used for illustration did.

    Second, even assuming that either stack_inbounds() and friends ever get ported to operating environments other than Solaris, or else that Solaris ever gets ported to hardware platforms / ABIs whose stacks grow upwards, and a truly portable approach is needed, then the following strategy can be used. The approach is outlined e.g. in the 'USAGE' section of sigstack(3C). One can test for stack growth direction e.g. by passing the address of a local variable as a parameter to a function, who compares it with the address of a local variable of its own and returns true/false depending on whether the latter is larger/smaller than the former.

    Or, for a more portable approach that would not require knowledge of addresses of local variables (a notoriously difficult thing given that optimizing compilers can have very interesting strategies for dealing with seemingly 'unneeded' local variables), one could use something like:
    #include <ucontext.h>
    #include <sys/types.h>
    
    int can_alloca(size_t sz) {
      char *buf = alloca(sz);
      return (stack_inbounds(buf) && stack_inbounds(&buf[sz - 1]));
    }
    
    void myfunc_that_does_alloca(...) {
      int use_alloca = can_alloca(sz);
    
      if (use_alloca)
        buf = alloca(sz);
      else
        buf = malloc(sz);
    [ ... ]
      if (!use_alloca)
        free (buf);
    }
    which is slightly more expensive than the first example shown, but might still be more lightweight than using malloc() since definitely no syscalls are involved and no "malloc bookkeeping" is needed.

    But then, I didn't claim the approach is for everyone, and neither did I suggest to prefer alloca() over malloc(). There are some pretty good malloc implementations out there - even in Solaris, you have the choice of several, and benchmarks often select a specific in order to achieve best results.

    But I'm disgressing now - differences between the various malloc libraries should better be covered in another blog entry ... one day ...

    Finally - before I forget, try "I'm feeling lucky" on a Google search for "stack growth direction".

  • What's the advantage to C99 vararrays ?

    C99 VLAs and alloca() are pretty much equivalent, and that goes both for advantages and disadvantages. There is nothing in the C99 standard that guarantees you success on declaring a VLA, or respectively, for all I know, it's not defined what happens if the stack cannot be extended to accommodate the requested VLA size. Which is exactly the behaviour of alloca() as well. Hence, if it's important / critical to know that the C99 function using variable-length arrays will never crash (due to attempting to use a too-large VLA), the same approach, testing whether the addresses of all elements of the VLA are within the stack boundaries, would suffice.

    Regarding variable length arrays and C/C++ programming, this paper by Dennis Ritchie and this discussion on CodeGurus also make interesting reads. Some things are just notoriously hard ...

    To put it short: C99 VLAs and alloca() are, in my opinion, very close cousins and share common (dis)advantages. That includes the inability to check for 'success' in simple ways, and the applicability of the shown strategy to achieveit nonetheless.

And now - I need to find out how to actually post replies to comments without needing to write another blog article. You, the readers, might eventually turn this "blogophobe me" into a real blogger - who'd have thought that ...

Thursday Oct 04, 2007

About alloca()

The C programming language offers a nifty feature for temporary memory allocation - alloca(3C). The advantage of alloca() over memory allocation via the heap is compelling:

Speed
No need to perform expensive list/tree searches in heap management structures
No need to execute expensive sbrk() syscalls to grow the heap
No need to perform locking in MT applications
Simplicity
No need to perform garbage collection - automatic free on function exit

Typical problems resulting from the use of alloca()

Unfortunately, alloca() has one critical flaw - it cannot fail. Or rather, it will not fail gracefully. It always returns a buffer address. If the buffer was too large, stack memory runs out and nasty things will happen to your application:

  • The attempt to access the buffer will, at some offset into the buffer - crash.
  • Making another function call will end up writing data onto the stack - crash.
  • Since other local variables of the function having done alloca() might've been relocated, they possibly are no longer backed by stack memory; using any of these locals then means - crash.

From a diagnosability point of view, the crash will simply be a SIGSEGV or SIGBUS (invalid address / access to address that isn't mapped), and it will not necessarily happen close to the location in the code where alloca() was called. To complicate things further, the crash may be reproducible in one environment, while it may not occur in another. The reason for this is because default stacksizes are dependent on the user environment (see ulimit(1)), and/or because stacksizes for threads in a multithreaded application may differ (from the process' main stacksize and/or from each other's stacksize).

How to make alloca() safe to use

Well, in the past, and on many current operating systems, the answer is "you can't", or at best "it's so expensive to do it that you'd better use malloc()". This is so because your function would have to query 'its' stack boundaries before deciding whether to use alloca() or malloc(), and doing so involves some very unportable, low-level hacking (like setting a temporary signal handler and attempting to read a byte roughtly from the area where the buffer would end up, and checking whether that'd trigger a fault), and/or expensive calls to system-level monitoring interfaces (the use of procfs to find the limits of the mapping that e.g. a local variable address is found in). It's a very expensive operation, and hence defeats the purpose of alloca().

Solaris 10 to the rescue ! There are a few new interfaces in Solaris 10 which allow to do this check quickly and efficiently. The following test programm illustrates that:

#include 
#include 
#include 
#include 
#include 

#define RESERVE 8192

int main(int argc, char **argv)
{
	int sz = atoi(argv[1]);
	char *buf;
	int done_alloca = 0;

	if (!stack_inbounds((char *)&sz - (sz + RESERVE))) {
		if ((buf = malloc(sz)) == NULL) {
			perror("too large for alloca(), and malloc() failed");
			return(1);
		}
	} else {
		buf = alloca(sz);
		done_alloca = 1;
	}

	bzero(buf, sz);

	printf("got %u bytes, at addr %p, via %s()\n",
	    sz, buf, done_alloca ? "alloca" : "malloc");

	if (!done_alloca)
		free(buf);

	return(0);
}

As you can see, it's pretty simple on Solaris 10 to check whether there's enough space on the thread stack to successfully do alloca() or not. And since stack_inbounds() does not perform any system calls, but only inspects the thread context, it's pretty fast as well.

The feature is already used in OpenSolaris - check the code in usr/src/cmd/ldapcachemgr/cachemgr.c how it allows to gracefully deal with previously impossible-to-address stack overflow issues.

Further reading

Check the Jim Litchfield's blog for further details on stack boundary queries.

Courtesy also to my colleague I.Zymin, who talked me into evaluating this feature.