Musings on realtime The jel's weblog

Wednesday Aug 15, 2007

I've been producing a fair amount of source code recently and have become convinced that using two compilers during the process is well worth it and that the normal way that I set up conditionals is flawed.

Two compilers
I've been a Studio fan for all of my time at Sun (through all of the name changes), especially since that was all that the Solaris group used until OpenSolaris was released. Two recent events have made me decide that compiling and testing with gcc in addition to Studio is worth the effort.

While I was working on the code for the stack blog, I decided to try gcc and it flagged something that Studio did not. One of the interfaces I used was umem_cache_create whose first argument is a char *. I was building it from a structure which contained something such as this:

        struct foo {
                ...
                char *name;
                ...
        };


initialized this way:

        struct foo names[] = {
                { ..., "32 KB stack cache",... },
        };

gcc caught the possibility that a compiler might choose to put "32 KB stack cache" into a read-only section of the program and that a potentially read-only datum was being passed to a routine that did not guarantee to preserve the data. As I understand the standards, umem_cache_create() should use "const char *" to indicate that the data passed in won't be changed.
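As a minimal sketch of the const-correct pattern, assuming a hypothetical struct cache_desc and a hypothetical helper name_length() (neither is part of libumem or of the library discussed here), declaring both the structure member and the parameter as "const char *" lets a string literal be passed without any risk of it being written to:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the structure in the post. */
struct cache_desc {
	size_t size;
	const char *name;	/* const: may point at a read-only literal */
};

/* Hypothetical table; only the first name comes from the post. */
static const struct cache_desc cache_descs[] = {
	{ 32 * 1024, "32 KB stack cache" },
	{ 64 * 1024, "64 KB stack cache" },
};

/*
 * Hypothetical consumer. Taking const char * promises callers that the
 * name is only read, so passing a string literal is safe.
 */
static size_t
name_length(const char *name)
{
	assert(name != NULL);
	return (strlen(name));
}
```

Handing such a name to an interface declared with a plain "char *" would require either a cast or a writable copy of the string.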

The previous reason is a bit esoteric but the example that really turned my head was when I was overflowing a buffer. How I did it doesn't really matter but a buffer on the stack was overflowed. With Studio's data allocations, I never saw any side effects from doing so. Everything seemed to work just fine with my test cases. When I compiled with gcc, it laid out the data differently and I was able to see that something was going wrong, find the problem and fix it. gcc never told me directly that I was overflowing a buffer but the different data layouts did allow me to see the problem.

Whether you prefer gcc or the Studio compilers, it's worth your time to compile and do testing with the other compiler. You may find things that you never suspected in that code you thought was working perfectly.

In gmake, it's trivial to set up your makefile to use a different compiler. I use something such as this:

CC = cc
OBJECTS = stacks.o
CFLAGS = -g -xO3 -K pic -I. $(XCFLAGS)
LDFLAGS = -znodefs -G -o $(REALLIB) -h $(LNTARGET) -lumem
ifeq ($(CC),gcc)
        CFLAGS=-g -O2 -fpic -std=c99 -pedantic -Wall
        LDFLAGS=-shared -fpic -o $(REALLIB) -lumem -lc
endif


This uses Studio by default but I can say gmake CC=gcc and build with gcc.

I haven't figured out how to do this with make yet.

Conditionals

I've always structured my conditional tests of the form:

        if (rv == 0) ...

I've now been bitten a few too many times by

        if (rv = 0) ...

which compiles just fine but has interesting results.

I've now adopted a style I first saw in the Windows world:

        if (0x0 == rv) ...

If you mess this up by forgetting one of the '=' signs, the compiler now flags the error for you. gcc's version:

        error: invalid lvalue in assignment

It looks a bit strange if you've had the classic K&R style wired into your fingertips but it does make life much, much easier.

I do see that gcc with a -Wall will delicately tell you

        suggest parentheses around assignment used as truth value

which may help to avoid the issue, although I much prefer lint's message:

        warning: assignment operator "=" found where "==" was expected

Studio just lets you do it and gcc without the magic flag to catch it will let you do it as well. lint flags the error but it seems not to be used as much during development and test as it should. I certainly don't run it every time I compile.
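A small sketch of the constant-first style in use, with a hypothetical helper count_successes() invented purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count how many return values signal success (0). */
static size_t
count_successes(const int *rvs, size_t n)
{
	size_t i, ok = 0;

	assert(rvs != NULL);
	for (i = 0; i < n; i++) {
		/*
		 * Dropping one '=' here would give "0 = rvs[i]", which
		 * does not compile because a constant is not assignable.
		 */
		if (0 == rvs[i])
			ok++;
	}
	return (ok);
}
```

The protection only applies when one side of the comparison is not assignable; comparing two variables can still be mistyped as an assignment, which is where the gcc -Wall and lint warnings remain useful.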


Monday Aug 06, 2007

My most recent encomium discussed the subject of stacks and how they were used for programs and threads.

As a follow on to that, I'd like to present a library to allocate thread stacks that avails itself of the facilities of libumem(3LIB).

There are several nice features of libumem discussed in the man page. The one we are directly using is umem_caches. These routines pay off when the allocated data requires various initialization steps before it can be used. Examples would be creation of a mutex or other synchronization primitive, initialization of fields within a structure or, in our case, setting permissions on pages within the allocated buffer. If these initialization efforts recur frequently, it can be more effective to use a umem_cache to initialize a data buffer once, use the buffer, return it to its original state (i.e., as initialized) and then free it back to a cache for that structure. The next piece of code that needs such a buffer can request an allocation and, if a previously initialized buffer is available, use that buffer without having to redo the initialization. Our sample library uses umem_cache_create(3MALLOC), umem_cache_alloc(3MALLOC) and umem_cache_free(3MALLOC).
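To illustrate the idea of constructing a buffer once and then reusing it, here is a toy, single-threaded free list. It is not libumem and not how libumem is implemented; obj_t, obj_construct(), obj_cache_alloc() and obj_cache_free() are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object whose setup we pretend is expensive. */
typedef struct obj {
	struct obj *next;	/* free-list link, used only while cached */
	int ready;		/* set once by the constructor */
} obj_t;

static obj_t *free_list;	/* constructed objects waiting for reuse */
static int constructions;	/* how many times the constructor ran */

/* One-time setup; stands in for mutex or page-permission work. */
static void
obj_construct(obj_t *o)
{
	o->next = NULL;
	o->ready = 1;
	constructions++;
}

/* Reuse a cached, already constructed object if one is available. */
static obj_t *
obj_cache_alloc(void)
{
	obj_t *o = free_list;

	if (o != NULL) {
		free_list = o->next;
		o->next = NULL;
		return (o);
	}
	o = malloc(sizeof (*o));
	if (o != NULL)
		obj_construct(o);
	return (o);
}

/* The caller must hand the object back in its constructed state. */
static void
obj_cache_free(obj_t *o)
{
	assert(o != NULL && o->ready == 1);
	o->next = free_list;
	free_list = o;
}
```

Allocating, freeing and allocating again returns the same object while the constructor runs only once, which is the saving that an object cache is meant to provide.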

The library allocates page-aligned buffers in sizes that are multiples of the system page size. For our purposes, the sizes start at 32 KB and move in a power-of-two progression to 2 MB (32 KB -> 64 KB -> 128 KB -> 256 KB -> 512 KB -> 1 MB -> 2 MB). Only buffers of one of those sizes can be allocated. Each buffer is bracketed on each side by a page of memory with PROT_NONE access permissions so that any reference to an address in those pages (stack underflow or overflow) will cause a segmentation violation. Each buffer is fully faulted when first allocated so that if an mlockall(3C) including the MCL_FUTURE flag was previously used, the buffer will be locked into memory before first use by the thread.
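The guard-page arrangement can be sketched with plain mmap(2) and mprotect(2). This is not the library's code: guarded_alloc() and guarded_free() are hypothetical helpers, there is no caching, and the sketch assumes a system that provides MAP_ANONYMOUS.

```c
#define	_DEFAULT_SOURCE	/* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map size bytes of read/write memory with one
 * PROT_NONE page on each side. size must be a non-zero multiple of the
 * page size. Returns the usable buffer, or NULL on failure.
 */
static void *
guarded_alloc(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *base;

	assert(size != 0 && size % page == 0);
	/* Reserve the buffer plus two guard pages, all inaccessible. */
	base = mmap(NULL, size + 2 * page, PROT_NONE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return (NULL);
	/* Open up only the middle; the outer pages stay PROT_NONE. */
	if (mprotect(base + page, size, PROT_READ | PROT_WRITE) != 0) {
		(void) munmap(base, size + 2 * page);
		return (NULL);
	}
	/* Write to every page so the buffer is faulted in before use. */
	(void) memset(base + page, 0, size);
	return (base + page);
}

/* Release a buffer from guarded_alloc(), including its guard pages. */
static int
guarded_free(void *buf, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	return (munmap((char *)buf - page, size + 2 * page));
}
```

A read or write one byte before the returned buffer, or one byte past its end, lands in a PROT_NONE page and raises a segmentation violation instead of silently corrupting a neighbouring allocation.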

libumem is useful in that it does the caching for us instead of requiring that we write code to manage the caches.

Three interfaces are offered:

  • void *stack_alloc(size_t size)
  • int stack_free(void *address, size_t size)
  • uint_t *fetch_stack_sizes(void) - returns an array of integers with the supported stack sizes. The end of the array is a zero-valued element. The array may be freed once it is no longer needed.
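A zero-terminated size array like the one described can be built and walked as follows. This is a hypothetical re-creation based only on the description above, not the library's source: sketch_stack_sizes() and pick_stack_size() are invented names, and plain unsigned int stands in for uint_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define	MIN_STACK	(32u * 1024u)		/* 32 KB, from the post */
#define	MAX_STACK	(2u * 1024u * 1024u)	/* 2 MB, from the post */

/*
 * Return a malloc'ed, zero-terminated array of the power-of-two sizes.
 * The caller frees it. Returns NULL if the allocation fails.
 */
static unsigned int *
sketch_stack_sizes(void)
{
	unsigned int size, *sizes;
	size_t n = 0;

	for (size = MIN_STACK; size <= MAX_STACK; size *= 2)
		n++;
	sizes = malloc((n + 1) * sizeof (*sizes));
	if (sizes == NULL)
		return (NULL);
	n = 0;
	for (size = MIN_STACK; size <= MAX_STACK; size *= 2)
		sizes[n++] = size;
	sizes[n] = 0;		/* terminator */
	return (sizes);
}

/* Smallest listed size that holds want bytes, or 0 if none does. */
static unsigned int
pick_stack_size(const unsigned int *sizes, size_t want)
{
	assert(sizes != NULL);
	for (; *sizes != 0; sizes++) {
		if (want <= *sizes)
			return (*sizes);
	}
	return (0);
}
```

A caller could use a helper like pick_stack_size() to round a request up to one of the listed sizes before asking for a stack, since only those exact sizes can be allocated.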

The complete source for the library can be found in this (modified) tar file.

umem debugging
I run every program with these environment variables set (on Intel/AMD):

UMEM_DEBUG=audit=64,guard
LD_PRELOAD_32=/lib/libumem.so.1
LD_PRELOAD_64=/lib/amd64/libumem.so.1

I even add these at the end of /etc/profile on the system.

This uses the facilities of libumem to find usage errors with memory allocations in programs. One can also use gcore to grab a core file of a running program and then use mdb's ::findleaks dcmd to find memory leaks within programs. The '64' in the audit entry of UMEM_DEBUG indicates how many stack frames libumem should be prepared to save when tracking allocations and where they are made. It's a depressing exercise to set up firefox or thunderbird to run under the control of these variables and then to examine the core files with mdb.

In addition to finding memory leaks, libumem is good for finding coding errors. Consider this code:

        alloc_t *na;
        na = (alloc_t *)malloc(sizeof(na));

Note: an earlier version of this posting used "*na" in the malloc call - hence the comments.

alloc_t happens to be a struct with a size of 8 bytes but I'm allocating only enough space for a pointer to it, which is 4 bytes. The code, however, treats the allocated space as a struct and sets 8 bytes worth of data, writing beyond the end of what I allocated. When I free this allocated address, libumem forcibly tells me (via a core dump) that I've gone beyond the end of the data that I allocated. This helped point me to an error that might otherwise have taken a lot of time to find (if it ever did show up).

The corrected code is:

        alloc_t *na;
        na = (alloc_t *)malloc(sizeof(alloc_t));
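An alternative spelling of the same fix is to take the size of what the pointer points to, so the allocation stays correct even if the pointer's type changes later. The sketch below uses a hypothetical 8-byte alloc_t with invented fields and a hypothetical alloc_new() helper; the real structure is not shown in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 8-byte structure standing in for the post's alloc_t. */
typedef struct {
	uint32_t addr;
	uint32_t len;
} alloc_t;

/*
 * Allocate and fill one alloc_t. sizeof (*na) is the size of the
 * structure, not of the pointer.
 */
static alloc_t *
alloc_new(uint32_t addr, uint32_t len)
{
	alloc_t *na = malloc(sizeof (*na));

	assert(sizeof (*na) == sizeof (alloc_t));
	if (na == NULL)
		return (NULL);
	na->addr = addr;
	na->len = len;
	return (na);
}
```

Note that in a 64-bit build a pointer is itself 8 bytes, so sizeof(na) would happen to match this particular structure and the original mistake would go unnoticed there.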

Seriously consider running all of your programs with these flags while developing and testing, and even in deployment.