Friday December 08, 2006

Enuf error checking done in your code ??

I remember reading somewhere, some time back, that more than 20% of the Solaris code just does error checking. My first reaction was: wow, that's a lot, and close to paranoia. With the passage of time I kind of forgot about it.

A couple of weeks back I got an escalation and, as usual, it started off as: a 3rd-party application dumped core inside a Solaris library (to be more specific, in libaio on Solaris 10).

libaio allows applications to do I/O asynchronously on files that cannot make use of kernel async I/O (for example, aioread()/aiowrite() on a device file makes use of kernel aio, whereas the same call on a regular file uses the aio features provided by the library). How the library does aio is very simple: it creates a few worker threads which pick up requests and perform the I/O in the background using pread() and pwrite() calls.
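To make the worker idea concrete, here is a minimal sketch of what a worker might do for one request it has picked up. It is not the libaio code: the struct io_req record and the service_request() routine are names I made up for illustration, and the request queue and the threads themselves are left out. The only real calls are pread() and pwrite().

```c
#define _XOPEN_SOURCE 700	/* for pread()/pwrite() on strict compilers */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request record; the real libaio structures differ. */
enum req_op { REQ_READ, REQ_WRITE };

struct io_req {
	enum req_op	op;
	int		fd;
	void		*buf;
	size_t		len;
	off_t		off;
	ssize_t		result;	/* bytes transferred, or -1 on error */
	int		error;	/* errno value when result is -1, else 0 */
};

/*
 * Hypothetical per-request work of a worker thread: do the I/O
 * synchronously at the requested offset and record the outcome so the
 * application can collect it later.
 */
static void
service_request(struct io_req *r)
{
	if (r->op == REQ_READ)
		r->result = pread(r->fd, r->buf, r->len, r->off);
	else
		r->result = pwrite(r->fd, r->buf, r->len, r->off);
	r->error = (r->result == -1) ? errno : 0;
}
```

Because pread() and pwrite() take an explicit offset, several workers can service requests on the same fd without fighting over the file position.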

So when an application issues an aio read or write for the first time on a file descriptor (fd) that can't make use of kaio, the library creates a bunch of worker threads to execute the aio requests. Refer to the routine __uaio_init() to see how the initialization happens.

This post is concerned more with the following few lines of code from __uaio_init() :

    /*
     * Create the minimum number of read/write workers.
     */
    for (i = 0; i < _min_workers; i++)
        (void) _aio_create_worker(NULL, AIOREAD);

As the comment says, it creates a bunch of worker threads. What is interesting is that we are not checking the return value of the _aio_create_worker() routine. This routine creates a userland thread. [From Solaris 9 onwards, libthread moved to a 1:1 model, which means that every userland thread created has an lwp in the kernel to which it is tied.]

If for some reason this call fails, we continue as if everything is hunky-dory and set __uaio_ok, which means initialization is complete. Down the line, when the library actually tries to take a worker off the queue, it accesses a NULL pointer and thus shoots itself in the foot.
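For illustration only, here is a sketch of what checking the result could look like. This is not the actual libaio fix: pool_init(), the create_worker_fn callback and the two stubs are hypothetical names, and I am assuming the creator returns 0 on success and nonzero on failure. The point is simply that the "ok" flag gets set only when at least one worker really exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical creator callback: returns 0 on success, nonzero on failure. */
typedef int (*create_worker_fn)(void *arg);

/*
 * Hypothetical init routine (not the real __uaio_init()).
 * Tries to create min_workers workers and reports the number actually
 * created through *created. Returns 0 and sets *ok_flag to 1 only if at
 * least one worker exists; otherwise returns -1 and leaves *ok_flag at 0.
 */
static int
pool_init(create_worker_fn create, void *arg, int min_workers,
    int *created, int *ok_flag)
{
	int i, n = 0;

	assert(create != NULL && created != NULL && ok_flag != NULL);

	*ok_flag = 0;
	for (i = 0; i < min_workers; i++) {
		if (create(arg) != 0)
			break;		/* stop at the first failure */
		n++;
	}
	*created = n;
	if (n == 0)
		return (-1);	/* no workers: do not mark init complete */
	*ok_flag = 1;
	return (0);
}

/* Stubs for demonstration: one always fails, one always succeeds. */
static int always_fail(void *arg) { (void) arg; return (-1); }
static int always_ok(void *arg) { (void) arg; return (0); }
```

Calling pool_init() with always_fail leaves the flag at 0 and returns -1, so the caller can fail the first aio request cleanly instead of crashing later. Whether to fail only when there are zero workers, or whenever fewer than the minimum were created, is a policy choice; this sketch picks the first.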

That makes me think that there is never enough error checking!! More is always merrier here...
( Dec 08 2006, 02:54:49 AM PST )
