
Friday December 08, 2006
Enough error checking done in your code??
I remember reading somewhere, some time back, that more than 20% of
the Solaris code does nothing but error checking... My first reaction
was: wow, that's a lot, close to paranoia. With the passage of time I
kind of forgot about it.
A couple of weeks back I got an escalation and, as usual, it started off
as "a 3rd-party application core dumped inside a Solaris library", more
specifically inside libaio on Solaris 10.
libaio lets applications do I/O asynchronously on files that cannot make
use of kernel async I/O (kaio). For example, aioread()/aiowrite() on a
device file make use of kernel aio, whereas the same calls on a regular
file use the aio support provided by the library. How the library does
aio is very simple: it creates a few worker threads, which pick up
requests and perform the I/O in the background using pread() and
pwrite() calls.
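To make the worker-thread model concrete, here is a minimal sketch using POSIX threads. It is not libaio's code: the struct req record, submit(), worker(), wait_for() and run_demo() are hypothetical names, and a single worker with a LIFO queue stands in for the library's pool. The only point is that the "asynchronous" read is a plain blocking pread() executed by another thread while the caller carries on.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical request record: read `len` bytes at `offset` from `fd`. */
struct req {
    int fd;
    char *buf;
    size_t len;
    off_t offset;
    ssize_t result;     /* filled in by the worker */
    int done;           /* set by the worker once result is valid */
    struct req *next;
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cv = PTHREAD_COND_INITIALIZER;    /* work available */
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER; /* a request finished */
static struct req *q_head;                                /* pending requests (LIFO for brevity) */

/* Queue a request and return at once; the caller never blocks on the I/O. */
static void submit(struct req *r) {
    pthread_mutex_lock(&q_lock);
    r->next = q_head;
    q_head = r;
    pthread_cond_signal(&q_cv);
    pthread_mutex_unlock(&q_lock);
}

/* Worker: take a request off the queue and do an ordinary synchronous pread(). */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_head == NULL)
            pthread_cond_wait(&q_cv, &q_lock);
        struct req *r = q_head;
        q_head = r->next;
        pthread_mutex_unlock(&q_lock);

        ssize_t n = pread(r->fd, r->buf, r->len, r->offset);

        pthread_mutex_lock(&q_lock);
        r->result = n;
        r->done = 1;
        pthread_cond_broadcast(&done_cv);
        pthread_mutex_unlock(&q_lock);
    }
    return NULL;
}

/* Block until the worker has completed `r` (roughly the job aiowait() does). */
static ssize_t wait_for(struct req *r) {
    pthread_mutex_lock(&q_lock);
    while (!r->done)
        pthread_cond_wait(&done_cv, &q_lock);
    pthread_mutex_unlock(&q_lock);
    return r->result;
}

/* Demo: write a string to a temp file, then read 5 bytes at offset 7 via the worker. */
static ssize_t run_demo(char *out /* at least 6 bytes */) {
    char path[] = "/tmp/aio_sketchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    const char *data = "hello, async world";
    if (write(fd, data, strlen(data)) != (ssize_t)strlen(data)) {
        close(fd);
        return -1;
    }

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) {  /* checked! */
        close(fd);
        return -1;
    }
    pthread_detach(tid);

    memset(out, 0, 6);
    struct req r = { .fd = fd, .buf = out, .len = 5, .offset = 7 };
    submit(&r);
    ssize_t n = wait_for(&r);
    close(fd);
    return n;
}
```

Calling run_demo(buf) with a 6-byte buffer should return 5 and leave "async" in buf, the five bytes at offset 7 of the test string.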
So when an application issues an aio read or write for the first time on
a file descriptor (fd) that can't make use of kaio, the library creates
a bunch of worker threads to execute the aio requests. Refer to the
routine __uaio_init() to see how the initialization happens.
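The "create the workers on first use" step can be sketched with pthread_once(), which guarantees that the initialization routine runs exactly once even if several threads issue their first request at the same time. This is one possible structure, not a copy of __uaio_init() (the real routine may guard its initialization differently): MIN_WORKERS, pool_init(), ensure_pool() and init_ok are hypothetical stand-ins for _min_workers and __uaio_ok, and the workers just sleep.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define MIN_WORKERS 4   /* hypothetical; plays the role of _min_workers */

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static int workers_started;   /* threads pthread_create() actually gave us */
static int init_ok;           /* hypothetical analogue of __uaio_ok */
static int init_calls;        /* how many times pool_init() has run */

/* Placeholder worker: a real one would loop taking requests off a queue. */
static void *idle_worker(void *arg) {
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* One-time initialization: start the minimum set of workers. */
static void pool_init(void) {
    init_calls++;
    for (int i = 0; i < MIN_WORKERS; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, idle_worker, NULL) != 0)
            break;                      /* creation failed: stop here */
        pthread_detach(tid);
        workers_started++;
    }
    init_ok = (workers_started > 0);    /* only claim success if a worker exists */
}

/* Called on every request; only the first caller actually runs pool_init(). */
static int ensure_pool(void) {
    pthread_once(&init_once, pool_init);
    return init_ok ? 0 : -1;
}
```

Each request path would call ensure_pool() first and fail the request cleanly if it returns -1, instead of assuming that workers exist.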
This post is concerned mainly with the following few lines of code from
__uaio_init():
    /*
     * Create the minimum number of read/write workers.
     */
    for (i = 0; i < _min_workers; i++)
        (void) _aio_create_worker(NULL, AIOREAD);
As the comment says, it creates a bunch of worker threads. What is
interesting, though, is that we are not checking the return value of
the _aio_create_worker() routine. This routine creates a user-land
thread [from Solaris 9 onwards, libthread moved to the 1:1 model, which
means that every user-land thread created is tied to its own lwp in the
kernel].
If this call fails for some reason, we continue as if everything is
hunky-dory and set __uaio_ok, which says initialization is complete.
Down the line, when the library actually tries to take a worker off the
queue, it accesses a NULL pointer and thus shoots itself in the foot.
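The failure mode is easy to reproduce without any threads at all. In this hypothetical sketch, create_worker() stands in for _aio_create_worker() and can be forced to fail through the fail_creates flag; init_unchecked() mirrors the loop quoted above, while init_checked() and first_worker_id() show the kind of checks that would turn the core dump into a clean error. None of these names come from libaio.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the library's worker bookkeeping. */
struct worker {
    int id;
    struct worker *next;
};

static struct worker *worker_list;   /* head of the worker list */
static int fail_creates;             /* fault injection: make creation fail */
static int next_id;

/* Returns 0 on success, -1 on failure, like a thread-creation wrapper might. */
static int create_worker(void) {
    if (fail_creates)
        return -1;
    struct worker *w = malloc(sizeof *w);
    if (w == NULL)
        return -1;
    w->id = next_id++;
    w->next = worker_list;
    worker_list = w;
    return 0;
}

/* The pattern from the post: ignore the result and declare success anyway. */
static int init_unchecked(int n) {
    for (int i = 0; i < n; i++)
        (void) create_worker();
    return 1;                         /* "initialized", even with zero workers */
}

/* Checked version: count successes and refuse to report success with none. */
static int init_checked(int n) {
    int created = 0;
    for (int i = 0; i < n; i++)
        if (create_worker() == 0)
            created++;
    return created > 0;
}

/* Unsafe lookup, which faults on an empty list:
 *   static int first_worker_id_unsafe(void) { return worker_list->id; }
 */

/* Safe lookup: report "no worker" instead of dereferencing NULL. */
static int first_worker_id(void) {
    if (worker_list == NULL)
        return -1;
    return worker_list->id;
}

/* Free all workers and start numbering from 0 again. */
static void reset(void) {
    while (worker_list != NULL) {
        struct worker *w = worker_list;
        worker_list = w->next;
        free(w);
    }
    next_id = 0;
}
```

With fail_creates set, init_unchecked(4) still returns 1 yet leaves worker_list empty, which is exactly the state in which an unguarded worker_list->id would fault.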
That makes me think there is never enough error checking!! The more,
the merrier here...
( Dec 08 2006, 02:54:49 AM PST )