
Friday March 24, 2006
libaio hang ??
Recently I came across a core file from an application linked with
libaio. The customer said the application had hung, and he used gcore(1)
to collect the core file.
Initial investigation showed that an mmap() call from libaio had failed
because the process had run out of virtual memory. libaio calls mmap()
to set aside stacks for the user-land threads it creates to service
the 'async IO' requests.
As it is a 32-bit process, I assumed that at some point it had grown
close to 4 GB and that the mmap() call around that time failed with 'no
memory'.
With this in mind, we put together a test case which calls malloc()
until it fails and then issues an aio call, which results in mmap()
failing. So we thought the application was hitting the 32-bit limit.
One smart customer-facing engineer collected the output of 'ulimit -a',
which showed that 'vmemory' was limited to ~600 MB.
Interestingly, the size of the core file collected was also around the same.
So the recommendation was to set it to 'unlimited' or, failing that, at
least raise it to a significantly larger value.
But the point to keep in mind when setting it to 'unlimited' is that
enough swap space must be configured on the system. Otherwise a runaway
process can starve others.
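The limit that 'ulimit -a' reports can also be checked from inside a
process, which might have pointed at the cause sooner. A minimal
sketch, assuming the 'vmemory' line corresponds to the RLIMIT_AS
resource limit; the helper name describe_vm_limit() is made up for
illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: writes the soft address-space limit into buf,
 * either as "unlimited" or as a size in kbytes.
 * Returns 0 on success, -1 if getrlimit() fails.
 */
int describe_vm_limit(char *buf, size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    if (rl.rlim_cur == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu kbytes",
                 (unsigned long long)(rl.rlim_cur / 1024));
    return 0;
}
```

An application that depends on large allocations could log this value
at startup, so that a low limit shows up in its own logs rather than
only in the shell that launched it.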
As the problem happened on Solaris 9 (s9), I can't print the stack here.
( Mar 24 2006, 02:34:35 AM PST )