Memory corruption incidents can be tough to handle. Solaris's libumem (an alternative memory allocation library) offers a debugging facility that is useful for investigating memory leaks and corruption. Using libumem is recommended anyway for multi-threaded applications (especially on multiprocessor or other multi-threaded hardware), and it can also be used purely for the sake of debugging.
There are good documents describing this facility in detail (see below). Here I will try to give a quick-start guide.
MDB (the modular debugger) is the front-end tool used to retrieve the debugging data that libumem has collected. It works on a core file, so one should be generated (automatically, or manually with gcore). To start working with the basic (and powerful) features, you should:
- pre-load the libumem library and set the environment variables for debugging
- get basic familiarity with the libumem buffer structure in debug mode
- be familiar with a few mdb commands
Here is a short description of those 3
items:
Pre-loading and Environment Variables
Have these settings active when and where you are running your application:
Pre-load libumem:
export LD_PRELOAD=libumem.so.1
(or setenv LD_PRELOAD libumem.so.1 in csh)
Define UMEM_DEBUG and UMEM_LOGGING, for example (ksh/bash):
export UMEM_DEBUG=default
(or setenv UMEM_DEBUG default in csh)
export UMEM_LOGGING=transaction
(or setenv UMEM_LOGGING transaction in csh)
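Exporting LD_PRELOAD affects every command started from that shell afterwards. As a minimal sketch, assuming a POSIX shell, the same three settings can be limited to a single command with env. The name run_with_umem is a hypothetical helper, not part of libumem, and the application path in the usage comment is a placeholder:

```shell
#!/bin/sh
# run_with_umem: hypothetical helper (not part of libumem).
# Runs one command with libumem pre-loaded and debugging enabled,
# leaving the environment of the calling shell unchanged.
run_with_umem() {
    env LD_PRELOAD=libumem.so.1 \
        UMEM_DEBUG=default \
        UMEM_LOGGING=transaction \
        "$@"
}

# Usage (placeholder application path):
#   run_with_umem ./my-application arg1 arg2
```

This keeps tools that you start later from the same shell from being run under libumem by accident.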
Buffer Structure (when using debug)
Libumem uses memory caches, each containing a set of buffers of a predefined size. Thus there might be one cache for 16-byte buffers, another for 512-byte buffers, and so on. Each allocated buffer is structured this way:
| Metadata (8 bytes) | User Data | Redzone (8 bytes) | Debug metadata (8 bytes) |
The first 8 bytes of metadata are ignored here; we are interested in the user data, redzone and debug metadata segments.
Zooming in on the structure of these segments:
User Data
| Application available memory (uninitialized memory is set to 0xbaddcafe) | 0xbb, denotes the end of the application buffer | Rest of the allocated buffer (uninitialized memory is set to 0xbaddcafe) |
- The value 0xbaddcafe is written to all uninitialized memory of the user data segment.
Redzone
| Value of 0xfeedface (4 bytes) | Integer value (4 bytes) from which the application allocation size can be calculated |
- The application allocation size is calculated from the last 4 bytes of the redzone (let's denote their decimal integer value by x): allocation-size = ((x - 1) / 251) - 8
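As a worked example of the formula above, here is a minimal sketch in POSIX shell arithmetic. The name umem_alloc_size is a hypothetical helper, and the check that x - 1 is a multiple of 251 is an assumption that follows from the formula (otherwise the division would not be exact), not something stated in the text:

```shell
# umem_alloc_size: hypothetical helper, not part of libumem or mdb.
# $1 is the second redzone word (the 4 bytes after 0xfeedface),
# given as a C-style number such as 0x69e5 or 27109.
umem_alloc_size() {
    x=$(( $1 ))
    # Assumption: a valid word satisfies (x - 1) % 251 == 0.
    if [ $(( (x - 1) % 251 )) -ne 0 ]; then
        echo "redzone size word looks corrupted" >&2
        return 1
    fi
    echo $(( (x - 1) / 251 - 8 ))
}

umem_alloc_size 0x69e5   # prints 100, since 27109 = 251 * (100 + 8) + 1
```

Words copied from an mdb hexadecimal dump need the 0x prefix added before they are passed to the helper.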
Debug metadata
| Pointer to umem_bufctl_audit structure (4 bytes) | Checksum value (4 bytes) |
- We'll see in a minute that the umem_bufctl_audit structure, which includes the stack trace of the allocation, can be dumped inside mdb.
- XORing the pointer to umem_bufctl_audit (first 4 bytes) with the checksum value should result in the value 0xa110c8ed. If not, this segment is probably corrupted.
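The XOR check can be done by hand, or with a few lines of shell arithmetic. The following is a minimal sketch, assuming the 4-byte words of the layout above; umem_bufctl_ok is a hypothetical helper name and the two input values are made up for illustration:

```shell
# umem_bufctl_ok: hypothetical helper, not part of libumem or mdb.
# $1 = bufctl pointer word, $2 = checksum word, both as C-style numbers.
# Succeeds (exit status 0) when the two words XOR to 0xa110c8ed.
umem_bufctl_ok() {
    [ $(( ($1) ^ ($2) )) -eq $(( 0xa110c8ed )) ]
}

# Made-up values for illustration: 0x3f5c0 ^ 0xa1133d2d = 0xa110c8ed
if umem_bufctl_ok 0x3f5c0 0xa1133d2d; then
    echo "debug metadata looks intact"
else
    echo "debug metadata is probably corrupted"
fi
```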
A Few MDB Commands to Start With and References to Examples
Invoke mdb on a core file simply by:
# mdb core-file
At the mdb prompt, you can:
scan allocated buffers for potential out-of-bounds writes:
> ::umem_verify
You will get a list like:
...
umem_alloc_64 2e608 clean
umem_alloc_80 2e808 1 corrupt buffer
...
Note that the “_64” or “_80” suffix is the buffer size of the cache, as described before. Use the address in the second column for the next step.
You can then run ::umem_verify on the specific cache:
> address::umem_verify
This will give you the addresses of the corrupted buffers. Dump as much of a buffer as you need in order to reach the pointer to the bufctl_audit structure:
> buffer-address/countX
(e.g., > 37f88/90X)
Here the X format prints 4-byte hexadecimal words, so count is a number of words rather than bytes.
Match the buffer structure (explained before) with the dumped data, and retrieve the pointer to the bufctl_audit structure. Then run:
> bufctl_audit-ptr::bufctl_audit
If the debug data is not corrupted, you will get the buffer information, including the allocation stack trace.
See an example here; look for 'Traditional Memory Corruption'.
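Matching the dumped words against the buffer structure amounts to finding the redzone marker and reading the three words after it. Here is a minimal sketch of that step in POSIX shell, assuming the 32-bit layout described earlier and words written in lowercase hex without a 0x prefix. The name find_debug_fields is a hypothetical helper and the dump in the usage comment is made up; note that feedface could also appear by chance inside user data, so treat the result as a hint:

```shell
# find_debug_fields: hypothetical helper, not part of libumem or mdb.
# Arguments are the 32-bit words of a dump; tokens ending in ':'
# (address labels) are skipped.
find_debug_fields() {
    state=0
    for w in "$@"; do
        case $w in *:) continue ;; esac   # skip address labels
        case $state in
            0) if [ "$w" = feedface ]; then state=1; fi ;;
            1) size_word=$w; state=2 ;;   # encoded allocation size
            2) bufctl=$w; state=3 ;;      # pointer to umem_bufctl_audit
            3) echo "size_word=$size_word bufctl=$bufctl checksum=$w"
               return 0 ;;
        esac
    done
    echo "no complete redzone and debug metadata found" >&2
    return 1
}

# Usage with a made-up, truncated dump:
#   find_debug_fields 37f88: baddcafe baddcafe feedface 69e5 3f5c0 a1133d2d
```

The bufctl value printed here is the address to pass to ::bufctl_audit.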
still on out-of-bounds writes
Sometimes the allocation stack is not sufficient. To generate a core immediately when such an offending write occurs, you might try a hidden feature. It has a performance impact and a memory overhead, so it will probably not fit all cases.
Set UMEM_DEBUG="firewall=1" and UMEM_OPTIONS="backend=mmap", then run your application.
check memory status
> ::umem_status
This will help you to detect modify-after-free incidents. See here
See Also
Identifying Memory Management Bugs Within Applications Using the libumem Library
Using libumem to detect modify-after-free corruptions
Using libumem to detect write-beyond-what-you-allocate errors
http://blogs.sun.com/jwadams/entry/debugging_with_libumem_and_mdb
mdb/kmdb, libumem (pdf)