Thursday June 23, 2005
alanc @ sun.com
Alan Coopersmith’s blog
Random thoughts of a disorganized mind...
(and though it should be obvious, while Sun pays me to think about things, they disclaim any responsibility for these thoughts, nor do I claim what I say matches in any way what Sun thinks)
dtracing down an X server "memory leak"
A bug came into our group today complaining of a memory leak in X. The customer and the support engineer they'd worked with had actually done a good job of narrowing down a simple repeatable test case:
#!/bin/sh
while : ; do
/usr/openwin/bin/xterm &
sleep 1
kill -KILL `pgrep xterm`
done
They had also run Xsun with libumem preloaded and had captured the stack trace
that was making repeated allocations and not freeing them:
After 7 minutes:
> ::umausers
1040384 bytes for 254 allocations with data size 4096:
libumem.so.1`umem_cache_alloc+0x13c
libumem.so.1`umem_alloc+0x44
libumem.so.1`malloc+0x2c
Xalloc+0x18
MapAlloc+0x14
libcfb.so.1`cfbCreatePixmap+0x74
ProcCreatePixmap+0x118
Dispatch+0x17c
main+0x788
_start+0x108
After 11 minutes:
> ::umausers
3506176 bytes for 856 allocations with data size 4096:
libumem.so.1`umem_cache_alloc+0x13c
libumem.so.1`umem_alloc+0x44
libumem.so.1`malloc+0x2c
Xalloc+0x18
MapAlloc+0x14
libcfb.so.1`cfbCreatePixmap+0x74
ProcCreatePixmap+0x118
Dispatch+0x17c
main+0x788
_start+0x108
One of the other support engineers noticed the bug was reproducible only under CDE and not in the GNOME desktop. At that point they sent the bug to our group to investigate further.
I just happened to be working on a new toy at the time the bug came in: an Xserver provider for dtrace. (I've actually been working on this for a while, and am getting it ready to release for people to experiment with - watch this blog for news on that soon.) This seemed the perfect bug to put it to the test with.
Since it had already been narrowed down to CreatePixmap, the
next question was who was calling CreatePixmap and which ones
were freed afterwards. Unfortunately, there's about a half dozen different
ways in the X server to free a resource like a pixmap, and not all of them
involve X requests - for instance, when an X client connection is closed,
the X server garbage collects the resources that were specific to that client.
So what I really wanted was to trace pixmap allocations and frees and match
those to the clients calling CreatePixmap.
This actually ended up using a good cross section of the initial set of
dtrace probes I'd created in the X server.
First, to record who was making calls, I used the client-connect, client-authenticate, and client-disconnect probes. You can probably guess what each of those probes do. For connect & disconnect I simply printed that the event happened and which client id was involved. (In the X server all clients are tracked by a simple integer id, which is usually the position of the entry in the array of client information pointers that has the details for that client.) At client authenticate time, when we actually figure out what client is connecting, I also printed the string representing the address of the client and the pid of the client for local clients. This looks like:
Xserver$1:::client-connect
{
printf("** Client Connect: id %d\n", arg0);
}
Xserver$1:::client-auth
{
printf("** Client auth'ed: id %d => %s pid %d\n",
arg0, copyinstr(arg1), arg2);
}
Xserver$1:::client-disconnect
{
printf("** Client Disconnect: id %d\n", arg0);
}
(The $1 refers to the first argument to the dtrace script when
it's run, which I was using for the process id of the Xserver I wanted to
trace. Unfortunately, since the X server provider is a User-Defined Dtrace
Provider, the available data for each probe gets generic names, hence the
arg0, arg1, etc. I'll provide a chart of what
those really mean when I get the provider released.)
Next I needed to record when pixmaps were created and freed, so I used the resource-alloc and resource-free probes, and printed the XID for each alloc & free of a resource of type PIXMAP:
Xserver$1:::resource-alloc
/Xrestype[arg3] == "PIXMAP"/
{
printf("** Pixmap alloc: %08x\n", arg0);
}
Xserver$1:::resource-free
/Xrestype[arg3] == "PIXMAP"/
{
printf("** Pixmap free: %08x\n", arg0);
}
Then I added probes for the requests coming in to the Xserver from clients,
and whenever one came in for CreatePixmap, print when it started
and finished, and which client it was from, so you could see which pixmap
allocs came from those requests:
Xserver$1:::request-start
/Xrequest[arg0] == "X_CreatePixmap"/
{
printf("-> %s: client %d\n", Xrequest[arg0], arg3);
}
Xserver$1:::request-done
/Xrequest[arg0] == "X_CreatePixmap"/
{
printf("<- %s: client %d\n", Xrequest[arg0], arg3);
}
Put it all together, with a little bit of extra glue to cache the strings being copied through the kernel for better performance and reduced overhead, and the final script looks like this:
#!/usr/sbin/dtrace -qs
string Xrequest[uintptr_t];
string Xrestype[uintptr_t];
Xserver$1:::request-start
/Xrequest[arg0] == ""/
{
Xrequest[arg0] = copyinstr(arg0);
}
Xserver$1:::resource-alloc
/arg3 != 0 && Xrestype[arg3] == ""/
{
Xrestype[arg3] = copyinstr(arg3);
}
Xserver$1:::request-start
/Xrequest[arg0] == "X_CreatePixmap"/
{
printf("-> %s: client %d\n", Xrequest[arg0], arg3);
}
Xserver$1:::request-done
/Xrequest[arg0] == "X_CreatePixmap"/
{
printf("<- %s: client %d\n", Xrequest[arg0], arg3);
}
Xserver$1:::resource-alloc
/Xrestype[arg3] == "PIXMAP"/
{
printf("** Pixmap alloc: %08x\n", arg0);
}
Xserver$1:::resource-free
/Xrestype[arg3] == "PIXMAP"/
{
printf("** Pixmap free: %08x\n", arg0);
}
Xserver$1:::client-connect
{
printf("** Client Connect: id %d\n", arg0);
}
Xserver$1:::client-auth
{
printf("** Client auth'ed: id %d => %s pid %d\n",
arg0, copyinstr(arg1), arg2);
}
Xserver$1:::client-disconnect
{
printf("** Client Disconnect: id %d\n", arg0);
}
So I sent my test machine back to the dtlogin screen, started the script with the pid of the running X server and logged into CDE. I ran a few iterations of the xterm loop and found this pattern repeated for each iteration:
** Client Connect: id 17 ** Client auth'ed: id 17 => local host pid 20273 -> X_CreatePixmap: client 17 ** Pixmap alloc: 02200009 <- X_CreatePixmap: client 17 -> X_CreatePixmap: client 15 ** Pixmap alloc: 01e00180 <- X_CreatePixmap: client 15 -> X_CreatePixmap: client 15 ** Pixmap alloc: 01e00181 <- X_CreatePixmap: client 15 -> X_CreatePixmap: client 14 ** Pixmap alloc: 01c004c8 <- X_CreatePixmap: client 14 ** Pixmap free: 02200009 ** Client Disconnect: id 17 ** Pixmap free: 01e00180 ** Pixmap free: 01e00181
As you can see, the xterm was assigned client id 17 - it created a single pixmap itself which was freed, client id 15 created and freed two more, but for every xterm started, client id 14 created one that wasn't freed. Looking back at the beginning of the log to see who client id 14 and 15 were revealed they were two connections from the same process:
** Client auth'ed: id 14 => local host pid 20089 ** Client auth'ed: id 15 => local host pid 20089
Not surprisingly, ps showed that pid 20089 was the CDE window manager, dtwm. At this point I could have just punted the bug to our CDE team, but since I had this all set up, I decided to go one level further. So far, nothing I'd done with dtrace couldn't have been done by rebuilding the Xserver with some carefully selected printf()'s added - of course, being able to do it without having to rebuild the X server is very nice for those who aren't X server engineers and don't rebuild the X server several thousand times a year. But the true power of dtrace is that it's a whole system tracing tool, so I can see what's going on in the Xserver & dtwm processes at the same time.
My first attempt was to simply use the dtrace system() call to
call pstack on the dtwm pid every time we got a CreatePixmap call
from it - unfortunately, XCreatePixmap() is an asynchronous call,
and gets buffered up and sent later, so all that showed is that by the time
the X server got the CreatePixmap call is that dtwm had finished
sending requests and was waiting in the X event loop for new events to come in.
So I cheated. Since I knew the function in libX11 dtwm would be calling was
XCreatePixmap, I simply traced calls to it from the process id
(which I made a command line argument to the script):
pid$4::XCreatePixmap:entry
{
ustack();
}
I also changed the script to only print the CreatePixmap requests
from the two client ids representing dtwm:
Xserver$1:::request-start
/(arg3 == $2 || arg3 == $3) && Xrequest[arg0] == "X_CreatePixmap"/
{
printf("-> %s: client %d\n", Xrequest[arg0], arg3);
}
This expanded the detail in the output to show where dtwm was creating the pixmaps:
** Client Connect: id 19
** Client auth'ed: id 19 => local host pid 20497
** Pixmap alloc: 02600009
libX11.so.4`XCreatePixmap
libXm.so.4`_XmGetScaledPixmap+0x2fc
libXm.so.4`Xm21GetPixmapByDepth+0x6b
libXm.so.4`Xm21GetPixmap+0x31
dtwm`MakeNamedIconPixmap+0xc4
dtwm`ProcessWmHints+0x310
dtwm`GetClientInfo+0x299
dtwm`ManageWindow+0x18
dtwm`WmDispatchWsEvent+0x6e
dtwm`main+0x1f1
dtwm`0x809772a
libX11.so.4`XCreatePixmap
libXm.so.4`_XmGetScaledPixmap+0x2fc
libXm.so.4`Xm21GetPixmapByDepth+0x6b
libXm.so.4`Xme21GetMask+0x43
libXm.so.4`XmeGetMask+0x1b
libDtSvc.so.1`_DtGetMask+0x45
dtwm`MakeNamedIconPixmap+0xf2
dtwm`ProcessWmHints+0x310
dtwm`GetClientInfo+0x299
dtwm`ManageWindow+0x18
dtwm`WmDispatchWsEvent+0x6e
dtwm`main+0x1f1
dtwm`0x809772a
-> X_CreatePixmap: client 15
** Pixmap alloc: 01e001d0
-> X_CreatePixmap: client 15
** Pixmap alloc: 01e001d1
libX11.so.4`XCreatePixmap
dtwm`MakeIconPixmap+0x10c
dtwm`MakeNamedIconPixmap+0x14a
dtwm`ProcessWmHints+0x310
dtwm`GetClientInfo+0x299
dtwm`ManageWindow+0x18
dtwm`WmDispatchWsEvent+0x6e
dtwm`main+0x1f1
dtwm`0x809772a
-> X_CreatePixmap: client 14
** Pixmap alloc: 01c006f8
** Pixmap free: 01e001d0
** Pixmap free: 01e001d1
** Pixmap free: 02600009
** Client Disconnect: id 19
** Client Connect: id 19
So now that I could point the CDE team to the exact function to look at, MakeIconPixmap in dtwm, I passed the bug on to them. (Since Sun's CDE team is overseas, they were asleep while all this happened, and should be getting to work soon, so I don't have an ending yet to this story, but since this is a blog, I can just post that later.)
[Technorati Tag: Solaris] [Technorati Tag: OpenSolaris] [Technorati Tag: Xorg] [Technorati Tag: dtrace]Posted at 04:01PM Jun 23, 2005 by Alan Coopersmith in X11 | Comments[1]

Posted by Wayne Abbott on June 24, 2005 at 07:42 AM PDT #