Thursday November 12, 2009
I've said many times that dtrace is not just a wonderful tool for developers and performance gurus. The Kings of Computing, who are of course the System Admins, also find it really useful.
There is an ancient version of make called Parallel make that occasionally suffers from a bug (1223984) where it gets into a loop like this:
waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED) Err#10 ECHILD
alarm(0) = 30
alarm(30) = 0
waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED) Err#10 ECHILD
alarm(0) = 30
alarm(30) = 0
waitid(P_ALL, 0, 0x08047270, WEXITED|WTRAPPED) Err#10 ECHILD
This will then consume a CPU and the user's CPU shares. The application is never going to be fixed, so the normal advice is not to use it. However, since it can be NFS mounted from anywhere, I can't reliably delete all copies of it, so occasionally we will see runaway processes on our build server.
It turns out this is a snip to fix with dtrace. Simply look for cases where the wait system call returns an error with errno set to ECHILD (10). If that happens more than 20 times for the same process without that process calling fork in between, stop the process.
The script is simple enough for me to just do it on the command line:
# dtrace -wn 'syscall::waitsys:return / arg1 <= 0 &&
execname == "make.bin" && errno == 10 && waitcount[pid]++ > 20 / {
stop();
printf("uid %d pid %d", uid, pid) }
syscall::forksys:return / arg1 > 0 / { waitcount[pid] = 0 }'
dtrace: description 'syscall::waitsys:return ' matched 2 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  2  20588                   waitsys:return uid 36580 pid 29252
  3  20588                   waitsys:return uid 36580 pid 2522
  5  20588                   waitsys:return uid 36580 pid 28663
  7  20588                   waitsys:return uid 36580 pid 29884
 10  20588                   waitsys:return uid 36580 pid 941
 15  20588                   waitsys:return uid 36580 pid 1098
This was way easier than messing around with prstat, truss and pstop!
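For anyone who wants to check the counting logic without a build server to hand, here is a small Python model of what the two probe clauses in the one-liner above do. It is only a sketch: it does not talk to dtrace at all, and the event names ("wait_echild", "wait_ok", "fork") and the function name pids_to_stop are made up for illustration. It assumes every event already comes from a process called make.bin.

```python
from collections import defaultdict


def pids_to_stop(events, threshold=20):
    """Model the per-process counter kept by the dtrace one-liner.

    events is an iterable of (name, pid) pairs, where name is one of the
    hypothetical labels "wait_echild", "wait_ok" or "fork".
    Returns the pids for which stop() would have fired, in order.
    """
    waitcount = defaultdict(int)  # stands in for the D array waitcount[pid]
    stopped = []
    for name, pid in events:
        if name == "wait_echild":
            # waitcount[pid]++ > 20 compares the old value, then increments.
            old = waitcount[pid]
            waitcount[pid] += 1
            if old > threshold:
                stopped.append(pid)
        elif name == "fork":
            # A successful fork in the parent resets the counter.
            waitcount[pid] = 0
        # "wait_ok" is ignored: the one-liner only resets on fork.
    return stopped


if __name__ == "__main__":
    # 22 failing waits with no fork: the 22nd one trips the predicate.
    print(pids_to_stop([("wait_echild", 29252)] * 22))  # prints [29252]
```

Because of the post-increment, a process is stopped on its 22nd failing wait since its last fork, not its 20th. A real stopped process would make no further system calls, whereas this model would keep reporting it if fed more events.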
This blog has moved to: http://chrisgerhard.wordpress.com/.
Comments here are closed; comments there are open.