For some days I have been tracking down a problem that only shows up on specific platforms, and only when the Slony-I tests are run in our large automated test setups. Of course, when we try to reproduce it in our own environment, everything just works.
It turns out that, because of an environment issue (PATH settings), different ps(1) commands were run on Solaris x86 and Solaris SPARC for the same user account. The Slony-I test suite parses ps(1) output (see _check_pid() in slony1-engine/tests/support_funcs.sh) to check whether a process is still running, so the differing output broke the check.
I wrote a patch that used pgrep(1) to do the same thing in a slightly more robust way, but then Ståle (a manager!) came up to me and told me about kill -0 <pid>. How embarrassing not to know about kill -0, and even more embarrassing to be told about it by a manager. Anyway, now you have been warned about kill -0; from knowledgeable managers, however, I cannot protect you.
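For comparison with parsing ps(1) output, here is a minimal sketch of a liveness check built on kill -0, assuming a POSIX shell. The function name is_running is hypothetical; it is not the _check_pid() from the Slony-I test suite. One caveat: kill -0 also fails when the process exists but belongs to another user and you lack permission to signal it, so this check is only reliable for your own processes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slony-I): succeed (exit status 0) if a
# process with the given PID exists and we are allowed to signal it.
# Signal 0 runs kill's error checking without delivering any signal,
# so the target process is not affected. Note: it also fails for a live
# process owned by another user (permission denied), not only for a
# missing one.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Usage: start a short-lived background job and check on it.
sleep 1 &
pid=$!
if is_running "$pid"; then
    echo "process $pid is running"
fi
wait "$pid"    # reap the child so its PID really disappears
if ! is_running "$pid"; then
    echo "process $pid has exited"
fi
```

Unlike the ps(1) approach, nothing here depends on which ps binary is first in PATH or on its output format; only the exit status of kill is used.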


Jorgen,
Even after browsing the link and reading the man pages for kill, I'm not able to figure out what kill -0 does. So your candor will be much appreciated.
I'm not guilty or ashamed about it. Ha ha. A few years back I very well might have been. Well, one grows up!
-Balaji S.
Posted by Balaji Sowmyanarayanan on May 27, 2008 at 12:40 PM CEST #
Yeah, I guess I should explain it better.
http://www.opengroup.org/onlinepubs/009695399/functions/kill.html says:
If sig is 0 (the null signal), error checking is performed but no signal is actually sent. The null signal can be used to check the validity of pid.
Or to give an example:
[jaustvik@host:/] ps
PID TTY TIME CMD
15061 pts/28 0:00 ps
14549 pts/28 0:00 bash
[jaustvik@host:/] kill -0 14549
[jaustvik@host:/] echo $?
0
[jaustvik@host:/] kill -0 1454988
bash: kill: (1454988) - No such process
[jaustvik@host:/] echo $?
1
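Since the exit status is all that matters, kill -0 also works well in a polling loop, for example when a test script has to wait for a daemon to shut down. The sketch below is hypothetical (the name wait_for_exit and its timeout argument are illustrative, not from the Slony-I suite). Unlike the shell's wait builtin, it also works for processes that are not children of the current shell. Be aware that a child which has exited but has not yet been reaped is a zombie, and kill -0 generally still succeeds for it.

```shell
#!/bin/sh
# Hypothetical helper: poll with kill -0 until the process with PID $1 is
# gone, giving up after roughly $2 seconds.
# Returns 0 if the process went away, 1 on timeout.
wait_for_exit() {
    pid=$1
    remaining=$2
    # kill -0 keeps succeeding while the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            return 1    # timed out, process still there
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A caller might use it as `wait_for_exit "$pid" 30 || echo "process $pid did not stop"`.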
Posted by Jørgen Austvik on May 27, 2008 at 01:52 PM CEST #
Great post, thanks T
Posted by tom on June 11, 2008 at 03:44 AM CEST #