Open desktop mechanic

cat /dev/random | grep "For being ignorant to whom it goes I writ at random, very doubtfully"

It usually works

Wednesday Jun 09, 2004

One of the other Sun engineers titled his blog "sometimes it works". The flip side is the bug that almost never occurs but is just as vexing when it does. Such intermittent problems can be extremely difficult to track down. Here are two examples of bugs that each appeared only about once every 10-30 reboots on a customized Java Desktop System 2.0.

The first bug presents the GNOME user with a login failure and a "your session has lasted less than 10 seconds" message. Here are some details:

Xlib: connection to ":0.0" refused by server
Xlib: No protocol specified
 
/usr/bin/X11/xsetroot:  unable to open display ':0'

...  (None of the clients can connect to the X server)

The logs in /var/lib/gdm show:
AUDIT: Fri Jun  4 15:48:27 2004: 1631 X: client 3 rejected from local host
AUDIT: Fri Jun  4 15:49:56 2004: 1631 X: client 2 rejected from local host
...
Further investigation revealed the root cause. The problem occurs under the following sequence of events:
  • The computer boots with a preconfigured hostname.
  • xdm generates an .Xauthority file with a magic cookie for this hostname.
  • The DHCP client asks for an IP address and a new hostname.
  • The DHCP client changes the machine's hostname.
  • The user logs in, but now the hostname doesn't match the previously authorized one, so none of the clients can connect to the X server.

NOTE: Xauthority is a mechanism for ensuring that only authorized clients can access a display.
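The sequence above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the authority file is modeled as a dictionary keyed by hostname and display number, and the function names and hostnames are hypothetical. Real Xauthority entries also carry an address family and a protocol name, and this is not how xdm or Xlib are actually implemented.

```python
import secrets


def generate_cookie(authority, hostname, display):
    """Store a fresh magic cookie under (hostname, display) and return it."""
    cookie = secrets.token_hex(16)
    authority[(hostname, display)] = cookie
    return cookie


def lookup_cookie(authority, hostname, display):
    """Return the cookie a client would find for its current hostname, or None."""
    return authority.get((hostname, display))


def client_can_connect(authority, server_cookie, current_hostname, display):
    """A client is accepted only if it presents the cookie the server expects."""
    return lookup_cookie(authority, current_hostname, display) == server_cookie


# Boot: the cookie is generated for the preconfigured hostname (hypothetical names).
authority = {}
server_cookie = generate_cookie(authority, "preconfigured-host", "0")

# Before the DHCP client renames the machine, the lookup succeeds.
print(client_can_connect(authority, server_cookie, "preconfigured-host", "0"))

# After the rename, the client searches under the new hostname and finds nothing.
print(client_can_connect(authority, server_cookie, "dhcp-assigned-host", "0"))
```

In this model the failure depends only on whether the hostname changes between cookie generation and login, which fits a bug that shows up on some boots and not others.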

A janitorial problem caused the second intermittent bug. The underlying Linux distribution (SLED) does not clear the /tmp directory between reboots. Some processes create sockets in /tmp/.ICE-unix named after their process ID. Unfortunately, after a reboot it is possible (in fact likely) for a new process to get the same PID as a long-dead process. Because /tmp/.ICE-unix hasn't been cleaned out, the unlucky process can't create a socket named after its PID because the name already exists. Again, the symptom is an intermittent login failure.
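The collision is easy to reproduce in miniature. The sketch below uses a temporary directory as a stand-in for /tmp/.ICE-unix and an arbitrary PID of 1234. The helper names are hypothetical, and the cleanup function only illustrates the kind of boot-time housekeeping that would avoid the problem; it is not the fix that was actually shipped.

```python
import os
import socket
import tempfile


def create_pid_socket(directory, pid):
    """Bind a Unix socket named after pid; raises OSError if the name exists."""
    path = os.path.join(directory, str(pid))
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def remove_stale_entries(directory):
    """Delete every entry in directory, as a boot-time cleanup would."""
    for name in os.listdir(directory):
        os.unlink(os.path.join(directory, name))


def demo():
    """Return True if a leftover socket file blocked a process reusing the PID."""
    with tempfile.TemporaryDirectory() as ice_dir:
        # "Previous boot": a process with PID 1234 creates its socket and dies.
        old = create_pid_socket(ice_dir, 1234)
        old.close()  # closing the socket does not remove the file

        # "Next boot": a new process happens to get the same PID.
        try:
            create_pid_socket(ice_dir, 1234).close()
            collided = False
        except OSError:
            collided = True

        # With the directory cleaned out, the same bind succeeds.
        remove_stale_entries(ice_dir)
        create_pid_socket(ice_dir, 1234).close()
        return collided


if __name__ == "__main__":
    print("stale socket blocked the new process:", demo())
```

The key detail is that closing a bound Unix socket leaves its file behind, so the name stays taken until something explicitly unlinks it.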
