Trantorian Gazette

Tricky problem with NTP

Wednesday Jul 27, 2005

I am what is known as the Lead Product Engineer (LPE) for NTP in Solaris. As such, I am often involved in the NTP project development.

Recently, Dr. Mills (inventor of NTP) was having a problem with the latest development version of NTP running on one particular Solaris system. After an indeterminate period of time, the NTP daemon would just exit, without writing anything to the log or debug output.

Dr. Mills was perplexed, so I offered to take a look using dtrace. Actually, I first used truss. The truss output showed that the daemon was receiving a SIGINT signal and exiting, as it is supposed to do when it gets that signal. I then used the dtrace proc provider (this was my first use of the proc provider, by the way) to determine the sender of the signal, like so:


#!/usr/sbin/dtrace -s
proc:::signal-send
/args[2] == SIGINT && args[1]->pr_fname == "ntpd"/
{
        printf("SIGINT signal sent to %s by uid=%d pid=%d",
                args[1]->pr_fname,uid,pid);
}
The results showed that the uid and pid were both 0. So, the signal was coming from root, but what does pid=0 mean? It means that no process is sending the signal; it is coming from the kernel. So, I re-ran the test, this time adding a "stack();" line, so I could see the kernel stack at the time the signal was generated. What I found was that the signal was being requested by the ldterm kernel module, which was in turn being called during the processing of data from the su serial driver.
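The same "who sent this?" question can also be asked from inside a program, without dtrace. The sketch below is not anything ntpd does; it is a minimal illustration, assuming a POSIX system, of a SIGINT handler that records the siginfo_t fields. POSIX says that when si_code is less than or equal to 0 the signal came from a process and si_pid is valid; a positive si_code suggests the kernel generated it. The names on_sigint, watch_sigint and sent_by_process are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Details of the most recent SIGINT, filled in by the handler. */
static volatile sig_atomic_t last_code;
static volatile sig_atomic_t last_pid;
static volatile sig_atomic_t got_sigint;

/* Handler (hypothetical name): just record what siginfo_t tells us. */
static void on_sigint(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_code = si->si_code;
    last_pid = (sig_atomic_t)si->si_pid;
    got_sigint = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int watch_sigint(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigint;
    sa.sa_flags = SA_SIGINFO;   /* ask for the three-argument handler */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* POSIX: si_code <= 0 means a process sent it and si_pid is meaningful. */
static int sent_by_process(void)
{
    return last_code <= 0;
}
```

After calling watch_sigint(), a signal sent with kill(getpid(), SIGINT) should make sent_by_process() return true with last_pid equal to the caller's own pid. A SIGINT generated by the terminal layer would be expected to show a positive si_code instead, though the exact value is system specific and I have not checked what Solaris reports there.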

So, looking at the ntp.conf file, I see that this system has two refclocks configured, one for a GPS and one for a PPS. Now, the GPS reads data on the serial line, but the PPS only looks for state transitions. So, maybe once in a while, the GPS sends something that the ldterm module interprets as the INT character. Looking at the code for ldterm, I see that SIGINT is sent when the ldterm module receives an M_BREAK message from the serial driver, but only if the IGNBRK option is not set. So, we need to set IGNBRK for refclocks. But now I know that it isn't the INT character, it is a break. You get a break on a serial line when the other device loses power, or the cable is disconnected. Maybe the GPS power cycles on occasion?

So, I look at the NTP code, and find that we already set IGNBRK when the refclocks are opened. So, how can the SIGINT be sent? I fire up mdb and look at the flags set for each of the serial lines in the currently running ntpd. Well, the GPS has IGNBRK set, but the PPS does not. I look at where in the serial driver the ldterm module is being called (actually, it isn't called from there, the putnext routine is called, but it amounts to the same thing in streams) and it turns out that the M_BREAK is also sent when the data read by the driver has a parity error.

So there you have it. Even though the PPS does not read data off of the serial line, random electrical noise sometimes presents data on the line. Since it is random, there is a good chance that it will have a parity error, causing a break indication, resulting in a SIGINT signal. The daemon did not bother setting IGNBRK on the PPS serial line, because it was not going to read data anyway. The solution is to set IGNBRK on all serial lines, not just the ones opened to be read.
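For the curious, setting IGNBRK on a serial descriptor looks roughly like this with the POSIX termios calls. This is a minimal sketch and not the actual change that went into ntpd; the function names termios_ignore_break and fd_ignore_break are hypothetical, and a real refclock driver sets many other flags at the same time.

```c
#include <termios.h>

/* Turn on IGNBRK in a termios structure (hypothetical helper). */
static void termios_ignore_break(struct termios *t)
{
    t->c_iflag |= IGNBRK;   /* ignore break conditions on input */
}

/*
 * Apply it to an already-open serial line, whether or not we ever
 * intend to read data from it. Returns 0 on success, -1 on error.
 */
static int fd_ignore_break(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    termios_ignore_break(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```

The point of splitting out the small helper is that the same flag change can be applied to every line the daemon opens, including one like the PPS that is only watched for state transitions.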

Afterwards, Dr. Mills told me that he had reports of the same problem on Linux systems, but it had never happened to him, so he was unable to debug it. So, Linux reaps the benefits of dtrace indirectly.



Do you know how C pointers work?

Wednesday Jul 20, 2005

Here is a bit of a puzzler. I have a short C program that may not work the way you think it does.

#include <stdio.h>
#include <stdlib.h>

int 
main(int argc, char **argv) {
	int *a=malloc(sizeof(int));
	int *b=malloc(sizeof(int));
	int *c=malloc(sizeof(int));
	*a=6;
	*b=2;
	*c=*a/*b;
	/* can you predict what will print out? */;
	printf("*a=%d, *b=%d, *c=%d\n",*a,*b,*c);
}

Try compiling this and see what it does. Did it do what you thought? Do you understand why?


So, that's why they invented the wheel.

Saturday Jul 16, 2005

Recently, my youngest son, Joshua, asked us when the wheel was invented. My wife and I attempted to answer, trying to remember and figure it out.

Well, oddly enough, the answer is in this month's Scientific American Magazine, in a book review of The Looting of the Iraq Museum, Baghdad : The Lost Legacy of Ancient Mesopotamia. It turns out that we both guessed wrongly by thousands of years. A little more research on the web showed that the wheel was invented about 5500 years ago, shortly after writing.

Oddly enough, the wheel came shortly after plywood and roads, and somewhat after beer. So, we can draw the obvious conclusion that all of these inventions were made for beer delivery. Writing was clearly designed to help keep track of your beer tab.


Itinerant Cranberry Pickers...

Sunday Jul 10, 2005

Did you know that itinerant cranberry pickers Wade Boggs and then Mo Vaughn?

Heard that last night on Says You.
