Thursday Jun 05, 2008
As you probably know, Sun is now offering support for the OpenSolaris distribution. This means that critical bug fixes will be made in versions of packages that can be installed on prior releases of OpenSolaris. That mechanism is coming online right now, and I was the first person to integrate a fix into the newly created source tree for the 2008.05 release of OpenSolaris. I can't tell you just yet what the fix is, because it is a critical security related fix, and we have to co-ordinate the release of the info about the bug and the binary fix itself for all supported releases in order to avoid the "zero-day" effect.
Let me tell you, making co-ordinated fixes like this is a lot of work. I had to create the fix for this bug for six (6!) different releases simultaneously. We have to have all the patches, packages and the SunAlert all ready to go at the same time. Phew!
Wednesday Apr 25, 2007
There is finally light at the end of the tunnel for the NTP project at OpenSolaris.org. We are going to be releasing test bits soon, and we will need testers when that happens. In addition, we need people to help work on the man pages. The man pages from the NTP open source project are a good starting point, but they need some work to make them more Solaris specific. We also need SMF experts to help design and implement the SMF manifest and methods. And last but not least, we need people to help work on the PSARC documents needed to file the case.
Since this is a open project, we are going to leverage Google docs to help with the collaboration. If you want to help, just send me your email address and I will add you to the document access on Google docs. My email is (explanded to avoid spam) brian dot utterback at sun dot com. And please join the discussion at ntp dash discuss at opensolaris dot org. The project page is at http://opensolaris.org/os/project/ntp/. See you there!
Tuesday Apr 24, 2007
I was recently involved with a maddening bug (CR 6540867). The problem was that the gettimeofday system call would on occasion show a time 10 milliseconds in the past. This only affected the amd64 architecture and while running multiple CPUs.
This turned out to be an exercise in frustration for me, but also a major learning experience.
I had a test case that would run gettimeofday calls in a tight loop, and check if one ever returned a value prior to one already returned. I used truss to see what system call was being used. Surprise! The gettimeofday call doesn't use any system call, the truss just sat there.Okay, remembering the rule of "use the source, Luke", I used the
OpenGrok source browser at
OpenSolaris.org to start to get a handle on just how gettimeofday worked. Drilling down in the the code in
libc, I found that on amd64, it is an assembly code line that looks like this:
SYSFASTTRAP(GETHRESTIME)
So where is the corresponding implementation in the kernel for the trap GETHRESTIME? I plugged that into OpenGrok and got another surprise; putting it into the "definition" search got no results, the "symbol" search got the libc implementations, and putting it into "full" search got 110 hits, but none of them was the one I wanted. For full searchs opengrok is case insensitive, so I was getting all the hits for the routine "gethrestime". So where is it?Turns out that the kernel uses the symbol "
T_GETHRESTIME" instead. I found that symbol in the intel architecture subdirectory of the kernel. To make a long story short, what actually gets called goes through a convoluted maze of function pointers and such, but the gist of it is that the trap calls through the
trap table to
get_hrestime which calls
gethrestime which calls the function pointed to by
gethrestimef which happens to be
pc_gethrestime.>
Once in pc_gethrestime, it reads the timestamp set at the last tick, adds the time since the last tick to that and returns it. There is a possible race here, since the timestamp of the last tick is set in the hres_clock routine which could run at the same time on another CPU. If you read the timestamp of the last tick, and then the number of cycles since the last tick, but hres_tick ran in between, you would see the time go backward one tick, i.e. 10 milliseconds. Hmm...sound familiar?
The thing is that there is a lock around the timestamp and the routine that gets the cycles since the last tick so that they are guaranteed to be consistent. The code looks like this:
void pc_gethrestime(timestruc_t *tp)
{
int lock_prev;
timestruc_t now;
int nslt; /* nsec since last tick */
loop:
lock_prev = hres_lock;
now = hrestime;
nslt = (int)(gethrtime() - hres_last_tick);
lots of stuff that normalizes the timestamp (elided)
if ((hres_lock & ~1) != lock_prev)
goto loop;
Now it took a lot of thinking to see how this lock works. The idea is that the lowest byte of
hres_lock is 0 when it is unlocked and -1 (8 one bits) when it is locked. When hres_lock grabs the lock,
it uses the atomic amd64 instruction xchgb, which exchanges the value in the byte with -1. If
the value returned is 0, then it has the lock. If the value was already -1, then it was already locked and hres_lock needs to go back and try again. When it wants to unlock the lock, it does an atomic increment to it. This changes the 8 one bits to all zero and adds one to the value in the upper three bytes. So when pc_gethrestime stores the value of the lock before it starts and then compares it to the value afterward except the last bit, if the lock was locked before it started, or is locked now or was locked at anytime in between, then the test fails and it
goes back and does it all again.
So, try as I might, I just could not come up with a scenario where we could get the inconsistent values. I was banging my head against the wall on this. Finally, I had to go away for a week, and I passed this to my colleague Sudheer Abdul-Salam to work on, and he finally came up with the answer that was eluding me.
Before I tell you the answer, I want to explain something. I have a background writing software, and really take the source code to heart. To me, the source code is the ultimate authority. On the other hand Sudheer is a kernel guy. To a kernel guy, the core dump is the ultimate authority. So Sudheer didn't just look at the source, he looked at the core. And what he found was that for the lines:
lock_prev = hres_lock;
now = hrestime;
the compiler was actually producing
pc_gethrestime+0x10: movq 0x3eaee1(%rip),%rax <--hrestime>
pc_gethrestime+0x17: movq 0x3eaee2(%rip),%rdx <--hrestime+8>
pc_gethrestime+0x1e: movl 0x3eaefc(%rip),%ebx <--hres_lock>
That is, the code was reading the hretime timestamp outside
of the block locked by hres_lock. When the tick was updated at just that
point, the time jumped backward. Well, if you can't trust the compiler,
who can you trust?
I am told that the problem is that hres_lock
and hrestime should be declared "volatile" and then the compiler would
not have reordered the loads. Doh!
Hat's off to Sudheer for figuring this one
out and preserving my sanity. Remember that you can go right to the source, but it is the
core that is core. I just hope I don't get an issue that involves taking the cash (cache) to the (memory) bank. The only thing more core than core is what's in the cache.
Monday Nov 13, 2006
Sun has been saying for a long time that Sun has been the largest single contributor to open source. But the term open source means different things to different people and we have not always been believed. With today's announcement that Java is now open source with the GPLv2 license, there is no denying it any longer. Even Richard Stallman admits that Sun is with out a doubt the largest contributor to open source. OpenOffice, Netbeans, GlassFish and now all of Java; all of them huge, all of them "really useful."
Technorati tags: opensource,java,Sun
"This means that critical bug fixes will be m...
The plan is that for customers that desire a ...
Thanks for the answer.
With Sun Cluster now bein...