tim at home work and in between.

Tim Uglow's personal blog

Monday March 20, 2006
I was in her lane - obvious really.

This morning I got squeezed by a new grey Saab estate, child on board, approaching the QueensGate roundabout in Farnborough. She moved left into my lane and into my space, and even though I shouted a warning she kept going until I had to brake to avoid a crash. Of course she then had to stop for the traffic ahead, so we had words.

She had to get into my space as she was in the wrong lane ....&**&^^%

Then on the way home, in Ewshot village by the excellent Windmill pub, a 4x4 cut the corner across the junction and nearly hit me head on - thanks.

Oh yes, and before I forget: Hampshire County Council's response to my complaint about the roundabout outside the Sun camps was that it was difficult and they would think about it some more, but in the meantime if I wore something bright drivers would notice me better. Very patronising, very helpful. It prompted me to send my MP an email.

I used the "find your MP" page on the House of Commons website, but it errored, most appropriately, with "connection reset by peer". Most amusing.

lrand48() is an excellent function for generating testcases.

Spending a lot of time writing test cases to try to reproduce system panics has led me to an interesting(ish) methodology. You stare at the data structures in the dump, you stare at the code, and you see if you can work out how to get things into a similar state. Then comes the difficult bit: I've taken to working out what operations are possible from the userland code and then randomising them using lrand48().
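As a rough illustration of the randomising step, here is a minimal sketch in C. The operation names and the helpers `pick_op()` and `draw_ops()` are invented for the example and are not from the original test program; the only real calls are `srand48()` and `lrand48()`. Seeding is an added assumption: it lets a sequence that triggers a panic be replayed.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stdlib.h>

/* Hypothetical operation set; a real test would list whatever the
 * userland code can actually do to the subsystem under suspicion. */
enum op { OP_POLL, OP_CLOSE, OP_EXIT, OP_COUNT };

/* Draw one operation. lrand48() returns a non-negative long, so the
 * modulo always yields a valid enum value. */
static enum op pick_op(void)
{
    return (enum op)(lrand48() % OP_COUNT);
}

/* Record in hist[] how often each operation comes up in n draws.
 * Seeding with srand48() makes the sequence repeatable. */
static void draw_ops(long seed, int n, int hist[OP_COUNT])
{
    assert(n >= 0);
    for (int i = 0; i < OP_COUNT; i++)
        hist[i] = 0;
    srand48(seed);
    for (int i = 0; i < n; i++)
        hist[pick_op()]++;
}
```

In a real test each drawn value would dispatch to the matching operation rather than just being counted; the histogram is only there to show that the same seed gives the same sequence.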

So this week's exercise has been to reproduce a panic in the poll() code that, from code inspection, is impossible. The file table (indexed by file descriptor) is per process but the poll structures are per LWP, so if any threads are polling on a file entry there is a linked list of interested threads attached to it. In the dump there are 3 threads chained off one file entry, so we know that we have a multi-threaded process performing poll() on a single file descriptor from several threads at once. We panicked in close as we traversed that list, because one of the threads had been reused by a process that does no polling, so its per-thread poll structures were null. So now we know that the process has threads exiting and threads closing the file that we are polling on.
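The shape of that state can be pictured with a toy userland model. This is not the Solaris source: every structure and field name below is invented for illustration, and the real kernel structures are certainly richer. It only shows a file entry with a chain of polling threads, one of which has no poll state left.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for per-thread poll bookkeeping (hypothetical). */
struct poll_state { int nfds; };

/* Toy thread: ps is NULL for a thread that does no polling. */
struct thread { struct poll_state *ps; };

/* One link in the chain of threads interested in a file entry. */
struct poller {
    struct thread *thr;
    struct poller *next;
};

/* Toy per-process file table entry. */
struct file_entry { struct poller *pollers; };

/* Walk the chain as a close would, but count the threads whose poll
 * state is missing instead of dereferencing it. */
static int count_stale_pollers(const struct file_entry *fe)
{
    int stale = 0;

    assert(fe != NULL);
    for (const struct poller *p = fe->pollers; p != NULL; p = p->next)
        if (p->thr->ps == NULL)
            stale++;
    return stale;
}
```

With three threads chained off one entry and one of them carrying a null poll state, the count is 1; code that dereferenced `ps` without the check would fault at that link.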

So I wrote a threaded program that opened a net connection and then went into a loop in which it randomly started new threads. Those threads then possibly polled on that connection, or possibly exited, or possibly closed the connection. The main thread dealt with all of this, re-opening the connection if it got closed and starting new threads as old ones exited - all under the choice of lrand48().
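A cut-down sketch of that kind of driver might look like the following. It is not the original program: a pipe stands in for the net connection so the example is self-contained, threads are started in small batches, and `NTHREADS`, `worker()` and `run_rounds()` are names made up here. The main thread draws every action itself, on the assumption that sharing lrand48()'s global state between threads is best avoided.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS 4  /* threads started per round; an arbitrary choice */

enum action { ACT_POLL, ACT_EXIT, ACT_CLOSE, ACT_COUNT };

struct job {
    int fd;
    enum action act;
};

static void *worker(void *arg)
{
    const struct job *j = arg;

    if (j->act == ACT_POLL) {
        struct pollfd pfd = { .fd = j->fd, .events = POLLIN };
        (void)poll(&pfd, 1, 1);   /* 1 ms timeout, so the thread always ends */
    } else if (j->act == ACT_CLOSE) {
        (void)close(j->fd);       /* may race with pollers; that is the point */
    }
    return NULL;                  /* ACT_EXIT: exit straight away */
}

/* Run `rounds` rounds with actions drawn from lrand48(). Returns how many
 * times the pipe had to be re-opened, or -1 if a pipe could not be created. */
static int run_rounds(long seed, int rounds)
{
    int fds[2], reopened = 0;

    assert(rounds >= 0);
    srand48(seed);
    if (pipe(fds) != 0)
        return -1;

    for (int r = 0; r < rounds; r++) {
        pthread_t tid[NTHREADS];
        struct job jobs[NTHREADS];
        int started = 0, closed = 0;

        for (int i = 0; i < NTHREADS; i++) {
            jobs[i].fd = fds[0];
            jobs[i].act = (enum action)(lrand48() % ACT_COUNT);
            if (pthread_create(&tid[started], NULL, worker, &jobs[i]) != 0)
                continue;
            started++;
            if (jobs[i].act == ACT_CLOSE)
                closed = 1;
        }
        for (int i = 0; i < started; i++)
            pthread_join(tid[i], NULL);

        if (closed) {             /* a worker closed the read end: re-open */
            close(fds[1]);
            if (pipe(fds) != 0)
                return -1;
            reopened++;
        }
    }
    close(fds[0]);
    close(fds[1]);
    return reopened;
}
```

Calling `run_rounds(1, 20)` runs twenty rounds from seed 1. Joining each batch before re-opening keeps the sketch simple, at the cost of less overlap between threads than a free-running loop would give.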

Did this reproduce it? No. So then I randomised the number and contents of the pollfd array passed to poll(), and suddenly the machine panicked with a stack trace identical to the customer's machine - the power of lrand48().
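Randomising the pollfd array could be sketched along these lines. Exactly which contents the original test varied is not stated, so the choices here (repeating the same descriptor, sometimes using -1, which poll() ignores, and picking from a handful of event masks) are guesses, and `random_pollfds()` and `MAX_PFDS` are names invented for the example.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <poll.h>
#include <stdlib.h>

#define MAX_PFDS 16  /* upper bound on array length; an arbitrary choice */

/* Fill pfds[] with between 1 and max entries and return the count.
 * Each entry refers either to fd or to -1 (ignored by poll()), with an
 * event mask picked at random from a small fixed set. */
static int random_pollfds(struct pollfd *pfds, int max, int fd)
{
    static const short masks[] = {
        POLLIN, POLLOUT, POLLPRI, POLLIN | POLLOUT
    };
    int n;

    assert(max >= 1);
    n = 1 + (int)(lrand48() % max);
    for (int i = 0; i < n; i++) {
        pfds[i].fd = (lrand48() % 4 == 0) ? -1 : fd;
        pfds[i].events = masks[lrand48() % 4];
        pfds[i].revents = 0;
    }
    return n;
}
```

A polling thread would then call something like `n = random_pollfds(pfds, MAX_PFDS, fd)` followed by `poll(pfds, (nfds_t)n, timeout)`, after seeding once with `srand48()`.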

The good news is that it is already fixed in Solaris 10...

Copyright (C) 2003, tim at home work and in between.