HSFS Changes in recent BeleniX 0.6.1
I put a few changes into the HSFS module in the recent BeleniX 0.6.1 release. First, I synced my HSFS enhancements with the current HSFS source in OpenSolaris. Second, I fixed a panic in the HSFS module. Third, I modified the I/O scheduling logic a wee bit to get better CLOOK behavior and improved performance.
The fix for the panic was the hardest part. The panic used to manifest occasionally in earlier releases, with an incomplete stack trace. In 0.6, however, it started happening every time on many machines when 0.6 was booted off CD-ROM media. It turned out to be a mistake in the readahead code, which was unmapping a page before it was to be used. It was quite a corner case: the panic would happen while servicing a sector being read ahead when there were two adjacent non-interleaved sectors, its previous sector was on a page boundary, and it had waited too long in the I/O queue and was serviced when its deadline timer expired. In other words, something that happened during an earlier read caused a failure later on.
Some corner case! It took quite a struggle to figure that out. Thanks to Pramod and Pavan, a couple of kernel folks, for their help. Because of the way the VM pages were being handled in the readahead case, a sector read on a page start boundary would cause the page to be mapped out (via ppmapout), so the subsequent sector read would have nowhere to write its data. Normally this is not a problem because the I/O scheduler coalesces individual reads, but a deadline expiry changes that.
Looking at one's own code after a period of time can help to uncover logic flaws. My HSFS changes implementing the CLOOK scheduling logic were not as effective as they could be. In CLOOK the disk head moves in one direction, servicing all the requests in its way until there are no more in that direction. It then revs back to the beginning and starts again. The way I had implemented this, the I/O queue structure had a next pointer to the request with the next higher logical block address, or NULL if there wasn't one. This pointer was set in the scheduling logic while it looked for requests to service: it pointed to the pending I/O request in the queue just after the last request serviced in the current iteration of the scheduling function. The next iteration would pick up this pointer (if non-NULL) and start from there.
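To make the CLOOK ordering concrete, here is a minimal userland sketch in C. It is not the HSFS code: the function name clook_order and the idea of reordering a plain array of block addresses are assumptions made purely for illustration. Given the current head position, it arranges pending addresses so that everything at or beyond the head comes first in ascending order, followed by the remaining lower addresses, also ascending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: ascending logical block address. */
static int cmp_lba(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Reverse the half-open range v[lo, hi). */
static void reverse_range(uint64_t *v, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        uint64_t tmp = v[lo];
        v[lo] = v[hi - 1];
        v[hi - 1] = tmp;
        lo++;
        hi--;
    }
}

/*
 * Hypothetical helper: reorder lbas[] in place into CLOOK service order
 * for a head currently positioned at block `head`.
 */
void clook_order(uint64_t *lbas, size_t n, uint64_t head)
{
    size_t split = 0;

    assert(lbas != NULL || n == 0);
    if (n == 0)
        return;

    qsort(lbas, n, sizeof(lbas[0]), cmp_lba);

    /* Find the first request at or beyond the head position. */
    while (split < n && lbas[split] < head)
        split++;

    /* Rotate left by `split` so the upward sweep comes first. */
    reverse_range(lbas, 0, split);
    reverse_range(lbas, split, n);
    reverse_range(lbas, 0, n);
}
```

With pending addresses 40, 10, 70, 20, 90 and the head at block 35, the resulting order is 40, 70, 90, 10, 20: one sweep upward, then a jump back to the lowest pending address.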
However, more I/O requests might get enqueued between the last serviced request and the next to-be-serviced one. These would be skipped and serviced only later. Not quite the behavior we want. So I changed this to record the Logical Block Address of the last serviced request instead of a pointer. The scheduling function now probes the queue for an I/O request with a higher Logical Block Address. With this approach we also catch the I/O requests we were missing earlier. This change appears to have helped, and initial tests indicate an additional performance gain of 10%-12%.
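The idea of remembering the last serviced address rather than a pointer can be sketched as follows. This is a simplified userland model, not the actual HSFS implementation: the types io_req and io_queue, the functions ioq_enqueue and ioq_next, and the choice of a sorted singly linked list are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request record; the real HSFS structures differ. */
struct io_req {
    uint64_t lba;           /* starting logical block address */
    struct io_req *next;    /* next request in ascending LBA order */
};

/* Hypothetical queue: a singly linked list kept sorted by LBA. */
struct io_queue {
    struct io_req *head;
    uint64_t last_lba;      /* LBA of the last serviced request */
    int have_last;          /* 0 until something has been serviced */
};

/* Insert a request, keeping the list in ascending LBA order. */
void ioq_enqueue(struct io_queue *q, struct io_req *r)
{
    struct io_req **pp;

    assert(q != NULL && r != NULL);
    pp = &q->head;
    while (*pp != NULL && (*pp)->lba <= r->lba)
        pp = &(*pp)->next;
    r->next = *pp;
    *pp = r;
}

/*
 * Remove and return the next request in CLOOK order: the first one
 * with an LBA higher than the last serviced LBA, or the lowest pending
 * request if nothing higher is left. Returns NULL on an empty queue.
 */
struct io_req *ioq_next(struct io_queue *q)
{
    struct io_req **pp;
    struct io_req *r;

    assert(q != NULL);
    if (q->head == NULL)
        return NULL;

    pp = &q->head;
    if (q->have_last) {
        /* Probe by address, so late arrivals are seen too. */
        while (*pp != NULL && (*pp)->lba <= q->last_lba)
            pp = &(*pp)->next;
        if (*pp == NULL)
            pp = &q->head;  /* nothing higher: wrap to the start */
    }

    r = *pp;
    *pp = r->next;
    r->next = NULL;
    q->last_lba = r->lba;
    q->have_last = 1;
    return r;
}
```

In this model, if requests at blocks 10 and 50 are queued, 10 is serviced, and a request for block 30 then arrives, the next call returns 30 rather than 50. A cached pointer to the request at block 50 would have skipped it until the following sweep. Note that the sketch uses a strict "higher than" comparison, so a request at exactly the last serviced address waits for the next sweep.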
Posted by Gurpreet Singh on July 18, 2007 at 10:11 AM PDT #