The Robinson Factor
David Robinson's Weblog

Tuesday July 19, 2005

NFSv4 ELSE Operation

A proposal for the IETF NFSv4 working group to extend the COMPOUND VERIFY and NVERIFY operations to allow for an ELSE clause.

Discussion on nfsv4@ietf.org

( Jul 19 2005, 11:55:50 AM CDT )

Sunday March 06, 2005

Network installation made easy

If you have ever tried to install Solaris on an x86/x64 system over a network, you may have found that getting the system to boot initially is not trivial. As an engineer who wrote several parts of the initial bootstrapping code, I have personally found the directions on docs.sun.com difficult to follow, especially for PXE booting and the necessary DHCP parameters.

Your savior is JET! It is a wonderful toolkit that makes it easy to set up JumpStart installations on all platforms.


( Mar 06 2005, 12:06:10 AM CST )

Tuesday February 08, 2005

The Robinson Factor

It was recently called to my attention that the kind authors of the O'Reilly book "Managing NFS and NIS, 2nd Edition" named an NFS server performance effect after me: The Robinson Factor.

Thanks to Hal Stern, Mike Eisler and Ricardo Labiaga!

The description of the problem is fairly vague in the book. The problem was first discovered using Solaris kernel trace data (vtrace: a predecessor of DTrace) on early multi-processors.

The straightforward way to implement an NFS server is to have a set of worker threads awaiting requests. When a request arrives, it is queued by the networking layer and a worker thread is awakened. The first observation is that a worker thread that completes a request simply goes to sleep, only to be awakened again immediately, or else another thread is awakened to handle the next request. The cost is the overhead of a context switch, including any cache migration between CPUs.

The obvious solution is to have the completing thread check the request queue for more work and process the next request instead of yielding. The unintended consequence of this optimization is that the networking layer has no way to know whether a request will be poached by a completing thread or whether a sleeping thread needs to be awakened. Thus one thread is woken up for each request. Under continuous load this is not a problem, but when there is any pause in the request flow, some number of threads will wake to find no work. The result is a wasted context switch and possibly some cache thrashing.

To avoid threads waking to find no work, the solution is to introduce the notion of state to the worker threads. In addition to the obvious states of awake and asleep, we added a third, which we called drowsy. A drowsy thread is one that has been sent a cv_signal to wake it up but has not yet actually run on a CPU. If the networking stack queues a request and there is already a drowsy thread, it does not need to wake another thread, because it is assured that either a running thread will complete and dequeue the request, or a running thread will swtch (give up the CPU), allowing the drowsy thread to run and dequeue the request.

While this does not completely eliminate threads waking to find no work, it does greatly decrease how often it happens.
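
To make the bookkeeping concrete, here is a minimal single-threaded sketch in Python. It is only a model of the counting, not the Solaris kernel code: the class WorkerPool, its methods, and the burst scenario are all made up for illustration, and no real threads or condition variables are involved. It compares waking one thread per request against skipping the wakeup when a drowsy thread already exists.

```python
from collections import deque


class WorkerPool:
    """Hypothetical model of the wakeup bookkeeping (no real threads)."""

    def __init__(self, nthreads, use_drowsy):
        self.use_drowsy = use_drowsy
        self.queue = deque()
        self.asleep = nthreads   # blocked, waiting to be signalled
        self.drowsy = 0          # signalled, but not yet run on a CPU
        self.running = 0         # currently processing a request
        self.signals = 0         # wakeups sent by the networking layer
        self.wasted = 0          # wakeups that found an empty queue

    def enqueue(self, request):
        """Networking layer queues a request and decides whether to signal."""
        self.queue.append(request)
        if self.use_drowsy and self.drowsy > 0:
            return               # a drowsy thread is already on its way
        if self.asleep > 0:      # the naive scheme ignores the drowsy count
            self.asleep -= 1
            self.drowsy += 1
            self.signals += 1

    def schedule(self):
        """A signalled thread finally gets a CPU and looks for work."""
        assert self.drowsy > 0
        self.drowsy -= 1
        if self.queue:
            self.queue.popleft()
            self.running += 1
        else:
            self.wasted += 1     # woke up for nothing, back to sleep
            self.asleep += 1

    def complete(self):
        """A running thread finishes and poaches the next request, if any."""
        assert self.running > 0
        if self.queue:
            self.queue.popleft() # keep running, no context switch
        else:
            self.running -= 1
            self.asleep += 1


def burst(use_drowsy):
    """Three requests arrive while one thread is busy; return the counters."""
    pool = WorkerPool(nthreads=4, use_drowsy=use_drowsy)
    pool.enqueue("r0")
    pool.schedule()              # one thread is now running r0
    for name in ("r1", "r2", "r3"):
        pool.enqueue(name)       # burst arrives while r0 is in progress
    for _ in range(4):
        pool.complete()          # running thread poaches r1..r3, then sleeps
    while pool.drowsy:
        pool.schedule()          # late wakers find the queue empty
    return pool.signals, pool.wasted


if __name__ == "__main__":
    print("signal per request:", burst(use_drowsy=False))
    print("with drowsy state: ", burst(use_drowsy=True))
```

In this toy scenario the signal-per-request scheme sends four wakeups and wastes three of them, while the drowsy check sends two and wastes one, which matches the point that the wasted wakeups are reduced rather than eliminated.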


( Feb 08 2005, 05:35:45 PM CST )

