In my naive little mind a rw lock would represents a
performant scalable construct inasmuch as WRITERS do not
hold the lock for a significant amount of time. One
figures that the lock would be held for short WRITERS
times followed by concurrent execution of RW_READERS.
What I recently found out is quite probably well known to
seasoned kernel engineer but this was new to me. So I
figured it could be of interest to others.
The SETUP
So Reader/Writer locks (RW) can be used in kernel and user
level code to allow multiple READERS of, for instance, a
data structure, to access the structure while allowing
only a single WRITER at a time within the bounds of the
rwlock().
A RW locks (
rwlock(9F),
rwlock(3THR))
is more complex that a simple mutex. So acquiring such locks will be
more expensive. This means that if the expected hold times of a lock
is quite small (say to update or read 1 or 2 fields of a structure)
then regular mutexes can usually do that job very well. A common
programming mistake is to expect faster execution of RW locks for
those cases.
However when READ hold times need to be fairly long; then
RW locks represent an alternative construct. With those
locks we expect to have multiple READERS executing
concurrently thus leading to performant code that scales
to large numbers of threads. As I said, if WRITERS are
just quick updates to the structure, we can naively believe
that our code will scale well.
Not So
Let's see how it goes. A WRITER lock cannot get in the
protected code while READERS are executing protected
code. The WRITER must then wait at the door until READERS
releases their hold. If the implementation of RW locks
didn't pay attention, there would be cases in which at
least one READER is always present within the protected
code and WRITERS would get starved of access. To prevent
such starvation, RW lock must block READERS as soon as a
WRITER has requested access. But no matter, our WRITERS
will quickly update the structure and we will get
concurrent execution most of the time. Isn't it ?
Well not quite. As just stated, a RW locks will block
readers as soon as a WRITER has hit the door. This means
that the construct does not allow parallel execution at
that point. Moreover the WRITER will stay at the door while
READERS are executing. So the construct stays fully
serializing from the time a WRITER hits until all current READERS
are done followed by the WRITERS time.
For Instance:
- a RW_READER gets in and will keep a long time. ---|
- a RW_WRITER hits the lock; is put on hold. |
- other RW_READERS now also block. |
.... time passes |
- the long RW_READER releases <----------------|
- the RW_WRITER gets the lock; work; releases
- other RW_READER now work concurrently.
Pretty obvious once you think about it. So to assess the
capacity of a RW lock to allow parallel execution, one
must consider the average hold time as a READER but also
the frequency of access as a WRITER. The construct becomes
efficient and scalable to N threads if and only if:
(avg interval between writers) >> (N * avg read hold times).
Roundup
In the end, from a performance point of view, RW locks
should be used only when the average hold times is
significant in order to justify the use of this more
complex type of lock: for instance, calling a function of
unknown latency or issuing an I/O while holding the lock
represent good candidates. But the construct will be
scalable to N threads, if and only if WRITERS are very
infrequent.
[T]:
NiagaraCMT
Solaris
Sun
In terms of read scalability - we've found the price of admission relatively high for rwlocks. The results we see show that something that takes as much as 1000 cycles for a "read" operation does not parallelize well if a rwlock is used - in this case, two readers result in less throughput than a single reader on most machines. As you've mentioned, the effects of frequent write are fairly catastrophic too - although ignoring scalability you could justify the use of rwlocks by their queuing semantics (writers preferred over readers, which is very much what we want, for example).
It's also worth mentioning that preemption control (i.e. the schedctl APIs) can help avoid an effective priority inversion when using relatively short userland locks like this.
Going back to the original point, you said "RW locks should be used only when the average hold times is significant in order to justify the use of this more complex type of lock". It's very subjective, but what do you consider to be significant in this case?
Posted by Philip Beevers on décembre 15, 2005 at 08:21 PM MET #
rwlocks are mostly horrid. Their very nature encourages developers to hold locks for extended periods, contrary to received wisdom. The more readers there are, and the longer they run holding the lock, the greater the chance that one (or more) will be preempted before releasing the lock. The net effect is that readers end up using stale data, while fresh data are waiting.
If reads and updates are short, a mutex should suffice. Mutexes imply less in terms of fairness, and are therefore generally much cheaper to implement.
If read processing is long but updates are rare, it is generally better for readers to take a constistent read up front, do their processing without lock protection, and then to restart if anything has changed.
I'm sure there are cases where rwlocks make sense, but I generally work hard to avoid them. Their appearance in Solaris owes more to standards compliance than hearty approval.
Posted by Phil Harman on décembre 15, 2005 at 10:52 PM MET #
Posted by Luke on décembre 15, 2005 at 11:59 PM MET #
Posted by Philip Beevers on décembre 16, 2005 at 01:13 AM MET #
Posted by Phil Harman on décembre 16, 2005 at 04:07 PM MET #
Posted by Roch Bourbonnais on décembre 16, 2005 at 05:08 PM MET #