In my naive little mind a rw lock would represents a
performant scalable construct inasmuch as WRITERS do not
hold the lock for a significant amount of time. One
figures that the lock would be held for short WRITERS
times followed by concurrent execution of RW_READERS.
What I recently found out is quite probably well known to
seasoned kernel engineer but this was new to me. So I
figured it could be of interest to others.
The SETUP
So Reader/Writer locks (RW) can be used in kernel and user
level code to allow multiple READERS of, for instance, a
data structure, to access the structure while allowing
only a single WRITER at a time within the bounds of the
rwlock().
A RW locks (
rwlock(9F),
rwlock(3THR))
is more complex that a simple mutex. So acquiring such locks will be
more expensive. This means that if the expected hold times of a lock
is quite small (say to update or read 1 or 2 fields of a structure)
then regular mutexes can usually do that job very well. A common
programming mistake is to expect faster execution of RW locks for
those cases.
However when READ hold times need to be fairly long; then
RW locks represent an alternative construct. With those
locks we expect to have multiple READERS executing
concurrently thus leading to performant code that scales
to large numbers of threads. As I said, if WRITERS are
just quick updates to the structure, we can naively believe
that our code will scale well.
Not So
Let's see how it goes. A WRITER lock cannot get in the
protected code while READERS are executing protected
code. The WRITER must then wait at the door until READERS
releases their hold. If the implementation of RW locks
didn't pay attention, there would be cases in which at
least one READER is always present within the protected
code and WRITERS would get starved of access. To prevent
such starvation, RW lock must block READERS as soon as a
WRITER has requested access. But no matter, our WRITERS
will quickly update the structure and we will get
concurrent execution most of the time. Isn't it ?
Well not quite. As just stated, a RW locks will block
readers as soon as a WRITER has hit the door. This means
that the construct does not allow parallel execution at
that point. Moreover the WRITER will stay at the door while
READERS are executing. So the construct stays fully
serializing from the time a WRITER hits until all current READERS
are done followed by the WRITERS time.
For Instance:
- a RW_READER gets in and will keep a long time. ---|
- a RW_WRITER hits the lock; is put on hold. |
- other RW_READERS now also block. |
.... time passes |
- the long RW_READER releases <----------------|
- the RW_WRITER gets the lock; work; releases
- other RW_READER now work concurrently.
Pretty obvious once you think about it. So to assess the
capacity of a RW lock to allow parallel execution, one
must consider the average hold time as a READER but also
the frequency of access as a WRITER. The construct becomes
efficient and scalable to N threads if and only if:
(avg interval between writers) >> (N * avg read hold times).
Roundup
In the end, from a performance point of view, RW locks
should be used only when the average hold times is
significant in order to justify the use of this more
complex type of lock: for instance, calling a function of
unknown latency or issuing an I/O while holding the lock
represent good candidates. But the construct will be
scalable to N threads, if and only if WRITERS are very
infrequent.
[T]:
NiagaraCMT
Solaris
Sun