Scalable Read-mostly Synchronization Using Passive Reader-Writer Locks
For my CS854 class I have to read a trio of research papers each week, and post summaries, which are then updated after the class that a student gives a presentation on it. Don't rely on my summaries for anything, but it might interest some of you, so I'm posting it here.
Just read the paper.
Scalable Read-mostly Synchronization Using Passive Reader-Writer Locks
This is a new kind of Reader-Writer locks.
Some background. rwlocks
are designed to work when you have many readers, who are allowed to read at the same time, or one writer.
These show up everywhere. Even in Rust's type system, where you are allowed one &mut
borrow, or many immutable borrows. However, in this case we want a runtime lock, not compile time.
Also, the traditional implementation of rwlocks
used in the Linux kernel have some problems, like as increasing latency for writers, or cause readers to contend, or not cope with a thread sleeping or being pre-empted in a critical section.
This paper introduces prwlock
, passive reader-writer locks. This relies on the machine architecture having Total Store Ordering.
Compared to competing locks, prwlock
has fast and low latency on the reader path, and bounded latency for the write path. It also attempts to be easier to use that RCU.
Design of prwlock
.
We have a 64-bit version variable ver
. Each writer increases the version and waits until all readers see the change.
This would work alone, but has a chance of starvation for writers, and any crashed reader (or even migrating between cores) could cause a deadlock.
We can use Inter Process Interrupts (IPIs) to request straggling readers to immediately report their status. Since IPSs are fairly cheap, this works well.
But it's possible that the reader has gone to sleep, and thus would miss the IPI. To avoid that problem, before any reader goes to sleep, it gets converted to an "Active Reader", and the lock maintains a count of how many "active readers" there are. Since sleeping isn't that common, the shared counter in the lock will not be a bottleneck.
Performance
Look at all the graphs! It's faster, good.
Strangely, it's slow to acquire a writer lock when there are no readers, but prwlock
is faster than their competitors when there are readers.