<p>David Goldblatt's website (http://dgoldblatt.com/)</p>
<p>A threading riddle - Solution. David Goldblatt, 2015-06-30T15:20:00-07:00.</p>
<p>A solution to the threading riddle from the previous post</p><h1>Introduction</h1>
<p>In <a href="http://dgoldblatt.com/a-threading-riddle.html">the last post</a>, we looked at a threading problem
that requires some trickiness to solve. In this post, we'll reveal the tricks
that we need.</p>
<p>As a recap, we had two functions we set out to implement, that operate on an
array of size <code>N</code>.</p>
<ul>
<li>
<p><code>void modify(size_t index, int value);</code> This changes the element at position
<code>index</code> to equal <code>value</code>.</p>
</li>
<li>
<p><code>void wait_until_equal(size_t index1, size_t index2);</code> This blocks until the
elements at positions <code>index1</code> and <code>index2</code> of the array are equal.</p>
</li>
</ul>
<p>We wanted to see how to implement these functions.</p>
<p>As always, I've thought about this code, but haven't tested it. Given the tricky
nature of concurrency problems, there are almost certainly bugs, errors, and
performance gotchas. Think and test carefully before using.</p>
<h1>Initial data structures</h1>
<p>Let's start out by looking at the variables required by the problem.</p>
<div class="highlight"><pre><span></span><span class="k">const</span> <span class="kt">int</span> <span class="n">num_elements</span> <span class="o">=</span> <span class="p">...;</span>
<span class="kt">int</span> <span class="n">array</span><span class="p">[</span><span class="n">num_elements</span><span class="p">];</span>
<span class="kt">void</span> <span class="nf">modify</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">index</span><span class="p">,</span> <span class="kt">int</span> <span class="n">value</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">wait_until_equal</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">index1</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index2</span><span class="p">);</span>
</pre></div>
<p>We want operations on different array indices to acquire different locks, in
order to allow for as much parallelism as is possible. To accomplish this, let's
have a per-element lock; <code>mu[i]</code> guards modifications to <code>array[i]</code>.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span> <span class="n">mu</span><span class="p">[</span><span class="n">num_elements</span><span class="p">];</span>
</pre></div>
<h1>Inspiration</h1>
<p>With our fundamental data definitions out of the way, we can talk about how to
solve the problem. To do this, let's look at how we would solve a simpler
problem. Suppose instead of <code>void wait_until_equal(size_t index1, size_t
index2)</code>, we had <code>void wait_until_zero(size_t index)</code>, which waits until
<code>array[index]</code> is <code>0</code>. Then the problem is easy: we have a condition variable per
array element, which gets notified on changes to the element. Then we could
implement the functionality as follows:</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">condition_variable</span> <span class="n">cv</span><span class="p">[</span><span class="n">num_elements</span><span class="p">];</span>
<span class="kt">void</span> <span class="nf">modify</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">index</span><span class="p">,</span> <span class="kt">int</span> <span class="n">value</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mu</span><span class="p">[</span><span class="n">index</span><span class="p">]);</span>
<span class="n">array</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
<span class="n">cv</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">notify_all</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">wait_until_zero</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mu</span><span class="p">[</span><span class="n">index</span><span class="p">]);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">array</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">cv</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">wait</span><span class="p">(</span><span class="n">lock</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h1>An abstraction -- MultiConditionVariable</h1>
<p>Inspired by <code>wait_until_zero</code>, we might see a solution: if we had a <code>wait</code> that
could wait on multiple conditions simultaneously, then, in <code>wait_until_equal</code>,
we could wait for a notification on the condition associated with either of the
indices we're interested in. We want a <code>MultiConditionVariable</code> that allows
waiting on multiple conditions at once.</p>
<p>Let's write out our implementation in terms of this primitive before we see how
to implement the primitive.</p>
<div class="highlight"><pre><span></span><span class="n">MultiConditionVariable</span> <span class="n">mcv</span><span class="p">[</span><span class="n">num_elements</span><span class="p">];</span>
<span class="kt">void</span> <span class="nf">modify</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">index</span><span class="p">,</span> <span class="kt">int</span> <span class="n">value</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mu</span><span class="p">[</span><span class="n">index</span><span class="p">]);</span>
<span class="n">array</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span><span class="p">;</span>
<span class="n">mcv</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">notify_all</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">wait_until_equal</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">index1</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index2</span><span class="p">)</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock</span><span class="p">(</span><span class="n">mu</span><span class="p">[</span><span class="n">index1</span><span class="p">],</span> <span class="n">mu</span><span class="p">[</span><span class="n">index2</span><span class="p">]);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">array</span><span class="p">[</span><span class="n">index1</span><span class="p">]</span> <span class="o">!=</span> <span class="n">array</span><span class="p">[</span><span class="n">index2</span><span class="p">])</span> <span class="p">{</span>
<span class="n">MultiConditionVariable</span><span class="o">::</span><span class="n">wait</span><span class="p">(</span><span class="n">mcv</span><span class="p">[</span><span class="n">index1</span><span class="p">],</span> <span class="n">mu</span><span class="p">[</span><span class="n">index1</span><span class="p">],</span> <span class="n">mcv</span><span class="p">[</span><span class="n">index2</span><span class="p">],</span> <span class="n">mu</span><span class="p">[</span><span class="n">index2</span><span class="p">]);</span>
<span class="p">}</span>
<span class="n">mu</span><span class="p">[</span><span class="n">index1</span><span class="p">].</span><span class="n">unlock</span><span class="p">();</span>
<span class="n">mu</span><span class="p">[</span><span class="n">index2</span><span class="p">].</span><span class="n">unlock</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<h1>Implementing MultiConditionVariable</h1>
<p>Now, let's see how to implement the <code>MultiConditionVariable</code> class. First, we'll
describe its semantics:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">MultiConditionVariable</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="c1">// Causes the current thread to wait until one of mcv1 or mcv2 is</span>
<span class="c1">// notified, atomically releasing mu1 and mu2, or a spurious wakeup</span>
<span class="c1">// occurs. Will reacquire mu1 and mu2 before returning.</span>
<span class="k">static</span> <span class="kt">void</span> <span class="n">wait</span><span class="p">(</span><span class="n">MultiConditionVariable</span><span class="o">&</span> <span class="n">mcv1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&</span> <span class="n">mu1</span><span class="p">,</span>
<span class="n">MultiConditionVariable</span><span class="o">&</span> <span class="n">mcv2</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&</span> <span class="n">mu2</span><span class="p">);</span>
<span class="c1">// Wakes up all threads which are currently waiting on *this. Note that</span>
<span class="c1">// such threads will also be waiting on some other MultiConditionVariable</span>
<span class="c1">// as well.</span>
<span class="kt">void</span> <span class="nf">notify_all</span><span class="p">();</span>
<span class="p">};</span>
</pre></div>
<p>How should we implement this class? Here's what we know:</p>
<ul>
<li>
<p>Threads need to go to sleep until woken in <code>wait</code>; we need a way to enable
sleeping/waking.</p>
</li>
<li>
<p><code>notify_all</code> needs to know which threads to wake up; there must be a way of
keeping track of the set of threads.</p>
</li>
<li>
<p>Because of the previous point, <code>wait</code> needs to inform the
<code>MultiConditionVariable</code> that it is sleeping on it, so that it can be later
notified.</p>
</li>
</ul>
<h2>A Sleeper</h2>
<p>The waiting going on is necessarily a little bit tricky; when a thread goes to
sleep, it can't hold any locks other threads might need to acquire, or else we
might cause deadlock. But if we don't hold any locks, we might miss the fact
that the thread should wake up. To fix this, we'll have to use a mutex and a
condition variable. Let's encapsulate this into a struct:</p>
<div class="highlight"><pre><span></span><span class="k">struct</span> <span class="n">Sleeper</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">sleep</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">unique_lock</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mu</span><span class="p">);</span>
<span class="n">cv</span><span class="p">.</span><span class="n">wait</span><span class="p">(</span><span class="n">lock</span><span class="p">,</span> <span class="p">[</span><span class="o">&</span><span class="p">]{</span><span class="k">return</span> <span class="n">awoken</span><span class="p">;});</span>
<span class="n">awoken</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">wake</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">bool</span> <span class="n">old_awoken</span><span class="p">;</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mu</span><span class="p">);</span>
<span class="n">old_awoken</span> <span class="o">=</span> <span class="n">awoken</span><span class="p">;</span>
<span class="n">awoken</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">old_awoken</span><span class="p">)</span> <span class="p">{</span>
<span class="n">cv</span><span class="p">.</span><span class="n">notify_one</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="n">awoken</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">condition_variable</span> <span class="n">cv</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">mutex</span> <span class="n">mu</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>The way the sleeper works, it can put itself into a list of threads waiting to
be woken up, release the lock on the list, and then <code>sleep()</code>. If another thread
acquires the lock on the list, calls <code>wake()</code> to wake up the first thread, and
then continues, then it doesn't matter if the <code>sleep()</code> or the <code>wake()</code> happens
first; the thread that <code>sleep()</code>s will continue either way.</p>
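<p>To make the ordering claim concrete, here is a small sketch that exercises both orders. The <code>Sleeper</code> is repeated so the block compiles on its own; the helper names <code>wake_then_sleep</code> and <code>concurrent_wake_and_sleep</code> are made up for this illustration and, like the rest of the code in this post, should be treated as a sketch rather than a vetted implementation.</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Same shape as the Sleeper above, repeated so this sketch compiles alone.
struct Sleeper {
  void sleep() {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [&] { return awoken; });
    awoken = false;
  }
  void wake() {
    bool old_awoken;
    {
      std::lock_guard<std::mutex> lock(mu);
      old_awoken = awoken;
      awoken = true;
    }
    if (!old_awoken) {
      cv.notify_one();
    }
  }
  bool awoken = false;
  std::condition_variable cv;
  std::mutex mu;
};

// Hypothetical helper: wake() runs strictly before sleep(). The awoken flag
// remembers the earlier wake(), so sleep() returns without blocking.
bool wake_then_sleep() {
  Sleeper s;
  s.wake();
  s.sleep();
  return true;  // reached only if sleep() returned
}

// Hypothetical helper: wake() is issued from another thread and races with
// sleep(). Whichever happens first, sleep() eventually returns.
bool concurrent_wake_and_sleep() {
  Sleeper s;
  std::thread waker([&] { s.wake(); });
  s.sleep();
  waker.join();  // keep s alive until wake() has fully finished
  return true;
}
```

<p>Both helpers return only if <code>sleep()</code> returns; a lost wakeup would show up as a hang rather than a wrong value.</p>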
<h2>MultiConditionVariable internals</h2>
<p>The <code>MultiConditionVariable</code> needs to maintain the set of threads that are
waiting on it. To do this, we'll use an <code>std::unordered_set</code> of
pointers-to-<code>Sleeper</code>. Since this set will be accessed concurrently, we'll add
an <code>std::mutex</code> as well to protect it. The call to <code>wait</code> will create a
<code>Sleeper</code> and add it to the passed-in <code>MultiConditionVariable</code>s before going to
sleep. Upon awakening, it removes the <code>Sleeper</code> from the
<code>MultiConditionVariable</code>s and returns. A call to <code>notify_all</code> simply wakes all
the threads that are contained in the <code>MultiConditionVariable</code>. Note that races
in which a thread is added to the <code>MultiConditionVariable</code> when the
corresponding element of <code>mu</code> isn't held are fine; in the worst case, we get a
spurious wakeup of the added thread, which is allowed by the
<code>MultiConditionVariable</code> contract.</p>
<p>So, here's <code>MultiConditionVariable</code> with an implementation:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">MultiConditionVariable</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="k">static</span> <span class="kt">void</span> <span class="n">wait</span><span class="p">(</span><span class="n">MultiConditionVariable</span><span class="o">&</span> <span class="n">mcv1</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&</span> <span class="n">mu1</span><span class="p">,</span>
<span class="n">MultiConditionVariable</span><span class="o">&</span> <span class="n">mcv2</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&</span> <span class="n">mu2</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Sleeper</span> <span class="n">sleeper</span><span class="p">;</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mcv1</span><span class="p">.</span><span class="n">mu_</span><span class="p">);</span>
<span class="n">mcv1</span><span class="p">.</span><span class="n">sleepers_</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="o">&</span><span class="n">sleeper</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mcv2</span><span class="p">.</span><span class="n">mu_</span><span class="p">);</span>
<span class="n">mcv2</span><span class="p">.</span><span class="n">sleepers_</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="o">&</span><span class="n">sleeper</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">mu1</span><span class="p">.</span><span class="n">unlock</span><span class="p">();</span>
<span class="n">mu2</span><span class="p">.</span><span class="n">unlock</span><span class="p">();</span>
<span class="n">sleeper</span><span class="p">.</span><span class="n">sleep</span><span class="p">();</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mcv1</span><span class="p">.</span><span class="n">mu_</span><span class="p">);</span>
<span class="n">mcv1</span><span class="p">.</span><span class="n">sleepers_</span><span class="p">.</span><span class="n">erase</span><span class="p">(</span><span class="o">&</span><span class="n">sleeper</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mcv2</span><span class="p">.</span><span class="n">mu_</span><span class="p">);</span>
<span class="n">mcv2</span><span class="p">.</span><span class="n">sleepers_</span><span class="p">.</span><span class="n">erase</span><span class="p">(</span><span class="o">&</span><span class="n">sleeper</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock</span><span class="p">(</span><span class="n">mu1</span><span class="p">,</span> <span class="n">mu2</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">notify_all</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">></span> <span class="n">lock</span><span class="p">(</span><span class="n">mu_</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="n">Sleeper</span><span class="o">*</span> <span class="nl">sleeper</span> <span class="p">:</span> <span class="n">sleepers_</span><span class="p">)</span> <span class="p">{</span>
<span class="n">sleeper</span><span class="o">-></span><span class="n">wake</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">sleepers_</span><span class="p">.</span><span class="n">clear</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">unordered_set</span><span class="o"><</span><span class="n">Sleeper</span><span class="o">*></span> <span class="n">sleepers_</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">mutex</span> <span class="n">mu_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>This gives us the <code>MultiConditionVariable</code> primitive we need, which completes
the puzzle.</p>
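<p>For convenience, here is one way the pieces above might be assembled into a single file that compiles on its own. Several things are assumptions made for this sketch and are not part of the original design: the <code>riddle_sketch</code> namespace, the array size of 4, the private <code>add</code>/<code>remove</code> helpers, the <code>run_demo</code> driver, and the simplification that <code>wake()</code> always notifies. It also assumes <code>index1 != index2</code>. As with the rest of the post, treat it as untested-in-anger illustration code.</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_set>

namespace riddle_sketch {  // hypothetical namespace, only to keep names local

struct Sleeper {
  void sleep() {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [&] { return awoken; });
    awoken = false;
  }
  void wake() {
    {
      std::lock_guard<std::mutex> lock(mu);
      awoken = true;
    }
    cv.notify_one();
  }
  bool awoken = false;
  std::condition_variable cv;
  std::mutex mu;
};

class MultiConditionVariable {
 public:
  // Precondition: the caller holds mu1 and mu2, and they are distinct.
  static void wait(MultiConditionVariable& mcv1, std::mutex& mu1,
                   MultiConditionVariable& mcv2, std::mutex& mu2) {
    Sleeper sleeper;
    mcv1.add(&sleeper);
    mcv2.add(&sleeper);
    mu1.unlock();
    mu2.unlock();
    sleeper.sleep();
    // Deregister before the stack-allocated sleeper goes out of scope.
    mcv1.remove(&sleeper);
    mcv2.remove(&sleeper);
    std::lock(mu1, mu2);
  }
  void notify_all() {
    std::lock_guard<std::mutex> lock(mu_);
    for (Sleeper* sleeper : sleepers_) sleeper->wake();
    sleepers_.clear();
  }

 private:
  void add(Sleeper* s) {
    std::lock_guard<std::mutex> lock(mu_);
    sleepers_.insert(s);
  }
  void remove(Sleeper* s) {
    std::lock_guard<std::mutex> lock(mu_);
    sleepers_.erase(s);
  }
  std::unordered_set<Sleeper*> sleepers_;
  std::mutex mu_;
};

constexpr std::size_t num_elements = 4;  // assumed size, for the demo only
int array[num_elements];
std::mutex mu[num_elements];
MultiConditionVariable mcv[num_elements];

void modify(std::size_t index, int value) {
  std::lock_guard<std::mutex> lock(mu[index]);
  array[index] = value;
  mcv[index].notify_all();
}

// Assumes index1 != index2; locking the same std::mutex twice is undefined.
void wait_until_equal(std::size_t index1, std::size_t index2) {
  std::lock(mu[index1], mu[index2]);
  while (array[index1] != array[index2]) {
    MultiConditionVariable::wait(mcv[index1], mu[index1],
                                 mcv[index2], mu[index2]);
  }
  mu[index1].unlock();
  mu[index2].unlock();
}

// Hypothetical driver: returns true once the waiter has observed equality.
bool run_demo() {
  modify(0, 1);
  modify(1, 2);
  std::thread waiter([] { wait_until_equal(0, 1); });
  modify(1, 1);  // elements 0 and 1 become, and stay, equal
  waiter.join();
  return true;
}

}  // namespace riddle_sketch
```

<p>Calling <code>riddle_sketch::run_demo()</code> from <code>main</code> should return once the waiter sees the two elements equal, whether the final <code>modify</code> lands before or after the waiter goes to sleep.</p>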
<h2>Some very slight fixes</h2>
<p>A particularly eagle-eyed reader might note that we haven't completely satisfied
the requirements from the previous post. In particular, a call to <code>modify</code> might
need to wake a sleeping thread that is being woken by a call to <code>modify</code> from
another thread. This involves acquiring a lock on the <code>Sleeper</code> struct. So, a
<code>modify</code> call might wait on another call to <code>modify</code> if they each try to wake
the same thread, breaking the rule that calls to one function shouldn't wait on
calls to another unless they share an index they operate on.</p>
<p>We have three avenues to fix this.</p>
<ul>
<li>
<p>Most reasonably, we could simply declare that waiting for a lock in Sleeper
shouldn't count as blocking. Waiting for the time it takes to modify a boolean
and signal a condition variable is hardly waiting at all. This works unless
you need waking to truly avoid waiting for another thread (say, if you're
writing signal-safe code), even in corner-cases.</p>
</li>
<li>
<p>We could implement <code>Sleeper</code> without the mutex/condition variable primitive,
relying on OS system call functionality like Linux futexes, or the Windows
event primitive.</p>
</li>
<li>
<p>We could use atomics, and <code>compare_exchange_strong</code> a boolean in order to
allow one thread to "win" a race and become the designated waker of the
sleeping thread. Since all but one of multiple racing calls to <code>wake</code> can
safely be ignored (the sleeping thread needs only to be woken once), we can
avoid having to acquire a mutex and potentially block.</p>
</li>
</ul>
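<p>The "win the race" part of the third option can be sketched in isolation. The <code>WakeFlag</code> type and <code>count_winners</code> function below are hypothetical names for this illustration. The sketch shows only how <code>compare_exchange_strong</code> elects a single designated waker; how that winner then wakes the sleeping thread without taking a mutex is deliberately left out.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper, not the Sleeper from this post: it records whether a
// wake-up has already been claimed by some thread.
struct WakeFlag {
  // Returns true for exactly one caller; every later caller sees the flag
  // already set and gets false.
  bool try_claim() {
    bool expected = false;
    return claimed.compare_exchange_strong(expected, true);
  }
  std::atomic<bool> claimed{false};
};

// Races num_threads callers against one flag and returns how many of them won.
int count_winners(int num_threads) {
  WakeFlag flag;
  std::atomic<int> winners{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&] {
      if (flag.try_claim()) {
        // Only the winner would go on to wake the sleeping thread.
        winners.fetch_add(1);
      }
    });
  }
  for (std::thread& t : threads) {
    t.join();
  }
  return winners.load();
}
```

<p>However many threads race, exactly one of them wins, and the losers return immediately instead of blocking on a lock.</p>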
<h1>Conclusion</h1>
<p>Just to summarize the steps here: we first implemented <code>modify</code> and
<code>wait_until_equal</code> by reducing them to a primitive <code>MultiConditionVariable</code>.
<code>MultiConditionVariable</code> itself relied on a helper <code>Sleeper</code> class, which
enabled us to explicitly control thread sleeping and waking.</p>
<p>This problem is trickier than it appears; the extra machinery relative to the
<code>wait_until_zero</code> variant is substantial. However, it appears to be necessary; I
haven't been able to find a solution that is fundamentally different from the
one presented here. All the simpler solutions I've seen involve "tricks" like
using Software Transactional Memory, which in turn must be built using a
primitive not unlike <code>MultiConditionVariable</code>. If you've discovered a solution
that is simpler but does not rely on more powerful primitives, I'd love to hear
about it.</p>
<h1>Notes</h1>
<p>Waiting for one of several conditions to become true is an old trick. It is
used, for instance, in the Windows <code>WaitForMultipleObjects</code> system call. A
solution that goes down the same paths as this one is used in Unix ports of the
Windows threading APIs.</p>
<p>The <code>Sleeper</code> class is essentially an implementation of the Java <code>LockSupport</code>
class, which is a low-level API to allow thread sleeping and waking for use in
locking primitives.</p>
<p>A threading riddle. David Goldblatt, 2015-06-24T15:20:00-07:00.</p>
<p>A tricky problem involving threads waiting for complex conditions to become true</p><h1>Introduction</h1>
<p>The two key primitives used as building blocks for (blocking) concurrent
algorithms are mutexes and condition variables, showing up in C++ as
<code>std::mutex</code> and <code>std::condition_variable</code>.</p>
<p>In this post, I'll describe a pair of functions that can be simply described,
but whose implementation requires a tricky use of these primitives.</p>
<h1>The problem</h1>
<p>Suppose we have an array of <code>N</code> <code>int</code>s. There are two functions we want to
implement:</p>
<ul>
<li>
<p><code>void modify(size_t index, int value);</code> This changes the element at position
<code>index</code> to equal <code>value</code>.</p>
</li>
<li>
<p><code>void wait_until_equal(size_t index1, size_t index2);</code> This blocks until the
elements at positions <code>index1</code> and <code>index2</code> of the array are equal.</p>
</li>
</ul>
<p>The question is: how do we implement <code>modify</code> and <code>wait_until_equal</code>?</p>
<h1>The rules</h1>
<p>Without further requirements, the problem is trivial: just have a single lock
guarding the array, together with a single condition variable that all readers
wait on. But this severely limits concurrency: calls to <code>modify</code> that operate on
different indices block one another. It also wastes the waiters' time;
every waiter must wake up, acquire the lock, and re-check the values of its
indices on <em>every</em> modification to the array, even if the indices it's
interested in weren't modified. This motivates the first rule:</p>
<ul>
<li>Calls to one of the functions should not block waiting on a call to another
one of the functions unless the two calls have one of their index arguments
equal.</li>
</ul>
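<p>For comparison, the trivial single-lock approach described above might look like the sketch below. The <code>naive</code> namespace, the array size of 4, and the <code>run_demo</code> driver are assumptions made for this illustration. It is correct, but it is exactly what the first rule forbids: every <code>modify</code> takes the same lock and wakes every waiter.</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

namespace naive {  // hypothetical namespace, only to keep names local

constexpr std::size_t num_elements = 4;  // assumed size, for the demo only
int array[num_elements];
std::mutex global_mu;                // one lock guarding the whole array
std::condition_variable global_cv;   // one condition variable for all waiters

void modify(std::size_t index, int value) {
  std::lock_guard<std::mutex> lock(global_mu);
  array[index] = value;
  // Wakes every waiter, including those that don't care about this index.
  global_cv.notify_all();
}

void wait_until_equal(std::size_t index1, std::size_t index2) {
  std::unique_lock<std::mutex> lock(global_mu);
  global_cv.wait(lock, [&] { return array[index1] == array[index2]; });
}

// Hypothetical driver: returns true once the waiter has observed equality.
bool run_demo() {
  modify(0, 1);
  modify(1, 2);
  std::thread waiter([] { wait_until_equal(0, 1); });
  modify(3, 7);  // unrelated index, but a sleeping waiter is still woken
  modify(1, 1);  // elements 0 and 1 become, and stay, equal
  waiter.join();
  return true;
}

}  // namespace naive
```

<p>The call <code>modify(3, 7)</code> in the driver shows the waste: a waiter on indices 0 and 1 can be woken just to re-check values that didn't change.</p>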
<p>To make the problem a little easier, we'll relax it with a second rule
that deals with races between <code>modify</code> calls.</p>
<ul>
<li>If a call to <code>wait_until_equal(index1, index2)</code> occurs, and the elements at
indices <code>index1</code> and <code>index2</code> become equal only briefly, it's okay for the
<code>wait_until_equal</code> call not to return; it can "miss" situations that don't
exist for long enough.</li>
</ul>
<p>We don't want this rule to be abused though (e.g. with a <code>wait_until_equal</code> call
that never returns), so we'll add one final rule.</p>
<ul>
<li>If the elements at the indices passed to a <code>wait_until_equal</code> call become and
remain equal, the <code>wait_until_equal</code> call will eventually return.</li>
</ul>
<p>Finally, we want to make sure we don't waste too much CPU time, so we'll add a
no-spinning requirement:</p>
<ul>
<li>The internals of <code>wait_until_equal</code> shouldn't busy-wait by spinning until
their conditions become true. Similarly, loops like
<code>while (!check_condition())
{ std::this_thread::sleep_for(std::chrono::milliseconds(15)); }</code> are cheating.
The CPU time used by <code>wait_until_equal</code> should be at most proportional
to the number of <code>modify</code> operations performed on its arguments.</li>
</ul>
<p>These rules are really just to try to prevent "clever" solutions that dodge the
real concurrency issues that are present. I promise that there's a reasonable
solution to this that doesn't involve waiting forever, calling <code>exit(0)</code>, or
other trickery. The only concurrency primitives you need are mutexes and
condition variables, and you don't need an unreasonable number of them (e.g.
having <code>N**2</code> variables is unnecessary).</p>
<h1>Discussion</h1>
<p>This problem is pretty abstracted, and the API presented to users is unrealistic
in several ways:</p>
<ul>
<li>There's no way to <em>read</em> the values from the array, only to set them and
compare them for equality</li>
<li>Once you know two values are equal, they won't necessarily remain equal
after the call to <code>wait_until_equal</code> returns, so you can't rely on their
equality.</li>
</ul>
<p>But these problems are easy to remedy, once a solution to this simplified
problem is found, and solving only the simplification gets to the heart of the
matter. There are real-world problems that require techniques similar to this
one.</p>
<p>I'll post a solution sometime in the next couple of weeks. I'll keep it in a
separate post to avoid spoiling it for anyone reading this one.</p>
<p>Going from lock-free to wait-free. David Goldblatt, 2015-05-05T15:20:00-07:00.</p>
<p>Using further data replication tricks, we can go from lock-free reads to wait-free ones.</p><h1>Introduction</h1>
<p>In <a href="http://dgoldblatt.com/lock-free-reads-through-data-replication.html">the last post</a>, we
looked at ways in which replicating data allowed readers to proceed in a
lock-free manner. In this post, we'll extend this trick further, and show how to
make readers wait-free.</p>
<p>Like last time, we'll make a number of simplifying assumptions throughout:
counters never overflow, all variables are 0-initialized (unless otherwise
specified), readers and writers don't throw exceptions, and writers are
externally synchronized, so we may assume only one writer at a time. These
limitations are trivial to eliminate. We'll also leave all atomic memory
operations with the default sequentially consistent memory order. This is always
correct, but can be inefficient. A warning: I've thought hard about but not tested the
included code. As always, check and test before using.</p>
<h1>The goal</h1>
<p>To make it clear what we're after, we seek an implementation of the following
interface:</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">ReaderWriterData</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">);</span>
<span class="kt">void</span> <span class="nf">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">);</span>
<span class="p">};</span>
</pre></div>
<p>We want the <code>read()</code> method to always complete in a finite number of steps,
regardless of any writer activity. The writer might start modifying the
protected data, get interrupted by the operating system and never rescheduled,
and readers should <em>still</em> be able to complete their reads. This should be
enough to convince us that maintaining multiple copies of the data structure is
necessary: otherwise, readers would have no copy to read from while the writer
is blocked (they can't read from the copy the writer is modifying without seeing
inconsistent state). The writer will have to have some mechanism of directing
readers to one copy or another of the protected data.</p>
<h1>A solution</h1>
<h2>Inspiration</h2>
<p>The interface above looks similar to that of data protected by a reader-writer
lock, so let's start there. Below is a simple, reader-preference, busy-waiting
reader-writer lock. The low-order bit is used to indicate the presence of a
writer, and all other bits are a count of the number of readers blocking the
writer from proceeding.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ReaderWriterLock</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">lock</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">bool</span> <span class="n">swap_succeeded</span><span class="p">;</span>
<span class="k">do</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">expected</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">swap_succeeded</span> <span class="o">=</span> <span class="n">lock_word_</span><span class="p">.</span><span class="n">compare_exchange_weak</span><span class="p">(</span><span class="n">expected</span><span class="p">,</span> <span class="n">kWriterPresent</span><span class="p">);</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">swap_succeeded</span> <span class="o">==</span> <span class="nb">false</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">unlock</span><span class="p">()</span> <span class="p">{</span>
<span class="n">lock_word_</span><span class="p">.</span><span class="n">fetch_sub</span><span class="p">(</span><span class="n">kWriterPresent</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">lock_shared</span><span class="p">()</span> <span class="p">{</span>
<span class="n">lock_word_</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="n">kReaderIncrement</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">lock_word_</span><span class="p">.</span><span class="n">load</span><span class="p">()</span> <span class="o">&</span> <span class="n">kWriterPresent</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Busy loop</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">unlock_shared</span><span class="p">()</span> <span class="p">{</span>
<span class="n">lock_word_</span><span class="p">.</span><span class="n">fetch_sub</span><span class="p">(</span><span class="n">kReaderIncrement</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">kWriterPresent</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">int</span> <span class="n">kReaderIncrement</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">lock_word_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
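<p>To make the bit layout concrete, here is a small single-threaded sketch that replays a few lock operations on a bare atomic word. The helpers <code>writer_present</code>, <code>reader_count</code>, and <code>demo_lock_word</code> are hypothetical names introduced only for this illustration; they are not part of the lock above.</p>

```cpp
#include <atomic>
#include <cassert>

// Same bit layout as the lock above: bit 0 flags a writer, and the
// remaining bits count the readers.
constexpr int kWriterPresent = 1 << 0;
constexpr int kReaderIncrement = 1 << 1;

// Hypothetical helpers (not in the original lock) that decode a lock word.
inline bool writer_present(int lock_word) {
  return (lock_word & kWriterPresent) != 0;
}

inline int reader_count(int lock_word) {
  return lock_word >> 1;
}

// Replays a few lock operations on one thread and returns the final word.
inline int demo_lock_word() {
  std::atomic<int> lock_word{0};
  lock_word.fetch_add(kReaderIncrement);  // reader A: lock_shared()
  lock_word.fetch_add(kReaderIncrement);  // reader B: lock_shared()

  // A writer's compare-exchange from 0 fails while any reader is counted.
  int expected = 0;
  bool writer_got_in =
      lock_word.compare_exchange_strong(expected, kWriterPresent);
  assert(!writer_got_in);
  (void)writer_got_in;

  lock_word.fetch_sub(kReaderIncrement);  // reader A: unlock_shared()
  return lock_word.load();
}
```

<p>After the replay, one reader is still counted and no writer is present, which is why the writer's <code>lock()</code> would keep spinning at that point.</p>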
<p>We'll take two ideas from this lock:</p>
<ul>
<li>
<p>Maintaining a count of active readers, which the writer waits to drop to
zero.</p>
</li>
<li>
<p>Letting the writer advertise its presence to readers, who then know that
reading is unsafe.</p>
</li>
</ul>
<p>We'll combine them with our realizations from the last section: we need multiple
copies of the data structure, which the writer will move readers between. Since
we'll have multiple copies, we'll need to apply the writer's modifications to
each of the copies.</p>
<h2>An incorrect approach</h2>
<p>Here is a first pass at a solution using reader counts:</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">ReaderWriterData</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">version</span> <span class="o">=</span> <span class="n">version_for_readers_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version</span><span class="p">].</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="n">reader_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">[</span><span class="n">version</span><span class="p">]);</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version</span><span class="p">].</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">2</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Move readers away from version i</span>
<span class="n">version_for_readers_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">i</span><span class="p">);</span>
<span class="c1">// Wait for them to leave</span>
<span class="k">while</span> <span class="p">(</span><span class="n">reader_count_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">load</span><span class="p">()</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Busy wait</span>
<span class="p">}</span>
<span class="c1">// Do the write</span>
<span class="n">writer_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">version_for_readers_</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">reader_count_</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="n">T</span> <span class="n">data_</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>This approach doesn't work, because of a race between the reader and writer. To
see how this race manifests, suppose there is only 1 reader. It reads
<code>version_for_readers_</code> to be 0. Then the writer appears, stores 1 into
<code>version_for_readers_</code>, and waits for <code>reader_count_[0]</code> to become 0. Since
the reader has not yet incremented <code>reader_count_[0]</code>, the writer may exit its
loop and begin its write to <code>data_[0]</code>. The reader then increments
<code>reader_count_[0]</code> and proceeds to read from <code>data_[0]</code>. We now have a writer
and a reader concurrently accessing <code>data_[0]</code>, violating our interface.</p>
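<p>The interleaving is easier to follow when it is written out as a straight-line sequence. The sketch below replays it on a single thread, with each statement labelled by the thread that would have run it. <code>RaceReplay</code> and <code>replay_race</code> are hypothetical names used only for this illustration.</p>

```cpp
#include <atomic>
#include <cassert>

// What each side believes at the end of the replay.
struct RaceReplay {
  bool writer_started;  // the writer's wait loop on version 0 exited
  bool reader_entered;  // the reader is about to read data_[0]
};

// Single-threaded replay of the racy interleaving described above.
inline RaceReplay replay_race() {
  std::atomic<int> version_for_readers{0};
  std::atomic<int> reader_count[2] = {{0}, {0}};
  RaceReplay result{false, false};

  // Reader: picks a version, but has not announced itself yet.
  int version = version_for_readers.load();

  // Writer (i == 0): redirects readers, then checks for readers on 0.
  version_for_readers.store(1);
  result.writer_started = (reader_count[0].load() == 0);

  // Reader: increments the count, but the writer already looked.
  reader_count[version].fetch_add(1);
  result.reader_entered = (version == 0);

  return result;
}
```

<p>Both flags come back true: the writer believes version 0 is free of readers, while the reader believes it holds version 0.</p>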
<h2>Fixing the race</h2>
<p>To address the problem from the last section, we need to be more precise about
the meaning of an increment to <code>reader_count_[i]</code>. It is a mechanism for a
reader to prevent the writer from beginning a <em>new</em> modification to <code>data_[i]</code>.
This has two consequences:</p>
<ul>
<li>
<p>If the reader increments <code>reader_count_[i]</code> and subsequently observes the
writer to be absent from version <code>i</code>, then no write may proceed on version
<code>i</code> until the reader undoes its increment; reading is safe between these
points.</p>
</li>
<li>
<p>If the reader has incremented <em>both</em> counters, a writer is prevented from
switching versions.</p>
</li>
</ul>
<p>This suggests the following strategy for readers: read <code>version_for_readers_</code>,
and block writers from beginning a new modification to the indicated version.
Then, check to see if a writer may already be present in that version (i.e. check to
see if the race from the previous section occurred). If not, great:
reading is safe on the version until the reader unblocks the writer from
beginning a modification to it. If the writer may be present, then the reader
should try reading from the other version; it blocks updates to that one (so at
this point, the writer is prevented from starting a modification to <em>either</em>
version). Then, the reader can recheck for writer presence, safe in the
knowledge that whichever version is safe to read from will remain safe until it
unblocks the writer from it. Whichever version the reader is not reading from,
it can then unblock the writer from.</p>
<p>Here's how this looks in code:</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">ReaderWriterData</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// We'll fill this in later.</span>
<span class="kt">int</span> <span class="n">version_to_read</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">version_1</span> <span class="o">=</span> <span class="n">version_for_readers_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_1</span><span class="p">].</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">version_2</span> <span class="o">=</span> <span class="n">version_for_readers_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">version_1</span> <span class="o">==</span> <span class="n">version_2</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// We stopped new modifications to version version_1, and subsequently</span>
<span class="c1">// observed that the writer wants us to use version version_1,</span>
<span class="c1">// indicating that no write is happening there. So, no write is</span>
<span class="c1">// happening to version version_1, and no write will start there until</span>
<span class="c1">// we decrement reader_count_[version_1].</span>
<span class="n">version_to_read</span> <span class="o">=</span> <span class="n">version_1</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// The version changed out from under us; a writer might have already</span>
<span class="c1">// begun modifying version version_1 before we blocked modifications</span>
<span class="c1">// to it. Block modifications to version_2, and check if it's safe.</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_2</span><span class="p">].</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="c1">// At this point, no new modifications may begin on either version.</span>
<span class="kt">int</span> <span class="n">version_3</span> <span class="o">=</span> <span class="n">version_for_readers_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="c1">// Version version_3 must be safe; the writer told us to use it, so</span>
<span class="c1">// the writer wasn't modifying it at the time of the previous statement.</span>
<span class="c1">// Moreover, modifications to either version aren't allowed to be</span>
<span class="c1">// started. So no modification has started on version version_3 since</span>
<span class="c1">// the previous statement (when reading was safe), and none will start</span>
<span class="c1">// until we unblock writes to version version_3. We can read it</span>
<span class="c1">// safely.</span>
<span class="n">version_to_read</span> <span class="o">=</span> <span class="n">version_3</span><span class="p">;</span>
<span class="c1">// Unblock the *other* version</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="mi">1</span> <span class="o">-</span> <span class="n">version_3</span><span class="p">].</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// We've ensured reading is safe at this point.</span>
<span class="n">reader_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">[</span><span class="n">version_to_read</span><span class="p">]);</span>
<span class="c1">// Unblock the writer from the version we read.</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_to_read</span><span class="p">].</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">2</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">version_for_readers_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">i</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">reader_count_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">load</span><span class="p">()</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Busy wait</span>
<span class="p">}</span>
<span class="n">writer_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">version_for_readers_</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">reader_count_</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="n">T</span> <span class="n">data_</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
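<p>As a sanity check, we can replay the race from the previous section against the reader's new version-selection steps. The sketch below is single-threaded and only models the reader's bookkeeping. <code>ReaderState</code>, <code>choose_version</code>, and the <code>writer_step</code> hook are hypothetical names introduced for the illustration.</p>

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Just the shared state the reader touches while picking a version.
struct ReaderState {
  std::atomic<int> version_for_readers;
  std::atomic<int> reader_count[2];
  ReaderState() : version_for_readers(0) {
    reader_count[0].store(0);
    reader_count[1].store(0);
  }
};

// The version-selection steps of read() above. The hypothetical hook
// writer_step runs between the first load and the first increment, which
// is the window in which the race occurred.
inline int choose_version(ReaderState& s,
                          const std::function<void()>& writer_step) {
  int version_1 = s.version_for_readers.load();
  writer_step();
  s.reader_count[version_1].fetch_add(1);
  int version_2 = s.version_for_readers.load();
  if (version_1 == version_2) {
    return version_1;
  }
  // The version changed; block both versions, then look again.
  s.reader_count[version_2].fetch_add(1);
  int version_3 = s.version_for_readers.load();
  // Unblock the version we are not going to read.
  s.reader_count[1 - version_3].fetch_sub(1);
  return version_3;
}
```

<p>If <code>writer_step</code> stores 1 into <code>version_for_readers</code> (the writer's first iteration), the reader ends up on version 1 and leaves no count behind on version 0, so it never touches the copy being written.</p>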
<h3>Exercise</h3>
<p>Why shouldn't a reader always block writes to both versions before
checking <code>version_for_readers_</code>, and then unblock the one not in use?</p>
<h3>Exercise</h3>
<p>Give a similar strategy that detects writer presence by using the
<code>lock_word_</code> representation of the reader-writer lock above, and eliminating
<code>version_for_readers_</code>.</p>
<h3>Exercise</h3>
<p>Give a similar strategy that allows readers to block writers by
writing to a per-reader data structure, which the writer may observe. Note: as a
consequence, we can see that no atomic compare-and-swap, fetch-and-add, or
other read-modify-write (RMW) primitives are necessary to implement the interface and progress
guarantees we sought. This is a rather interesting fact in and of itself.</p>
<h1>Increasing writer performance</h1>
<h2>Lazy updates</h2>
<p>The careful reader might have noticed that one round of busy looping in the
<code>write</code> method above could have been avoided: since the writer waits for all the
readers to leave the second instance before modifying it, on the <em>next</em> call
to <code>write</code>, there are no readers reading the second instance. So, if the writer
changed its iteration order to always start on version
<code>1 - version_for_readers_.load()</code>, its first busy-wait loop would exit
immediately.</p>
<p>We can go a step further, however. Once we've performed the write on one
instance of the data structure, and indicated to readers that they should begin
using it, there's no need for the writer to wait for the readers to leave the
other copy of the data structure, since no new readers are going to use it.
Instead, we can store the <code>std::function</code> that performs the modification of the
data structure, and leave the work of actually performing the modification for
the next writer (note, then, that the passed-in function can't, e.g., hold
references to data on the writing thread's stack unless it is sure it will
outlive the <code>ReaderWriterData</code>). The next time a write occurs, the writer will
apply <em>both</em> modifications to the old copy of the data structure, which
hopefully all the readers will have already left. If the time between <code>write</code>
calls is longer than the time it takes to perform a read, then the writer will
never have to wait for readers to leave a version of the data structure.</p>
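<p>Here is a rough sketch of what the lazy, two-version variant might look like. It assumes a single writer, assumes <code>T</code> is value-initializable so that both copies start out identical, and reuses the reader protocol from the previous section unchanged. The class name <code>LazyReaderWriterData</code> and the member <code>deferred_</code> are made up for this sketch, and like the rest of the code here it has not been tested under real concurrency.</p>

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Sketch only: one writer, two versions, lazy catch-up of the stale copy.
template <typename T>
class LazyReaderWriterData {
 public:
  LazyReaderWriterData() : version_for_readers_(0) {
    reader_count_[0].store(0);
    reader_count_[1].store(0);
  }

  // Same reader protocol as in "Fixing the race".
  void read(std::function<void(const T&)> reader_fn) {
    int version_to_read = -1;
    int version_1 = version_for_readers_.load();
    reader_count_[version_1].fetch_add(1);
    int version_2 = version_for_readers_.load();
    if (version_1 == version_2) {
      version_to_read = version_1;
    } else {
      reader_count_[version_2].fetch_add(1);
      int version_3 = version_for_readers_.load();
      version_to_read = version_3;
      reader_count_[1 - version_3].fetch_sub(1);
    }
    reader_fn(data_[version_to_read]);
    reader_count_[version_to_read].fetch_sub(1);
  }

  void write(std::function<void(T&)> writer_fn) {
    // Readers have been directed away from the stale copy since the
    // previous write finished, so only stragglers can still be there.
    int stale = 1 - version_for_readers_.load();
    while (reader_count_[stale].load() > 0) {
      // Busy wait
    }
    if (deferred_) {
      deferred_(data_[stale]);  // catch up on the previous write
    }
    writer_fn(data_[stale]);
    deferred_ = std::move(writer_fn);   // the other copy gets it next time
    version_for_readers_.store(stale);  // publish the updated copy
  }

 private:
  std::atomic<int> version_for_readers_;
  std::atomic<int> reader_count_[2];
  T data_[2] = {};
  std::function<void(T&)> deferred_;
};
```

<p>Note that each call to <code>write</code> now waits at most once, and applies at most two modifications: the deferred one and the new one.</p>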
<h2>Increased data replication</h2>
<p>Even using lazy updates, it is still possible for a writer to have to wait for
readers to leave. This can be particularly problematic if, for instance, the
number of readers is higher than the number of CPUs allocated to the process. In
that case, there will always be some reader which is descheduled, probably for a
time on the order of milliseconds. Since operations on most data structures are
much faster (on the order of nanoseconds or microseconds), this can lead to
significant writer slowdowns and busy waiting.</p>
<p>To avoid this, we can increase the number of copies of the data structure we
keep. If slow readers are stuck on version 0 of the data structure and version 1
is the current one, then a writer may proceed on version 2 without interfering
with readers or writers. This doubles the amount of time we allow for read
operations to take before they begin to block writers. This parameter is
tunable; we can decrease the odds of the writer blocking, at the cost of
increasing storage consumption and the number of total data structure
modifications per logical write.</p>
<h2>An implementation</h2>
<p>The combination of the previous two performance improvements causes some amount
of subtlety; when we have more than two copies of the data structure, which one
do we update in response to a new write?</p>
<p>It's clear that we can't modify the most recently updated version (that's the
one new readers will read from). We could proceed in a round-robin manner, but
that means that the writer might busy-loop waiting for readers to leave one
version, even if other versions are empty of readers. We will adopt a
middle ground: we pick the oldest version of the data structure that does not
have any readers reading from it. This will require us to explicitly keep track
of the number of modifications each version of the data structure has undergone,
and to store as many of the <code>std::function</code>s passed in to <code>write</code> as is necessary
to bring the oldest version up to date.</p>
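<p>The bookkeeping is easier to see in isolation. The sketch below is a single-threaded model of just the modification log, with three versions of an <code>int</code>. The struct <code>ModificationLog</code> and its methods are hypothetical; the member names are chosen to echo the ones used in the implementation, and all the reader-facing synchronization is deliberately left out.</p>

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Single-threaded model of the writer's bookkeeping (no synchronization).
struct ModificationLog {
  static constexpr int kNumVersions = 3;

  // Modifications not yet applied to every version, oldest first.
  std::deque<std::function<void(int&)>> in_flight_modifications;
  // Global sequence number of in_flight_modifications.front().
  int modification_num_of_front = 0;
  // How many modifications each version has received so far.
  std::array<int, kNumVersions> modification_count{};
  std::array<int, kNumVersions> data{};

  void record(std::function<void(int&)> fn) {
    in_flight_modifications.push_back(std::move(fn));
  }

  // Apply every modification that this version has not yet seen.
  void bring_up_to_date(int version) {
    std::size_t first_unapplied = static_cast<std::size_t>(
        modification_count[version] - modification_num_of_front);
    for (std::size_t i = first_unapplied;
         i < in_flight_modifications.size(); ++i) {
      in_flight_modifications[i](data[version]);
      ++modification_count[version];
    }
  }

  // Drop modifications that every version has already received.
  void trim() {
    int min_count = *std::min_element(modification_count.begin(),
                                      modification_count.end());
    while (modification_num_of_front < min_count) {
      in_flight_modifications.pop_front();
      ++modification_num_of_front;
    }
  }
};
```

<p>The log only shrinks once the least-updated version has caught up, so its length is bounded by how far the oldest version lags behind the newest.</p>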
<p>Similarly, with more than 2 copies of the data structure, we can't infer the
version being written using only the version readers should read. Put another
way, <code>version_for_readers_</code> in the previous section really served two purposes:</p>
<ul>
<li>
<p>Letting readers know where to read from</p>
</li>
<li>
<p>Warning readers where a write might be happening.</p>
</li>
</ul>
<p>In the 2-version case, the version being written was trivially determinable from
the version readers were reading, so we could handle this with only one
variable. In the multiply-replicated case, we'll have to split this variable in
two: <code>current_version_</code> will be the most up-to-date version of the data
structure, and tells readers where they should read from, while
<code>version_being_written_</code> will indicate where a modification might be underway.
<code>version_being_written_</code> functions as sort of an inversion of the
"hazard-pointer" technique, if you're familiar with such strategies; it's a way
for writers to indicate dangers to readers.</p>
<p>Here is a version of <code>ReaderWriterData</code> with the optimizations described:</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="p">,</span> <span class="kt">int</span> <span class="n">num_versions</span><span class="o">></span>
<span class="k">class</span> <span class="nc">ReaderWriterData</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">version_to_read</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">version_1</span> <span class="o">=</span> <span class="n">current_version_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_1</span><span class="p">].</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">written_1</span> <span class="o">=</span> <span class="n">version_being_written_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">version_1</span> <span class="o">!=</span> <span class="n">written_1</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Great; we blocked new modifications to version version_1, and then</span>
<span class="c1">// observed that no in-flight modifications were already happening. We</span>
<span class="c1">// can proceed on it.</span>
<span class="n">version_to_read</span> <span class="o">=</span> <span class="n">version_1</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// The writer moved on, and started a modification on version</span>
<span class="c1">// version_1. That means current_version_ must have changed. Try again</span>
<span class="c1">// on that.</span>
<span class="kt">int</span> <span class="n">version_2</span> <span class="o">=</span> <span class="n">current_version_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_2</span><span class="p">].</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="c1">// Now, new modifications are blocked to versions version_1 and</span>
<span class="c1">// version_2. At least one of them is not being modified, and so is</span>
<span class="c1">// safe to read from.</span>
<span class="kt">int</span> <span class="n">written_2</span> <span class="o">=</span> <span class="n">version_being_written_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">version_2</span> <span class="o">==</span> <span class="n">written_2</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Have to use version version_1.</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_2</span><span class="p">].</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="n">version_to_read</span> <span class="o">=</span> <span class="n">version_1</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="c1">// Can use version version_2.</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_1</span><span class="p">].</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="n">version_to_read</span> <span class="o">=</span> <span class="n">version_2</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">reader_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">[</span><span class="n">version_to_read</span><span class="p">]);</span>
<span class="n">reader_count_</span><span class="p">[</span><span class="n">version_to_read</span><span class="p">].</span><span class="n">fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="n">in_flight_modifications_</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">writer_fn</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">version_to_write</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="k">do</span> <span class="p">{</span>
<span class="n">version_to_write</span> <span class="o">=</span> <span class="n">oldest_version_with_no_readers</span><span class="p">();</span>
<span class="n">version_being_written_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">version_to_write</span><span class="p">);</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">reader_count_</span><span class="p">[</span><span class="n">version_to_write</span><span class="p">].</span><span class="n">load</span><span class="p">()</span> <span class="o">></span> <span class="mi">0</span><span class="p">);</span>
<span class="c1">// When we exit the loop, we'll have warned any possible future readers</span>
<span class="c1">// that they should avoid version version_to_write, and then observed</span>
<span class="c1">// that no readers are present in that version. Therefore, any future</span>
<span class="c1">// readers who try to enter it will observe our write to</span>
<span class="c1">// version_being_written_.</span>
<span class="c1">// Now, apply all necessary pending modifications to version</span>
<span class="c1">// version_to_write.</span>
<span class="kt">int</span> <span class="n">index_of_first_unapplied_modification</span> <span class="o">=</span>
<span class="n">modification_count_</span><span class="p">[</span><span class="n">version_to_write</span><span class="p">]</span> <span class="o">-</span> <span class="n">modification_num_of_front_</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">index_of_first_unapplied_modification</span><span class="p">;</span>
<span class="n">i</span> <span class="o"><</span> <span class="n">in_flight_modifications_</span><span class="p">.</span><span class="n">size</span><span class="p">();</span>
<span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">in_flight_modifications_</span><span class="p">[</span><span class="n">i</span><span class="p">](</span><span class="n">data_</span><span class="p">[</span><span class="n">version_to_write</span><span class="p">]);</span>
<span class="o">++</span><span class="n">modification_count_</span><span class="p">[</span><span class="n">version_to_write</span><span class="p">];</span>
<span class="p">}</span>
<span class="c1">// Update readers to use the new version.</span>
<span class="n">version_being_written_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="n">current_version_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">version_to_write</span><span class="p">);</span>
<span class="c1">// Clear out any modifications that have been applied to every version.</span>
<span class="kt">int</span> <span class="n">min_modification_count</span> <span class="o">=</span> <span class="o">*</span><span class="n">std</span><span class="o">::</span><span class="n">min_element</span><span class="p">(</span><span class="n">modification_count_</span><span class="p">,</span>
<span class="n">modification_count_</span>
<span class="o">+</span> <span class="n">num_versions</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">modification_num_of_front_</span> <span class="o"><</span> <span class="n">min_modification_count</span><span class="p">)</span> <span class="p">{</span>
<span class="n">in_flight_modifications_</span><span class="p">.</span><span class="n">pop_front</span><span class="p">();</span>
<span class="o">++</span><span class="n">modification_num_of_front_</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">array</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="n">num_versions</span><span class="o">></span> <span class="n">versions_ordered_by_age</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">array</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="n">num_versions</span><span class="o">></span> <span class="n">result</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">num_versions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">result</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">sort</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">result</span><span class="p">.</span><span class="n">end</span><span class="p">(),</span> <span class="p">[</span><span class="o">&</span><span class="p">](</span><span class="kt">int</span> <span class="n">i1</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i2</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">modification_count_</span><span class="p">[</span><span class="n">i1</span><span class="p">]</span> <span class="o"><</span> <span class="n">modification_count_</span><span class="p">[</span><span class="n">i2</span><span class="p">];</span>
<span class="p">});</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">oldest_version_with_no_readers</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">array</span><span class="o"><</span><span class="kt">int</span><span class="p">,</span> <span class="n">num_versions</span><span class="o">></span> <span class="n">versions</span> <span class="o">=</span> <span class="n">versions_ordered_by_age</span><span class="p">();</span>
<span class="k">while</span> <span class="p">(</span><span class="nb">true</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">num_versions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">versions</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">current_version_</span><span class="p">.</span><span class="n">load</span><span class="p">())</span> <span class="p">{</span>
<span class="c1">// We don't want to try to update the current version!</span>
<span class="k">continue</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">reader_count_</span><span class="p">[</span><span class="n">versions</span><span class="p">[</span><span class="n">i</span><span class="p">]].</span><span class="n">load</span><span class="p">()</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">versions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// The version readers should try to read from</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">current_version_</span><span class="p">;</span>
<span class="c1">// The version that a writer is currently modifying.</span>
<span class="c1">// For simplicity, assume this is initially -1.</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">version_being_written_</span><span class="p">;</span>
<span class="c1">// All modifications that some version has not yet had applied to it</span>
<span class="n">std</span><span class="o">::</span><span class="n">deque</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">>></span> <span class="n">in_flight_modifications_</span><span class="p">;</span>
<span class="c1">// The modification number of the front of in_flight_modifications_</span>
<span class="kt">int</span> <span class="n">modification_num_of_front_</span><span class="p">;</span>
<span class="c1">// The modification counts of each version</span>
<span class="kt">int</span> <span class="n">modification_count_</span><span class="p">[</span><span class="n">num_versions</span><span class="p">];</span>
<span class="c1">// The number of readers reading each version</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">reader_count_</span><span class="p">[</span><span class="n">num_versions</span><span class="p">];</span>
<span class="c1">// The versions themselves.</span>
<span class="n">T</span> <span class="n">data_</span><span class="p">[</span><span class="n">num_versions</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<h3>Exercise</h3>
<p>In the last section, there was an exercise to use a per-reader data
structure instead of atomic fetch-and-adds. Generalize the solution to use the
optimizations described, using no more than <code>O(num_readers * num_versions)</code>
atomic integer variables in the helper data structures.</p>
<h3>Exercise</h3>
<p>Reduce the space usage from the previous exercise to be only
<code>O(num_readers + num_versions)</code>.</p>
<h2>Going to the limit - getting a wait-free writer</h2>
<p>With the optimizations above, we can reduce wait times for writers
significantly. This leads us to wonder -- can we reduce them all the way to 0,
and have a writer that can't be blocked by readers? Perhaps surprisingly, the
answer is yes, so long as we can bound the number of readers.</p>
<p>Consider what happens when we set <code>num_versions</code> to <code>2 * num_readers + 2</code>.
The writer is wait-free so long as there is some unblocked version other than
the current version. Each reader blocks at most 2 versions at a time, so there
are at most <code>2 * num_readers</code> versions blocked at a time. Counting the current
version as well, there are at most <code>2 * num_readers + 1</code> versions the writer
cannot proceed on, and so there is at least one version that the writer
<em>can</em> proceed on. All loops in the writer will therefore terminate in at most
1 iteration, and the writer is wait-free.</p>
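<p>The counting argument above can be checked with a small sketch. The helper
<code>find_free_version</code> and its representation of reader state (a pair of
version indices per reader, with <code>-1</code> meaning "nothing blocked") are
assumptions made for illustration; they are not part of the class above.</p>

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the post's class: given the current
// version and the (at most two) versions each reader blocks, return a
// version the writer may use. An entry of -1 means "no version blocked".
int find_free_version(int num_readers, int current_version,
                      const std::vector<std::pair<int, int>>& blocked_by_reader) {
  const int num_versions = 2 * num_readers + 2;
  std::vector<bool> blocked(num_versions, false);
  // The writer never writes to the version readers are being directed to.
  blocked[current_version] = true;
  for (const auto& pins : blocked_by_reader) {
    if (pins.first >= 0) blocked[pins.first] = true;
    if (pins.second >= 0) blocked[pins.second] = true;
  }
  for (int v = 0; v < num_versions; ++v) {
    if (!blocked[v]) return v;
  }
  // Unreachable when every reader blocks at most two versions: at most
  // 2 * num_readers + 1 of the 2 * num_readers + 2 versions are blocked.
  assert(false && "pigeonhole argument violated");
  return -1;
}
```

<p>With 3 readers there are 8 versions; even if the readers block versions 0
through 5 and version 6 is current, version 7 remains available to the
writer.</p>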
<h3>Exercise</h3>
<p>Come up with a scheme that requires only <code>num_readers + 2</code> copies of a
data structure to provide wait-freedom.</p>
<p>Hint: extend the strategy, developed in the previous exercises, of per-reader
state. Introduce a reader state corresponding to "about to perform a read".
Readers set their state to "about to perform a read", and then compare and swap
their state to "reading on version <code>X</code>". Writers try to compare and swap all
reader states from "about to perform a read" to "reading on version <code>Y</code>" before
writing to a version other than <code>Y</code>.</p>
<h1>Possible extensions</h1>
<p>There are a number of extensions that could be applied to the strategies
discussed in this post. Two that stand out as being of particular practical
importance are adding a try-write method, and adapting these techniques to use
futexes or a similar mechanism.</p>
<p>Try-write is straightforward: an attempt to write is made, but the writer does
not proceed if it might block. Instead of looping in
the <code>write</code> method, the writer only tries to write once. If it fails, it returns
a status indicating that the write did not occur.</p>
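<p>As a rough sketch of the calling convention only, here is a try-write on top
of a plain <code>std::shared_timed_mutex</code> rather than the versioned structure
above. The names <code>TryWritable</code>, <code>WriteStatus</code>, and <code>try_write</code> are
hypothetical; the point is just that the writer makes one attempt and reports
the outcome instead of looping.</p>

```cpp
#include <functional>
#include <shared_mutex>

// Hypothetical status type; the post does not name one.
enum class WriteStatus { kWritten, kWouldBlock };

// Simplified stand-in for the versioned structure: one copy of the data,
// guarded by a reader-writer lock.
template <typename T>
class TryWritable {
 public:
  void read(const std::function<void(const T&)>& reader_fn) {
    mu_.lock_shared();
    reader_fn(data_);
    mu_.unlock_shared();
  }

  // Makes exactly one attempt to obtain exclusive access and never waits.
  // Note that try_lock is allowed to fail spuriously, so kWouldBlock does
  // not prove that a reader was present.
  WriteStatus try_write(const std::function<void(T&)>& writer_fn) {
    if (!mu_.try_lock()) {
      return WriteStatus::kWouldBlock;
    }
    writer_fn(data_);
    mu_.unlock();
    return WriteStatus::kWritten;
  }

 private:
  std::shared_timed_mutex mu_;
  T data_{};
};
```

<p>A caller that receives <code>kWouldBlock</code> can do other work and retry later,
which is the behavior the looping <code>write</code> cannot offer.</p>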
<p>A futex is a scheduling primitive used to efficiently implement mixed
user-level/kernel-level locking on Linux. Fast-path, contention-free lock
acquisitions happen entirely in user-space, only entering the kernel in the case
where a lock is contended, and then blocking, allowing other threads to use the
CPU time that would otherwise be spent spinning. This mechanism could be used to
avoid the spinning required by <code>write</code>. The readers would keep a count of the
number of unblocked versions of the data structure, waking up the writer when the
count becomes nonzero.</p>
<h1>Notes</h1>
<p>The reader-preference reader-writer lock is a well-known algorithm due to
<a href="http://dl.acm.org/citation.cfm?id=109637">Mellor-Crummey and Scott</a>.</p>
<p>Many of the ideas here are well known in the garbage collection and database
literature; in particular:</p>
<ul>
<li>
<p>Maintaining multiple copies of a data structure to allow readers to avoid
waiting for writers</p>
</li>
<li>
<p>Maintaining versions with those copies so that writers can proceed while
readers are reading from an old version</p>
</li>
<li>
<p>Keeping a log of modifications to the data, which gets applied in batch.</p>
</li>
</ul>
<p>Instead of detailing all of these, I'll note only some of the literature that
focuses on strategies for in-memory data structures on shared-memory
multiprocessors with simple version reclamation semantics (i.e. which do not
require the use of garbage collection).</p>
<p>The first algorithm that allows for wait-free readers in the presence of a
writer goes back to <a href="http://dl.acm.org/citation.cfm?id=357198">Peterson</a>, whose
scheme is also wait-free for the writer. It comes at the cost of maintaining
<code>num_readers + 2</code> copies of the protected data, which he shows to be both
necessary and sufficient. His scheme is only applicable to arrays of data which
allow concurrent reads and writes (a la <code>std::atomic<char>[N]</code>), and requires
the reader to execute up to 3 physical reads per logical read.</p>
<p><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.5034">Chen and
Burns</a> show how
to use a compare-and-swap primitive to implement a scheme that requires only 1
read of the data structure per logical read operation, and does not require
concurrent access to a data structure by both a reader and a writer. It also
uses the theoretical minimum of <code>num_readers + 2</code>
copies of the data.</p>
<p><a href="http://concurrencyfreaks.blogspot.com/2013/12/left-right-classical-algorithm.html">Ramalhete and
Correia</a>
show how to reduce the number of copies of the data to only 2, at the cost of no
longer being wait-free for the writer. They provide several variants of their
technique, which they call the "left-right" algorithm. The two-copy strategy
presented here is a variant of this algorithm.</p>
<p>The fetch-and-add based concurrency-control strategy used here is, as far as I
know, novel, but it is reminiscent of the "Epoch-based reclamation" strategy
used by <a href="http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf">Fraser</a> to
provide exclusion between readers and deleters of nodes in lock-free data
structures. Similarly, I believe the lazy-update scheme and the generalization
to multiply-replicated data are new, though the lazy-update scheme bears a similarity
to <a href="http://dl.acm.org/citation.cfm?id=1217965">Shalev and Shavit</a>'s Predictive
Log-Synchronization technique. In the same paper, they present a strategy that
allows concurrent reads by switching writers between two copies of a data
structure. Their solution, however, is not wait-free for readers.</p>
<p>Futexes were introduced to the Linux kernel by
<a href="https://www.kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf">Franke, Russell, and Kirkwood</a>,
though they bear similarities to BeOS "benaphores".</p>Lock-free reads through data replication2015-04-29T20:30:00-07:002015-04-29T20:30:00-07:00David Goldblatttag:dgoldblatt.com,2015-04-29:/lock-free-reads-through-data-replication.html<p>By keeping multiple copies of a data object around, we can allow readers to proceed even while a writer might be blocked mid-write.</p><h1>Introduction</h1>
<p>Seqlocks and reader-writer locks are two techniques used to control access to
shared, read-mostly data in multithreaded programs. Briefly, seqlocks work by
counting modifications to a data structure so that readers can see if a
modification occurred during their read (so that they know their read was
hazardous, and they should retry). Reader-writer locks work by allowing
exclusive access to a data structure to either a single writer, or many readers.</p>
<p>These approaches share a common problem: a writer that stalls (due to a page fault,
descheduling, or a long-running computation, for instance) inside a critical
section can block the readers indefinitely. We'd like to eliminate this
restriction and allow readers to continue even if the writer is blocked inside
a critical section. That is to say, readers should be lock-free with respect to
writers.</p>
<p>To accomplish this, we'll keep two copies of our data structure. The writer
needs to update both copies of the data, one after the other, in order to
logically complete a write. If the writer blocks, it will be blocked inside an
update to only one of the copies. The readers can then proceed by reading from
the other copy, giving us the lock-freedom property we sought.</p>
<p>For simplicity, we'll consider only the case of a single writer thread (there
can be many reader threads). This limitation can be easily removed by adding an
additional mutex to any data structures we consider, which a writer thread must
acquire before and release after invoking the functions we describe. I've
thought hard about the code here but haven't tested it, so be cautious before
using it.</p>
<h1>Background</h1>
<h2>Reader-writer locks</h2>
<p>Reader-writer locks provide safe access to shared data by changing the control
flow of threads attempting to access the data. When a writer owns the lock, it
owns it uniquely, and readers block if they attempt to access it. Alternatively,
some number of readers may share ownership of the lock; they are then allowed to
read but not modify the data structure, and any writers block while readers are
present.</p>
<p>Reader-writer locks are standardized in C++14 via the <code>std::shared_timed_mutex</code>
class. Below is a class that encapsulates the protected data and the protection
mechanism. Readers and the writer pass in a callback function that performs the
desired operations. For simplicity, we'll ignore things like exception safety.</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">ReaderWriterLock</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="n">mu_</span><span class="p">.</span><span class="n">lock_shared</span><span class="p">();</span>
<span class="n">reader_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">);</span>
<span class="n">mu_</span><span class="p">.</span><span class="n">unlock_shared</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="n">mu_</span><span class="p">.</span><span class="n">lock</span><span class="p">();</span>
<span class="n">writer_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">);</span>
<span class="n">mu_</span><span class="p">.</span><span class="n">unlock</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">shared_timed_mutex</span> <span class="n">mu_</span><span class="p">;</span>
<span class="n">T</span> <span class="n">data_</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<h2>Seqlocks</h2>
<p>RW locks often perform poorly in practice. Readers must update lock metadata in
order to indicate their presence to a potential future writer. In typical
implementations, this involves obtaining exclusive write access to the cacheline
containing the lock, which leads to a cache miss for subsequent readers. Even
with fast modern memory systems, cache misses can take longer than a short
critical section, and, what's worse, can be serialized in the memory system.
Updates to the lock itself then become a bottleneck. To avoid this, we'd like a
concurrency primitive in which readers can be truly read-only. The seqlock is
one such primitive.</p>
<p>Seqlocks (short for sequence locks, for reasons which will soon become clear)
allow readers to access data concurrently with writers. Since readers proceed
without blocking writers, the readers might see an inconsistent state for the
protected data (imagine a reader that reads half the protected data before the
writer modifies it, then half after). The seqlock provides a mechanism to detect
such a conflict, indicating that the reader should retry.</p>
<p>Associated with a seqlock is a sequence number. The sequence number starts at 0,
and is incremented before and after each write. This implies that at any time
between writes (and therefore, when the data is in a consistent state), the
sequence number is even, since the number of initial increments will equal the
number of concluding ones. Readers repeatedly read the sequence number, then the
data structure, then the sequence number again. If the sequence numbers are
even (indicating that they weren't read during a writer critical section) and
equal (indicating that no writer critical section started and finished in
between the two reads), then the reader can infer that the writer did not
interfere with its read. Otherwise, the reader should retry.</p>
<p>The code below encapsulates some seqlock-protected data. Because reads
and writes may occur on the object simultaneously, we have to represent the
stored data using <code>std::atomic</code> variables; in turn, this requires that we can
treat the protected data as trivially copyable. For simplicity, we won't enforce
this. Similarly, we'll ignore overflow of <code>sequence_number_</code>, use the
default memory order for all operations (giving sequential consistency), and
ignore the inevitable reads from an uninitialized <code>data_</code> array.</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">SeqLocked</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="n">T</span> <span class="n">value</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">seq0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">seq1</span><span class="p">;</span>
<span class="k">do</span> <span class="p">{</span>
<span class="n">seq0</span> <span class="o">=</span> <span class="n">sequence_number_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="n">raw_read_into</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">seq1</span> <span class="o">=</span> <span class="n">sequence_number_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">seq0</span> <span class="o">!=</span> <span class="n">seq1</span> <span class="o">||</span> <span class="n">seq0</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">reader_fn</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Remember: we are assuming there is only one writer;</span>
<span class="c1">// this read is safe no matter what.</span>
<span class="n">T</span> <span class="n">value</span><span class="p">;</span>
<span class="n">raw_read_into</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">writer_fn</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">sequence_number_</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="n">raw_write_from</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">sequence_number_</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">raw_read_into</span><span class="p">(</span><span class="n">T</span><span class="o">&</span> <span class="n">output</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="o">*</span><span class="p">((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">output</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span> <span class="o">=</span> <span class="n">data_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">load</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">raw_write_from</span><span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">input</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">data_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">store</span><span class="p">(</span><span class="o">*</span><span class="p">((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">input</span> <span class="o">+</span> <span class="n">i</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">sequence_number_</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">char</span><span class="o">></span> <span class="n">data_</span><span class="p">[</span><span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">)];</span>
<span class="p">};</span>
</pre></div>
<h1>The problem: blocked writers</h1>
<p>Suppose the writer is modifying some of the data protected by a seqlock. It
increments the sequence number, begins writing to the data, but is preempted by
the operating system before it completes. Then, all readers are blocked until
the writer is rescheduled and completes the write. The readers, therefore, are
not lock-free with respect to the writer. An analogous situation can occur using
a reader-writer lock.</p>
<h1>A solution: replicate the data</h1>
<p>The trick we'll use to solve this problem is old, but not widely known. We'll
keep two copies of the data being protected. If a writer is blocked midway
through a write to one copy of the data, the reader can still continue its read
through the other copy.</p>
<p>More specifically, a reader will attempt to complete its read on one copy of the
data. If it succeeds, great; the reader is done. If it fails, it will try to
read from the other copy of the data. This process repeats until a read attempt
succeeds. The writer updates one copy of the data structure, and then the other.</p>
<p>Here is how this technique looks for the reader-writer lock (again, we'll ignore
trivialities like counter overflow and exceptions):</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">ReplicatedReaderWriterLock</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">version_to_use</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(;;</span> <span class="n">version_to_use</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">mu_</span><span class="p">[</span><span class="n">version_to_use</span> <span class="o">%</span> <span class="mi">2</span><span class="p">].</span><span class="n">try_lock_shared</span><span class="p">())</span> <span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">reader_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">[</span><span class="n">version_to_use</span> <span class="o">%</span> <span class="mi">2</span><span class="p">]);</span>
<span class="n">mu_</span><span class="p">[</span><span class="n">version_to_use</span> <span class="o">%</span> <span class="mi">2</span><span class="p">].</span><span class="n">unlock_shared</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">2</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">mu_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">lock</span><span class="p">();</span>
<span class="n">writer_fn</span><span class="p">(</span><span class="n">data_</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="n">mu_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">unlock</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="n">std</span><span class="o">::</span><span class="n">shared_timed_mutex</span> <span class="n">mu_</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="n">T</span> <span class="n">data_</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>Note: the reader-writer variant we present is not technically lock-free in our
setting. In C++14, a <code>try_lock_shared</code> call is allowed to fail spuriously,
returning false even when the lock is available. This technicality can be
remedied, but we won't do so here; the problem is vanishingly unlikely to show
up in practice.</p>
<h1>Decreasing reader failure rates through further replication</h1>
<p>The technique in the previous section is enough to provide a form of
lock-freedom, but readers may still be starved if writers run much faster than
they do (for instance, a writer might modify both copies of a seqlock'd data
instance before a reader manages to read either one). Here, we'll consider an
extension that reduces the chance of writer interference, at the cost of
additional copies of the data structure.</p>
<p>The key idea is to increase the amount of replication beyond two copies of a
data object. In order for the writer to block readers, it must update all the
objects before the reader reads any of them. Increasing the amount of
replication increases the likelihood of readers succeeding.</p>
<p>Here's how this looks for the seqlock:</p>
<div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="p">,</span> <span class="kt">int</span> <span class="n">num_replicas</span><span class="o">></span>
<span class="k">struct</span> <span class="n">ReplicatedSeqLocked</span> <span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="kt">void</span> <span class="n">read</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">reader_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="n">T</span> <span class="n">value</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">seq0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">seq1</span><span class="p">;</span>
<span class="k">do</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">version</span> <span class="o">=</span> <span class="n">current_version_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="n">seq0</span> <span class="o">=</span> <span class="n">data_</span><span class="p">[</span><span class="n">version</span><span class="p">].</span><span class="n">sequence_number_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="n">data_</span><span class="p">[</span><span class="n">version</span><span class="p">].</span><span class="n">raw_read_into</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">seq1</span> <span class="o">=</span> <span class="n">data_</span><span class="p">[</span><span class="n">version</span><span class="p">].</span><span class="n">sequence_number_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">seq0</span> <span class="o">!=</span> <span class="n">seq1</span> <span class="o">||</span> <span class="n">seq0</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">reader_fn</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">write</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">function</span><span class="o"><</span><span class="kt">void</span> <span class="p">(</span><span class="n">T</span><span class="o">&</span><span class="p">)</span><span class="o">></span> <span class="n">writer_fn</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">version</span> <span class="o">=</span> <span class="n">current_version_</span><span class="p">.</span><span class="n">load</span><span class="p">();</span>
<span class="kt">int</span> <span class="n">new_version</span> <span class="o">=</span> <span class="p">(</span><span class="n">version</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="n">num_replicas</span><span class="p">;</span>
<span class="n">T</span> <span class="n">value</span><span class="p">;</span>
<span class="n">data_</span><span class="p">[</span><span class="n">version</span><span class="p">].</span><span class="n">raw_read_into</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">writer_fn</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">data_</span><span class="p">[</span><span class="n">new_version</span><span class="p">].</span><span class="n">sequence_number_</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="n">data_</span><span class="p">[</span><span class="n">new_version</span><span class="p">].</span><span class="n">raw_write_from</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="n">data_</span><span class="p">[</span><span class="n">new_version</span><span class="p">].</span><span class="n">sequence_number_</span><span class="p">.</span><span class="n">fetch_add</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="n">current_version_</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">new_version</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="k">struct</span> <span class="n">SeqLockData</span> <span class="p">{</span>
<span class="kt">void</span> <span class="n">raw_read_into</span><span class="p">(</span><span class="n">T</span><span class="o">&</span> <span class="n">output</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="o">*</span><span class="p">((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">output</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span> <span class="o">=</span> <span class="n">data_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">load</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">raw_write_from</span><span class="p">(</span><span class="k">const</span> <span class="n">T</span><span class="o">&</span> <span class="n">input</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">);</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">data_</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">store</span><span class="p">(</span><span class="o">*</span><span class="p">((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">input</span> <span class="o">+</span> <span class="n">i</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">sequence_number_</span><span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">char</span><span class="o">></span> <span class="n">data_</span><span class="p">[</span><span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">)];</span>
<span class="p">};</span>
<span class="n">SeqLockData</span> <span class="n">data_</span><span class="p">[</span><span class="n">num_replicas</span><span class="p">];</span>
<span class="n">std</span><span class="o">::</span><span class="n">atomic</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">current_version_</span><span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="p">};</span>
</pre></div>
<h1>Some concluding notes on our implementation</h1>
<p>There are a few differences between the replicated seqlock and replicated
reader-writer lock.</p>
<ul>
<li>
<p>The seqlock version updates the entire data structure on every modification.
The reader-writer one can make small in-place updates. This is a consequence
of the API we chose for our implementations, so it could potentially be
eliminated, but this would complicate the writer interface (the readers and
writers would have to deal with atomic-wrapped data directly, and ask the
protecting data structure whether their reads were safe).</p>
</li>
<li>
<p>The converse to this is that the replicated seqlock need only modify one
version of the data, whereas a replicated reader-writer lock must modify all
copies of the data.</p>
</li>
<li>
<p>An <code>N</code>-replicated reader-writer lock enables another optimization: a thread
can hash its thread-id or CPU number to get an index to read from:
<code>thread_id % N</code>. This spreads around the contention on lock-metadata,
reducing the performance overhead of the lock. In the limit, we have a data
instance per CPU, and can avoid all non-local writes in the reader path.</p>
</li>
</ul>
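<p>As a rough illustration of the last point, here is a minimal sketch of an
<code>N</code>-replicated reader-writer lock in which each reader starts at a
"home" replica derived from its thread id. The class name
<code>HashedReplicatedRW</code>, the fallback to the next replica when the home
replica is busy, and the extra <code>writer_mu_</code> that serializes writers
are all assumptions made for this example, not part of the code above. Like the
rest of the code in this post, it has been thought about more than it has been
tested.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical N-replica generalization of the reader-writer variant.
template <typename T, std::size_t N>
class HashedReplicatedRW {
  static_assert(N >= 2, "a reader needs a replica the writer is not holding");

 public:
  void read(const std::function<void(const T&)>& reader_fn) {
    assert(static_cast<bool>(reader_fn));
    // Start at a per-thread "home" replica, so that different threads
    // usually touch different lock metadata.
    const std::size_t home =
        std::hash<std::thread::id>()(std::this_thread::get_id()) % N;
    for (std::size_t i = home;; i = (i + 1) % N) {
      std::shared_lock<std::shared_timed_mutex> lock(mu_[i], std::try_to_lock);
      if (lock.owns_lock()) {
        reader_fn(data_[i]);
        return;
      }
      // The writer holds this replica (or try-lock failed spuriously):
      // move on to the next one.
    }
  }

  void write(const std::function<void(T&)>& writer_fn) {
    // Assumption: writers are serialized, so every replica sees the same
    // sequence of updates in the same order.
    std::lock_guard<std::mutex> serialize(writer_mu_);
    for (std::size_t i = 0; i < N; ++i) {
      std::lock_guard<std::shared_timed_mutex> lock(mu_[i]);
      writer_fn(data_[i]);
    }
  }

 private:
  std::mutex writer_mu_;
  std::shared_timed_mutex mu_[N];
  T data_[N]{};
};
```

<p>Hashing the thread id is only a stand-in for a real CPU number; with one
replica per CPU and a way to ask which CPU the caller is running on, the same
structure would keep reader-side lock traffic local.</p>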
<p>Additionally, note that we could have used a single sequence counter in
<code>ReplicatedSeqLocked</code>, shared between all data instances. Readers would check
it to see that the range of instances modified during their read does not
include the instance they read from. This results in a modest space saving, at
the cost of losing some simplicity of implementation.</p>
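<p>One way the shared-counter idea might look is sketched below. Everything
here is an assumption made for illustration: the name
<code>SharedCounterSeqLocked</code>, the encoding of the counter (write number
<code>k</code> goes to slot <code>k % N</code>, and the counter is
<code>2k - 1</code> while that write is in progress and <code>2k</code> once it
completes), the single-writer restriction, and the use of default sequentially
consistent atomics throughout.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>

// Hypothetical variant with one sequence counter shared by all replicas.
template <typename T, std::size_t N>
class SharedCounterSeqLocked {
  static_assert(N >= 2, "need at least two replicas");
  static_assert(std::is_trivially_copyable<T>::value, "T is copied bytewise");

 public:
  explicit SharedCounterSeqLocked(const T& initial = T()) {
    raw_write(0, initial);  // "write number 0" lives in slot 0
  }

  void read(const std::function<void(const T&)>& reader_fn) const {
    T value;
    std::uint64_t s0, s1;
    do {
      s0 = seq_.load();
      // s0 / 2 is the number of the last write known to be complete.
      raw_read(static_cast<std::size_t>((s0 / 2) % N), value);
      s1 = seq_.load();
      // Writes s0/2 + 1 .. (s1 + 1)/2 may have overlapped our read. Our slot
      // is reused by write s0/2 + N, so retry if the range reaches that far.
    } while ((s1 + 1) / 2 - s0 / 2 >= N);
    reader_fn(value);
  }

  // Assumption: at most one thread calls write() at a time.
  void write(const std::function<void(T&)>& writer_fn) {
    const std::uint64_t s = seq_.load();
    assert(s % 2 == 0);  // an odd counter would mean a concurrent writer
    const std::uint64_t k = s / 2;
    T value;
    raw_read(static_cast<std::size_t>(k % N), value);
    writer_fn(value);
    seq_.store(s + 1);  // announce that slot (k + 1) % N is being modified
    raw_write(static_cast<std::size_t>((k + 1) % N), value);
    seq_.store(s + 2);  // publish slot (k + 1) % N as the current version
  }

 private:
  void raw_read(std::size_t slot, T& out) const {
    char* dst = reinterpret_cast<char*>(&out);
    for (std::size_t i = 0; i < sizeof(T); ++i) dst[i] = bytes_[slot][i].load();
  }
  void raw_write(std::size_t slot, const T& in) {
    const char* src = reinterpret_cast<const char*>(&in);
    for (std::size_t i = 0; i < sizeof(T); ++i) bytes_[slot][i].store(src[i]);
  }

  std::atomic<std::uint64_t> seq_{0};
  std::atomic<char> bytes_[N][sizeof(T)];
};
```

<p>The counter now does double duty as version selector and validity check,
which is where the lost simplicity shows up: the retry condition depends on
arithmetic over the counter rather than a plain equality test.</p>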
<h1>Notes</h1>
<p>Reader-writer locks were introduced by <a href="http://dl.acm.org/citation.cfm?id=362813">Courtois et
al.</a>,
and are a well-known synchronization tool.</p>
<p>Seqlocks were popularized by their introduction into the Linux kernel by <a href="https://lwn.net/Articles/21379/">Stephen
Hemminger (building on the work of Andrea
Arcangeli)</a>, but were introduced by
<a href="http://dl.acm.org/citation.cfm?id=359878">Leslie Lamport</a>, or arguably even earlier, in the
work of <a href="http://dl.acm.org/citation.cfm?id=806505">William B. Easton</a>. This fact
is not widely known.</p>
<p>The idea of using multiple copies of a data structure to improve the blocking
properties of readers goes back to
<a href="http://dl.acm.org/citation.cfm?id=357198">Peterson</a>,
who introduced a mechanism which is wait-free for the writer as well as readers.
These ideas were explored most fully by <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.5034">Chen and
Burns</a>.
<a href="http://dl.acm.org/citation.cfm?id=1455262">Larsson et al.</a> provide a
comprehensive survey.
<a href="http://dl.acm.org/citation.cfm?id=128736">Lamport</a> uses a clever variant of
a similar idea as well.</p>
<p>The specific case of replicating
data for reader-writer locks and seqlocks is a well-known trick, but I can't
find a citation for the first use. It is used, for instance, in <a href="http://lxr.free-electrons.com/source/kernel/time/timekeeping.c?v=3.17#L231">the Linux
kernel</a>,
and explored by <a href="http://concurrencyfreaks.blogspot.com/2013/11/double-instance-locking.html">Ramalhete and
Correia</a>
in what they call "double-instance locking".</p>
<p>Using <em>more</em> than two copies of a data structure to improve reader seqlock throughput when
writers are common is a trick from <a href="http://www.1024cores.net/home/lock-free-algorithms/reader-writer-problem/improved-lock-free-seqlock">Dmitry Vyukov</a>.</p>