
Fun with Concurrency Problems 

At trade shows, we often have a "find the bug" poster at our booth to draw in attendees. The most recent poster shows an example of a singleton pattern that contains an unpleasant data race. The data race is obvious, but it seems completely harmless, so many attendees are surprised to learn that it can manifest as a rather difficult-to-find bug. Here I go over the example from the poster to explain why the code is defective, and show some other examples where a seemingly benign data race can be actively harmful.

I should first point out that this article was inspired by Hans Boehm’s excellent paper How to miscompile programs with "benign" data races, which appeared at HotPar ’11, the USENIX Workshop on Hot Topics in Parallelism, in 2011. The examples below are derived from it.

First, let’s give the definition of a data race. A data race is a particular type of race condition — the class that involves concurrent access to memory locations in a multi-threaded program. A data race occurs when there are two or more threads of execution that access a shared memory location, at least one thread is changing the data at that location, and there is no explicit mechanism for coordinating access. If a data race occurs it can leave the program in an inconsistent state.

The singleton pattern is a very commonplace idiom where the program maintains a reference to a single underlying object, and a Boolean variable encodes whether it has been initialized. This pattern is also known as "lazy initialization".

The following code is an example of the pattern.

if (!initialized) {
    object = create();
    initialized = true;
}

... object ...

This code is perfectly adequate for a single-threaded program, but it is not thread safe because it has a data race on the variable named initialized. If called by two different threads there is a risk that both threads will observe initialized as false at essentially the same time, and both will call create(), thus violating the singleton property.

To make this thread safe, the natural approach is to protect the entire if statement with a lock. However acquiring and releasing locks is expensive, so programmers try to avoid the expense by using the "double-checked locking" idiom — a check outside the scope of the lock and another inside. The inner check is there to confirm that the first check still holds after the lock has been acquired.

if (!initialized) {
    lock();
    if (!initialized) {
        object = create();
        initialized = true;
    }
    unlock();
}

... object ...

Superficially this looks like it will suffice and indeed it will as long as the statements are guaranteed to execute in that order. However, an optimizing compiler may generate code that essentially switches the order of object = create() and initialized = true. After all, there is no explicit dependence between those two statements. In that case, if a second thread enters this code any time after the assignment to initialized, that thread will then use the value of object before it has been written to.
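A standard repair is to publish the object with an explicit release store and read it back with an acquire load, so that neither the compiler nor the CPU may move the construction past the publication. Here the pointer itself doubles as the "initialized" flag. The following is a minimal sketch assuming C11 atomics and POSIX threads; the widget_t type, its create() constructor, and get_object() are hypothetical names invented for illustration, and error handling is omitted.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical payload type and constructor, for illustration only. */
typedef struct { int value; } widget_t;

static widget_t *create(void) {
    widget_t *w = malloc(sizeof *w);   /* error handling omitted */
    w->value = 42;
    return w;
}

static _Atomic(widget_t *) object = NULL;   /* NULL means "not yet initialized" */
static pthread_mutex_t object_lock = PTHREAD_MUTEX_INITIALIZER;

widget_t *get_object(void) {
    /* Acquire load: if we observe a non-NULL pointer, we are also
       guaranteed to observe the fully constructed object it points to. */
    widget_t *p = atomic_load_explicit(&object, memory_order_acquire);
    if (p == NULL) {
        pthread_mutex_lock(&object_lock);
        p = atomic_load_explicit(&object, memory_order_relaxed);
        if (p == NULL) {
            p = create();
            /* Release store: publishes the object; the compiler may not
               reorder the construction after this store. */
            atomic_store_explicit(&object, p, memory_order_release);
        }
        pthread_mutex_unlock(&object_lock);
    }
    return p;
}
```

Every caller goes through get_object(), so the singleton property holds and the fast path costs only one atomic load.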

Optimizing compilers are inscrutable beasts. Those that optimize for speed will take many esoteric considerations into account, few of which are obvious to a programmer. It is common for them to generate instructions that are apparently out of order because doing so might result in fewer cache misses, or because fewer instructions are needed.

It is wrong to conclude from the reordering in the above example that the compiler is at fault. The compiler is doing exactly what it is allowed to do; the language specification is perfectly clear and unambiguous about this: the compiler is allowed to assume that there are no data races in the program.

In actuality, the specification is somewhat broader: compilers are allowed to do anything in the presence of undefined behavior. This is sometimes facetiously referred to as the "catch fire" semantics; the specification gives the compiler permission to set your computer on fire if your program has undefined behavior. As well as data races, many traditional bugs such as buffer overruns, dereferences of invalid addresses, and so on constitute undefined behavior. Because compilers are free to do anything, rather than burn the building down, they typically do the sensible thing, which is to assume that the undefined behavior will never happen, and optimize accordingly.

The consequences of this can sometimes be very surprising, even to those who are experts in concurrency and compilers. It can be difficult to convince programmers that code that looks completely correct can be compiled into code that has serious errors. Here are some examples.


Example 1. Say thread 2 executes its statement while thread 1 is executing its loop.

Thread 1

flag = 0;
while (!flag) {
    // Loop until flag changes
}
finish();

Thread 2

...
...
flag = 1;

Is it possible that finish() will never execute? The answer is YES.

This one is fairly easy. The optimizing compiler sees that flag is zero, and because it can assume that there are no data races, this code can be compiled as if it had been written as follows:

while (1) {
    // Loop until flag changes
}
finish();
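The repair, sketched here on the assumption that C11 atomics are available, is to declare the flag _Atomic, which tells the compiler that other threads may change it and forces a genuine re-read on every iteration; the waiter() and setter() names are invented for illustration.

```c
#include <stdatomic.h>

/* Because flag is _Atomic, the load in the loop below cannot be
   hoisted out and folded into an infinite loop. */
static atomic_int flag = 0;

void waiter(void) {
    while (!atomic_load(&flag)) {
        /* Loop until another thread stores a non-zero value. */
    }
    /* finish() would run here once the flag is observed set. */
}

void setter(void) {
    atomic_store(&flag, 1);   /* guaranteed visible to the waiter's next load */
}
```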


Now let’s look at the second example.

Example 2. Assume that before each thread executes, x.flag1 and x.flag2 are both zero, where flag1 and flag2 are bit fields.

Thread 1

x.flag1 = 1;

Thread 2

x.flag2 = 1;

Is it possible that one of flag1 and flag2 may be zero? The answer is YES.

This one is also fairly easy. The code to update a bit field will read the byte containing the field into a register, set the bit to its new value, and write the byte back; usually three instructions are needed. If the structure is laid out so that flag1 and flag2 are in the same byte, and the three instructions from each thread are interleaved, then one of the fields will not get updated as expected.
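The lost update can be simulated deterministically in a single thread by performing the two read-modify-write sequences by hand, interleaved in the unlucky order. This sketch assumes the two one-bit fields share a byte, with flag1 in bit 0 and flag2 in bit 1; lost_update_demo() is a name invented for illustration.

```c
#include <stdint.h>

/* Two one-bit fields that a compiler will typically pack into one byte. */
struct flags {
    unsigned flag1 : 1;
    unsigned flag2 : 1;
};

/* Hand-interleave the byte-level read-modify-write sequences that the
   compiler emits for the two bit-field assignments. */
int lost_update_demo(void) {
    uint8_t byte = 0;      /* both fields start at zero */

    uint8_t t1 = byte;     /* thread 1: read the shared byte   */
    uint8_t t2 = byte;     /* thread 2: read the shared byte   */
    t1 |= 0x01;            /* thread 1: set flag1 in its copy  */
    t2 |= 0x02;            /* thread 2: set flag2 in its copy  */
    byte = t1;             /* thread 1: write back             */
    byte = t2;             /* thread 2: write back, clobbering thread 1's update */

    return byte;           /* 0x02: flag1's update has been lost */
}
```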


The next two examples are a little more challenging. Suppose there are two threads that access a shared variable with no lock protecting those accesses, so there is clearly a data race. Let us assume that it does not matter to a reader whether it sees the value before or after it has been changed by a writer (this is not an uncommon pattern). Notwithstanding the catch-fire rule, most programmers would conclude that this is completely benign. However, it turns out there are at least two plausible ways in which such code can be compiled so that a thread observes a value that is simply wrong.

Example 3. Assume that global starts out having value zero.

Thread 1

global = y;

Thread 2

global = x;
After execution, is it possible for global to have a value that is neither x nor y nor zero? The answer is YES.

If global is a 64-bit variable and the code runs on an architecture that can only write 32-bit words at a time, then each assignment needs two instructions. An unlucky interleaving of the two writes can leave global holding the top 32 bits of one value and the bottom 32 bits of the other, a combination that is neither x nor y nor zero.
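The arithmetic of such a torn value can be shown deterministically by combining the two halves by hand; torn_value() is a hypothetical helper that builds the value an unlucky interleaving would leave behind.

```c
#include <stdint.h>

/* Construct the torn 64-bit value described above: the top 32 bits
   come from one write, the bottom 32 bits from the other. */
uint64_t torn_value(uint64_t x, uint64_t y) {
    uint32_t high = (uint32_t)(x >> 32);   /* top half from the write of x    */
    uint32_t low  = (uint32_t)y;           /* bottom half from the write of y */
    return ((uint64_t)high << 32) | low;
}
```

For example, tearing x = 0x1111111122222222 against y = 0x3333333344444444 yields 0x1111111144444444, which matches neither value.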


The last example may be even more of a surprise.

Example 4. Consider the case where the variable named global may be changed by another thread.

int local = global; // Take a copy of the global

if (local == something) {
    p1();
}

// Some non-trivial code that does not change global or local

if (local == something) {
    p2();
}

Is it possible that p1() will be called without calling p2()? The answer is YES.


Here the reader makes a local copy of the racy variable and refers to that value twice. It is reasonable to expect the value to be the same in both places, but again the optimizing compiler can generate code where that expectation is unmet. If local is assigned to a register, it will have one value for the purposes of the first comparison; but if the code between the two conditionals is sufficiently non-trivial, that register may get spilled, i.e. re-used for a different purpose. In that case, at the second conditional the value of local will be reloaded from the global variable into the register, by which time the writer may have changed it to a different value.
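One way to rule this out, assuming C11 atomics, is to perform a single explicit atomic load and test only the loaded value; the compiler may not substitute a second read of global for it. In this sketch p1() and p2() are replaced by hypothetical counting stand-ins so the behavior can be observed.

```c
#include <stdatomic.h>

static atomic_int global;           /* may be written by another thread */

static int p1_calls, p2_calls;      /* counting stand-ins for the real p1()/p2() */
static void p1(void) { p1_calls++; }
static void p2(void) { p2_calls++; }

void reader(int something) {
    /* One explicit atomic load: both comparisons below are guaranteed
       to test the same value, so p1() without p2() is impossible even
       if the compiler spills local. */
    int local = atomic_load(&global);
    if (local == something) {
        p1();
    }
    /* ... non-trivial code that does not change global or local ... */
    if (local == something) {
        p2();
    }
}
```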

Note that in this case, if the variable named global is declared to be volatile, then the problem goes away because that tells the compiler to avoid that particular optimization. However, beware that the use of volatile can itself be problematic. See for example John Regehr and Eric Eide’s 2008 paper Volatiles are Miscompiled, and What to Do about it, in which they report having found compiler errors in thirteen different production compilers!

These are not isolated or contrived cases. See Boehm’s paper for a full explanation and several other compelling examples.

The lesson to be learned and repeatedly re-iterated is the following:

There is no such thing as a benign data race.

To achieve high-quality software for multi-core processors, a zero-tolerance policy for data races is recommended. So what is the best way to find them?


Figure 1. A data race found by CodeSonar in the
singleton example.

When it comes to finding concurrency defects, traditional dynamic testing techniques may be inadequate. A program that passes a test a hundred times is not guaranteed to pass the next time, even with the same inputs and the same environment, because whether these bugs manifest or not is exquisitely sensitive to the timing, and the order in which operations in threads are interleaved is essentially non-deterministic.

Static analysis can be very helpful at finding data races. We recently introduced a data race detector in CodeSonar®, our static analysis tool. It finds data races by creating a map of which locks are held by which threads, and reasoning about the possible interleavings that could result in unsynchronized access to shared variables. Deadlock and other concurrency defects (including lock mismanagement) are found using similar techniques.

Figure 1 shows a screenshot from CodeSonar that demonstrates that it finds the data race in the singleton example from above. This program uses the Posix thread library, but CodeSonar knows about many other concurrency libraries too.


Interested in learning more? Read our guide on "Finding Concurrency Errors with GrammaTech Static Analysis" here:

Read the Guide