Lecture 11: Intro to Multithreading

Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.

What are threads?

So far, we have been talking about processes. Each process has its own virtual address space, its own file descriptor table, its own signal handler table and signal mask, etc. Each process can use the CPU’s registers without other processes changing their values.

Threads are created within processes. A thread is very much like a process (in fact, threads are often called “lightweight processes”), but instead of being isolated, threads share most resources with other threads of the same process. You can imagine multiple people living in the same apartment: they can all access each other’s stuff, but they run separately, doing their own things.

Launching threads

In C++, to spawn a thread, create a thread object. The first argument is the function that the thread should run. This example launches 1 extra thread that prints “hello world”:
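
Here’s a rough sketch of such a program (assuming the usual headers like <iostream> and <thread> and using namespace std, with the one-second pauses done via the standard this_thread::sleep_for):

static void printHello() {
    this_thread::sleep_for(chrono::seconds(1));   // this thread sleeps for 1 second
    cout << "hello world" << endl;
}

int main(int argc, const char *argv[]) {
    thread t(printHello);                         // first constructor argument: the function to run
    this_thread::sleep_for(chrono::seconds(1));   // the main thread also sleeps for 1 second
    t.join();                                     // wait for the child thread to finish
    return 0;
}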

Notice that the whole program takes only 1 second total to run, even though there are 2 seconds worth of sleep calls. The threads are running concurrently, and everything you know about process scheduling carries over here.

thread.join() is similar to waitpid; it blocks until the thread exits. Unlike waitpid, your program will crash if you forget to call join()! If you see the error terminate called without an active exception, you likely forgot to join a thread before its thread object went out of scope.

Our first multithreading concurrency issues

printf has some internal synchronization so that if multiple threads are printing at the same time, the whole message gets printed without being interrupted by any other threads. This ensures that any output on the terminal is coherent, and doesn’t have characters interleaved from two running processes or threads.

We don’t have the same guarantee from cout: it prints each segment between << operators atomically, but not the entire message. If you run twenty “hello world” threads, you’ll likely see some interleaving of output:
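
A sketch of what such a program might look like, with a printHello that uses plain cout:

static void printHello(size_t threadNum) {
    cout << "Hello world from thread " << threadNum << "!" << endl;
}

int main(int argc, const char *argv[]) {
    thread threads[20];
    for (size_t i = 0; i < 20; i++) {
        threads[i] = thread(printHello, i);   // extra constructor arguments are forwarded to printHello
    }
    for (thread& t : threads) t.join();
    return 0;
}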

We have written two stream manipulators called oslock and osunlock that bring atomicity (all-or-nothing) guarantees to cout. For now, you can just use them without understanding how they work (though I will explain how they work next week). The following printHello does not have the same interleaving problems:

static void printHello(size_t threadNum) {
    cout << oslock << "Hello world from thread " << threadNum << "!" << endl << osunlock;
}

Note: endl should come before osunlock, since endl actually prints a character (\n) to the screen (and flushes the stream), while osunlock says “okay, I’m done printing – let other threads print now!”

Note: If you forget osunlock, your program will probably hang, because no other thread (yourself included) will be able to get past oslock until osunlock gives the all-clear message!

Beware passing by reference

This code is roughly the same as before, but it does the printing in a lambda function:
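
A rough sketch of this version (note that the lambda captures i by reference):

int main(int argc, const char *argv[]) {
    thread threads[20];
    size_t i = 0;
    while (i < 20) {
        threads[i] = thread([&i]() {
            cout << oslock << "Hello world from thread " << i << "!" << endl << osunlock;
        });
        i++;
    }
    for (thread& t : threads) t.join();
    return 0;
}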

Unfortunately, this output is all sorts of broken! We see the same number printed multiple times, and some numbers are missing from the output.

This is because the lambda function is capturing i by reference, so each thread is receiving a reference (essentially a pointer) to the variable i in the main thread’s stack. The main thread is incrementing this variable on each iteration of the while loop, so by the time each child thread starts running, it may not see the value of i that we intended for it to see.

We can fix this by capturing by value instead, which copies i into each thread’s stack:

threads[i] = thread([i](){

Note the lack of &.

Race conditions

Consider the following code, where 10 ticket agents collaboratively sell off 250 tickets:
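
A rough sketch of what this code might look like (the agent IDs and messages are just for illustration):

static void ticketAgent(size_t id, size_t& remainingTickets) {
    while (true) {
        if (remainingTickets == 0) break;
        remainingTickets--;
        cout << oslock << "Agent #" << id << " sold a ticket! ("
             << remainingTickets << " left)." << endl << osunlock;
    }
}

int main(int argc, const char *argv[]) {
    thread agents[10];
    size_t remainingTickets = 250;
    for (size_t i = 0; i < 10; i++) {
        agents[i] = thread([&remainingTickets, i]() {
            ticketAgent(i + 100, remainingTickets);
        });
    }
    for (thread& agent : agents) agent.join();
    cout << "End of business day!" << endl;
    return 0;
}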

If one of the threads gets pulled off the processor after the if (remainingTickets == 0) test but before remainingTickets--, a race condition can occur: that agent saw that tickets were still available, but before it gets to sell one, a different agent sells the last ticket. When the first agent is put back on the processor, it executes remainingTickets-- even though the count is now 0, and remainingTickets underflows. If you run this code with a sleep_for(100) in between those two lines, you’ll see a very large number of tickets being sold.

Mutexes to the rescue

A mutex (short for “mutual exclusion”) is a synchronization primitive that can prevent two threads from running critical code at the same time. By protecting a critical region with a mutex, we can mutually exclude threads from executing code inside of that region at the same time, avoiding the race conditions described above.

Mutexes can either be locked or unlocked; they are initialized to be in the unlocked state. A thread can call lock() on the mutex: if the mutex is unlocked, lock() locks it and returns immediately. Otherwise, lock() blocks until the mutex becomes unlocked, at which point it locks the mutex and returns.

You can think of a mutex as a magic key, where only one thread can have the key at a time. The lock() function says, “You need to be holding the key before you can pass this point!”

We can fix the ticket agent code by declaring a mutex in main:

mutex remainingTicketsLock;

and passing it to each thread:

static void ticketAgent(size_t id, size_t &remainingTickets, mutex &remainingTicketsLock) {
    ...
}

int main() {
    thread agents[10];
    size_t remainingTickets = 250;
    mutex remainingTicketsLock;
    for (size_t i = 0; i < 10; i++) {
        agents[i] = thread([&remainingTickets, &remainingTicketsLock, i](){
            ticketAgent(i + 100, remainingTickets, remainingTicketsLock);
        });
    }
    for (thread& agent : agents) agent.join();
    return 0;
}

Then, we use the mutex to ensure that no other thread reads or updates remainingTickets while we are working with that value:

remainingTicketsLock.lock();
if (remainingTickets == 0) {
    remainingTicketsLock.unlock();
    break;
}
remainingTickets--;
remainingTicketsLock.unlock();
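
Putting the pieces together, the agent’s loop might look roughly like this:

static void ticketAgent(size_t id, size_t& remainingTickets, mutex& remainingTicketsLock) {
    while (true) {
        remainingTicketsLock.lock();
        if (remainingTickets == 0) {
            remainingTicketsLock.unlock();
            break;
        }
        remainingTickets--;
        remainingTicketsLock.unlock();
        cout << oslock << "Agent #" << id << " sold a ticket!" << endl << osunlock;
    }
}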

Lock guards

A lock guard is a data type that locks a lock in its constructor and unlocks it in its destructor. This is extremely helpful for automatically unlocking a lock when the lock guard goes out of scope, instead of needing to worry about unlocking the lock before every break, return, or throw. In the ticket agents example, we could remove every remainingTicketsLock.unlock() call, and replace every lock() call with the following:

lock_guard<mutex> lg(remainingTicketsLock);
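
For example, the agent’s loop might be rewritten roughly as follows; the guard releases the mutex whenever lg is destroyed, whether we leave via the break or reach the end of an iteration:

while (true) {
    lock_guard<mutex> lg(remainingTicketsLock);   // locks the mutex here
    if (remainingTickets == 0) break;             // lg is destroyed (and unlocks) on the way out
    remainingTickets--;
}                                                 // lg is destroyed (and unlocks) at the end of each iteration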

More subtle race conditions

Race conditions can happen within a single line of code as well. Consider this program:

int main(int argc, const char *argv[]) {
    int counter = 0;

    thread thread1 = thread([&] () {
        counter++;
    });
    thread thread2 = thread([&] () {
        counter++;
    });

    thread1.join();
    thread2.join();

    cout << "counter = " << counter << endl;
    return 0;
}

Almost always, this program prints counter = 2. However, even though counter++ looks like a single uninterruptible line, it expands to three assembly instructions:

mov 0x12345600, %rax    # load counter's value from memory into a register
inc %rax                # increment the value in the register
mov %rax, 0x12345600    # store the new value back to counter's memory

It’s possible that Thread 1 loads 0 into rax before getting pulled off the processor. Thread 2 loads 0 into rax, increments it to 1, and writes 1 back to memory. Then, Thread 1 wakes back up, increments its rax to 1, and writes that back to memory. The final value that is printed is counter = 1, which is not what we’d expect.

When do I need a mutex?

A data race happens when two or more threads access the same memory location at the same time, at least one of those accesses is a write, and the accesses are not synchronized.

To prevent this, you should use a mutex to make sure that at most one thread can be touching the data at a time.
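
For example, the counter program above can be made safe by having each thread guard its increment with a shared mutex, as in this sketch:

int main(int argc, const char *argv[]) {
    int counter = 0;
    mutex counterLock;

    thread thread1 = thread([&] () {
        lock_guard<mutex> lg(counterLock);   // only one thread can be incrementing at a time
        counter++;
    });
    thread thread2 = thread([&] () {
        lock_guard<mutex> lg(counterLock);
        counter++;
    });

    thread1.join();
    thread2.join();

    cout << "counter = " << counter << endl;   // now always prints counter = 2
    return 0;
}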