Lecture 11: Intro to Multithreading
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
What are threads?
So far, we have been talking about processes. Each process has its own virtual address space, its own file descriptor table, its own signal table and signal mask, etc. Each process can use registers on the CPU without other processes changing their values.
Threads are created within processes. A thread is very much like a process (in fact, threads are often called “lightweight processes”), but instead of being isolated, threads share most resources with other threads of the same process. You can imagine multiple people living in the same apartment: they can all access each other’s stuff, but they run separately, doing their own things.
Launching threads
In C++, to spawn a thread, create a thread object. The first argument to its constructor is the function that the thread should run. This example launches 1 extra thread that prints “hello world”:
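(A minimal sketch; the exact code from lecture may differ a bit. Both main and the new thread sleep for one second.)

#include <chrono>
#include <iostream>
#include <thread>
using namespace std;

static void sayHello() {
  this_thread::sleep_for(chrono::seconds(1));  // this thread sleeps for 1 second
  cout << "hello world" << endl;
}

int main() {
  thread helloThread(sayHello);                // first constructor argument: the function to run
  this_thread::sleep_for(chrono::seconds(1));  // meanwhile, main also sleeps for 1 second
  helloThread.join();                          // wait for the thread to finish
  return 0;
}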
Notice that the whole program takes only 1 second total to run, even though there are 2 seconds worth of sleep calls. The threads are running concurrently, and everything you know about process scheduling carries over here.
thread.join() is similar to waitpid; it blocks until the thread exits. Unlike waitpid, your program will crash if you forget to call join()! If you see the error terminate called without an active exception, you likely forgot to join a thread before its thread object went out of scope.
Our first multithreading concurrency issues
printf has some internal synchronization so that if multiple threads are printing at the same time, the whole message gets printed without being interrupted by any other threads. This ensures that any output on the terminal is coherent, and doesn’t have characters interleaved from two running processes or threads.
We don’t have the same guarantees from cout, which prints segments between << atomically, but not the entire message. If you run twenty “hello world” threads, you’ll likely see some interleaving of output:
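(A sketch of the twenty-thread version; each thread prints with plain cout.)

#include <iostream>
#include <thread>
using namespace std;

static void printHello(size_t threadNum) {
  cout << "Hello world from thread " << threadNum << "!" << endl;
}

int main() {
  thread threads[20];
  for (size_t i = 0; i < 20; i++) {
    threads[i] = thread(printHello, i);  // extra constructor arguments are passed to the function
  }
  for (thread& t : threads) t.join();    // wait for all twenty threads to finish
  return 0;
}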
We have written two stream manipulators called oslock and osunlock that bring atomicity (all-or-nothing) guarantees to cout. For now, you can just use them without understanding how they work (though I will explain how they work next week). The following printHello does not have the same interleaving problems:
static void printHello(size_t threadNum) {
  cout << oslock << "Hello world from thread " << threadNum << "!" << endl << osunlock;
}
Note: endl should come before osunlock, since endl is an actual character (\n) that is being printed to the screen, and osunlock says “okay, I’m done printing – let other threads print now!”
Note: If you forget osunlock, your program will probably hang, because no other thread (yourself included) will be able to get past oslock until osunlock gives the all-clear message!
Beware passing by reference
This code is roughly the same as before, but it does the printing in a lambda function:
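(A sketch of that version, reusing the oslock-based printHello from above; note the [&] capture.)

int main() {
  thread threads[20];
  size_t i = 0;
  while (i < 20) {
    threads[i] = thread([&]() {
      printHello(i);  // i is captured by reference
    });
    i++;
  }
  for (thread& t : threads) t.join();
  return 0;
}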
Unfortunately, this output is all sorts of broken! We see the same number printed multiple times, and some numbers are missing from the output.
This is because the lambda function is capturing i by reference, so each thread is receiving a reference (essentially a pointer) to the variable i in the main thread’s stack. The main thread is incrementing this variable on each iteration of the while loop, so by the time each child thread starts running, it may not see the value of i that we intended for it to see.
We can fix this by capturing by value instead, which copies i into each thread’s stack:
threads[i] = thread([i](){
Note the lack of &.
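The complete fixed loop looks something like this:

size_t i = 0;
while (i < 20) {
  threads[i] = thread([i]() {  // capture by value: each lambda gets its own copy of i
    printHello(i);
  });
  i++;
}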
Race conditions
Consider the following code, where 10 ticket agents collaboratively sell off 250 tickets:
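(A sketch of that code; the in-lecture version may differ slightly. It uses the oslock/osunlock manipulators from earlier.)

static void ticketAgent(size_t id, size_t& remainingTickets) {
  while (true) {
    if (remainingTickets == 0) break;  // check...
    remainingTickets--;                // ...then act: these two steps are not atomic
    cout << oslock << "Agent #" << id << " sold a ticket! ("
         << remainingTickets << " remain)" << endl << osunlock;
  }
  cout << oslock << "Agent #" << id << " sees no remaining tickets." << endl << osunlock;
}

int main() {
  thread agents[10];
  size_t remainingTickets = 250;
  for (size_t i = 0; i < 10; i++) {
    agents[i] = thread([&remainingTickets, i]() {
      ticketAgent(i + 100, remainingTickets);
    });
  }
  for (thread& agent : agents) agent.join();
  return 0;
}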
If one of the threads gets pulled off the processor after the if (remainingTickets == 0) test but before remainingTickets--, then a race condition can occur: the agent thinks there are still tickets available and goes to sell one, but just before it does, a different agent sells the last ticket. Then this agent gets put back on the processor, executes remainingTickets-- even though the count is now 0, and remainingTickets underflows. If you run this code with sleep_for(100) in between those two lines, you’ll see a very large number of tickets being sold.
Mutexes to the rescue
A mutex (short for “mutual exclusion”) is a synchronization primitive that can prevent two threads from running critical code at the same time. By protecting a critical region with a mutex, we can mutually exclude threads from executing code inside of that region at the same time, avoiding the race conditions described above.
Mutexes can either be locked or unlocked; they are initialized to be in the unlocked state. A thread can call lock() on the mutex; if it’s unlocked, lock() acquires the lock and returns immediately. Otherwise, it waits for the lock to become unlocked.
You can think of a mutex as a magic key, where only one thread can have the key at a time. The lock() function says, “You need to be holding the key before you can pass this point!”
We can fix the ticket agent code by declaring a mutex in main:
mutex remainingTicketsLock;
and passing it to each thread:
static void ticketAgent(size_t id, size_t &remainingTickets, mutex &remainingTicketsLock) {
  ...
}
int main() {
  thread agents[10];
  size_t remainingTickets = 250;
  mutex remainingTicketsLock;
  for (size_t i = 0; i < 10; i++) {
    agents[i] = thread([&remainingTickets, &remainingTicketsLock, i]() {
      ticketAgent(i + 100, remainingTickets, remainingTicketsLock);
    });
  }
  for (thread& agent : agents) agent.join();  // don't forget to join, or the program will crash
  return 0;
}
Then, we use the mutex to ensure that no other thread reads or updates remainingTickets while we are working with that value:
remainingTicketsLock.lock();
if (remainingTickets == 0) {
  remainingTicketsLock.unlock();
  break;
}
remainingTickets--;
remainingTicketsLock.unlock();
Lock guards
A lock guard is a data type that locks a lock in its constructor and unlocks it in its destructor. This is extremely helpful for automatically unlocking a lock when the lock guard goes out of scope, instead of needing to worry about unlocking the lock before every break, return, or throw. In the ticket agents example, we could remove every remainingTicketsLock.unlock() call and replace every lock() call with the following:
lock_guard<mutex> lg(remainingTicketsLock);
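With lock guards, the agent’s loop might look roughly like this; the lock is released automatically whenever lg goes out of scope, whether an iteration ends normally or via break:

while (true) {
  lock_guard<mutex> lg(remainingTicketsLock);  // locks here; unlocks when lg is destroyed
  if (remainingTickets == 0) break;
  remainingTickets--;
}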
More subtle race conditions
Race conditions can happen within a single line of code as well. Consider this program:
int main(int argc, const char *argv[]) {
  int counter = 0;
  thread thread1 = thread([&] () {
    counter++;
  });
  thread thread2 = thread([&] () {
    counter++;
  });
  thread1.join();
  thread2.join();
  cout << "counter = " << counter << endl;
  return 0;
}
Almost always, this program prints counter = 2. However, even though counter++ looks like a single uninterruptible line, it expands to three assembly instructions:
mov 0x12345600, %rax
inc %rax
mov %rax, 0x12345600
It’s possible that Thread 1 loads 0 into rax before getting pulled off the processor. Thread 2 loads 0 into rax, increments it to 1, and writes 1 back to memory. Then, Thread 1 wakes back up, increments its rax to 1, and writes that back to memory. The final value that is printed is counter = 1, which is not what we’d expect.
When do I need a mutex?
A data race happens when:
- Multiple threads are touching the same data at the same time
- At least one of those accesses is a modification. (It doesn’t matter if you’re only reading a value that isn’t changing – that’s no problem.)
To prevent this, you should use a mutex to make sure that at most one thread can be touching the data at a time.
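As a small illustration of this rule (a sketch, not from lecture): two threads can freely read shared data that never changes, but the shared total they both modify must be protected by a mutex.

#include <mutex>
#include <thread>
#include <vector>
using namespace std;

int main() {
  const vector<int> values = {1, 2, 3};  // never modified: safe to read from any thread without a mutex
  int total = 0;                         // modified by both threads: needs protection
  mutex totalLock;

  auto addAll = [&]() {
    for (int v : values) {               // read-only access: no mutex needed
      lock_guard<mutex> lg(totalLock);   // modification: hold the mutex
      total += v;
    }
  };

  thread t1(addAll);
  thread t2(addAll);
  t1.join();
  t2.join();
  return 0;
}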