Lab 5 Solutions
These questions were written by Jerry Cain and Ryan Eberhardt.
Before the end of lab, be sure to fill out the lab checkoff sheet here!
Problem 1: Threading Short Answer Questions
Here are some short answer questions about the specifics of threads and the directives that help threads communicate.
- Threads are often called lightweight processes. In what sense are they
processes? And why the lightweight distinction?
  - Each thread of execution runs as if it’s a miniature program. Threads
    have their own stack, and they have access to globals, dynamic memory
    allocation, system calls, and so forth. They’re relatively lightweight
    compared to true processes, because you don’t need to create a new
    process when creating a thread. The thread manager cordons off a portion
    of the full stack segment for the new thread’s stack, but otherwise the
    thread piggybacks off existing text, heap, and data segments previously
    established by `fork` and `execvp`. (For more info, go ahead and read the
    man page for `clone`.)
- Threads running within the same process all share the same address space.
What are the advantages and disadvantages of allowing threads to access
pretty much all of virtual memory?
- It’s a double-edged sword. Because all threads have access to the same virtual address space, it’s relatively trivial to share information. However, because two or more threads might want to perform surgery on a shared piece of data, directives must be implanted into thread routines to ensure data and resources are shared without inspiring data corruption via race conditions, or deadlock via resource contention. That’s a very difficult thing to consistently get right.
- When might you choose multithreading over multiprocessing? When might you
choose multiprocessing over multithreading?
- Multithreading: easiest choice in most cases where you need to do multiple things concurrently. It’s easier to communicate between threads and they use less resources to execute.
- Multiprocessing: required if you want to run separate executables. (You can’t be running two executables in the same process, threads or no threads.) Also a good choice if you want to architect for security/robustness, since processes are self-contained and errors in one process won’t necessarily take down all other processes.
- Say two threads run `i--` on a single-core machine. Is that safe? Why or
  why not?
  - `i--` is not thread safe if it expands to two or more assembly code
    instructions, as it does on x86_64. For a local variable not already in
    a register, the generated code might look like this:

    ```
    mov (%rbp), %rax
    sub $1, %rax
    mov %rax, (%rbp)
    ```

    If a thread is swapped off the processor between these instructions,
    another thread can load the same original value of `i`, and one of the
    two decrements is lost.
- What’s the difference between a `mutex` and a `semaphore` with an initial
  value of 1? Can one be substituted for the other?
  - It would be functionally possible to replace a `mutex` with a
    `semaphore(1)`, although I would not advise doing that. Mutexes are
    simpler and faster, so if you want to have at most one thread doing
    something, you should use a mutex.
  - A `mutex` is not a replacement for a `semaphore`. First off, the thread
    that acquires the lock on a `mutex` must be the one to unlock it (by C++
    specification), whereas one thread can `signal` a `semaphore` while a
    different thread `wait`s on it. There’s also nothing to prevent a
    `semaphore` initialized to 1 from being signaled up to an even higher
    number (like 2, or 2 million), whereas a `mutex` can only be in one of
    two states (locked or unlocked).
- What is the `lock_guard` class used for, and why is it useful?
  - It’s used to layer over some kind of mutex (typically a `mutex`, but
    possibly some other mutex classes like `timed_mutex` and
    `recursive_mutex`) so that the mutex is locked when it needs to be locked
    (via the `lock_guard` constructor) and unlocked when the `lock_guard`
    goes out of scope (via its destructor). It’s useful because it guarantees
    that a mutex locked through a `lock_guard` will be unlocked no matter how
    the function, method, or enclosing scope ends. In particular, a function
    might have two or more exit paths (some early returns, some exceptions
    being thrown, and so forth).
Problem 2: Debugging Myth Busters
The `myth` cluster is composed of several individual computers, numbered
`myth51`-`myth66`. When you run `ssh myth.stanford.edu`, a load balancer
evaluates how busy each machine is, and it directs you to the machine with the
least load. In order to do this, the load balancer needs to be able to
efficiently survey how busy each myth machine is.
We have implemented some starter code that sequentially logs into each myth
machine and counts the number of running processes. You can get the starter
code by cloning the `lab5` folder:

```
$ git clone /usr/class/cs110/repos/lab5/shared lab5
$ cd lab5
$ make
```
We have provided a basic sequential (no threads) implementation in
`myth-buster-sequential.cc`. The `countMythProcesses` function builds a `map`
mapping myth numbers (e.g. 56 for `myth56`) to the number of processes running
on that machine:
```cpp
static const int kMinMythMachine = 51;
static const int kMaxMythMachine = 66;

static map<int, int> countMythProcesses() {
  map<int, int> processCounts;
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    // SSH into the myth machine and count the processes it's running
    int numProcesses = getNumProcesses(mythNum);
    // numProcesses may be negative if a myth machine is currently down/dead.
    // Skip those.
    if (numProcesses >= 0) {
      cout << "myth" << mythNum << " has this many processes: " << numProcesses << endl;
      processCounts[mythNum] = numProcesses;
    }
  }
  return processCounts;
}
```
This code works, but is much too slow, taking 12.5 seconds to run:
```
🍉 time ./myth-buster-sequential
myth51 has this many processes: 368
myth52 has this many processes: 347
myth53 has this many processes: 318
myth54 has this many processes: 353
myth55 has this many processes: 366
myth56 has this many processes: 325
myth57 has this many processes: 401
myth58 has this many processes: 363
myth59 has this many processes: 420
myth60 has this many processes: 332
myth61 has this many processes: 367
myth62 has this many processes: 324
myth63 has this many processes: 355
myth64 has this many processes: 337
myth65 has this many processes: 355
myth66 has this many processes: 332
Machine with fewest processes: myth53
Number of processes on least loaded machine: 318
./myth-buster-sequential 0.30s user 0.17s system 3% cpu 12.445 total
```
We want to implement a parallel version of `countMythProcesses` that uses
that uses
threads to speed up this process. You should use multiple threads so that you
can connect to multiple myth machines in parallel, with the following
requirements:
- You should be connected to no more than 5 myth machines at a time. This
  limit is artificially low, since there aren’t many myth machines to begin
  with, but such restrictions are common in order to avoid overwhelming other
  computers (or yourself) with network requests, and you’ll see such a
  restriction when connecting to other websites in assignment 5. The final
  solution should take at least 1.5 seconds to run if this restriction is
  implemented correctly.
- Your code should not have any possible data races or synchronization bugs.
  We have included a tool called ThreadSanitizer (sometimes called TSan) that
  observes what your program does and can warn you if your code has a data
  race. Like Valgrind, TSan cannot guarantee that your code is free of bugs,
  but it can catch some mistakes, and it does not report false positives.
While Valgrind injects extra code into your program at runtime, TSan needs to
be compiled as part of your program. The Makefile builds an extra binary
`myth-buster-threaded_tsan` that has TSan compiled into your program:

```
🍉 time ./myth-buster-threaded_tsan
myth55 has this many processes: 371
myth54 has this many processes: 371
myth52 has this many processes: 346
myth53 has this many processes: 317
myth51 has this many processes: 360
myth62 has this many processes: 318
myth57 has this many processes: 403
myth58 has this many processes: 364
myth64 has this many processes: 325
myth65 has this many processes: 348
myth59 has this many processes: 410
myth60 has this many processes: 333
myth61 has this many processes: 366
myth56 has this many processes: 325
myth66 has this many processes: 346
myth63 has this many processes: 363
Machine with fewest processes: myth53
Number of processes on least loaded machine: 317
./myth-buster-threaded_tsan 0.29s user 0.15s system 16% cpu 2.695 total
```
If TSan detects a data race, it will print an error message with information about the memory being accessed by multiple threads, as well as the lines of code being executed by threads that are accessing the variable simultaneously.
We have implemented a buggy threaded `countMythProcesses` in
`myth-buster-threaded.cc`. Try running `./myth-buster-threaded`, observe the
, observe the
broken behavior, and then try to fix each bug. Try to avoid figuring out the
bugs from simply reading and staring at the code. Instead, see if you can use
print statements or gdb to figure out what the program is doing incorrectly,
and then update the code to fix the issue. We want you to practice figuring out
what a program is doing (incorrectly) when it’s not immediately obvious what’s
wrong with the code, since this is a skill that will help you the most in your
assignments and in your own work after this class.
As an additional, optional exercise after you are finished debugging, see if
you can implement your own threaded version from scratch! You can
`cp myth-buster-sequential.cc myth-buster-threaded.cc` to replace our threaded
implementation and get a blank canvas for your own implementation.
There were six bugs. Here they are, listed in the order I think you would have found them if you were debugging based on observed symptoms (instead of reading/analyzing the code):
- The thread prints using `oslock` but fails to use `osunlock` to let other
  threads print to `cout`.
- The thread `wait`s on `concurrentConnections` (taking a ball from the bucket
  so that it can connect to a myth machine), but it fails to `signal` when it
  is done. Eventually, the bucket is depleted of balls, and other threads’
  `wait` calls hang.
- The code `join`s each thread before moving on to create the next one, so
  this code executes sequentially.
- Speaking of the semaphore, the `concurrentConnections.wait()` call happens
  after the `getNumProcesses(mythNum)` call, which connects to the myth
  machines. This means the semaphore does nothing towards limiting the number
  of active connections to myth machines.
- The thread lambda function captures `mythNum` by reference instead of by
  value. As a result, we connect to the same myth machine several times, and
  don’t connect to other myth machines at all.
- The threads update `processCounts` without synchronizing using a mutex. Most
  of the time, this will be fine, but if two threads update the data structure
  at the same time, we could see anything happen, including segfaults, silent
  memory corruption, or even arbitrary code execution. TSan should catch this
  data race.
Here is our solution with all bugs fixed:
```cpp
static int countForMachine(int mythNum, semaphore &concurrentConnections) {
  semaphore_guard sg(concurrentConnections);
  return getNumProcesses(mythNum);
}

static map<int, int> countMythProcesses() {
  map<int, int> processCounts;
  vector<thread> threads;
  mutex processCountsMutex;
  semaphore concurrentConnections(5);
  for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
    threads.push_back(thread([mythNum, &processCounts, &processCountsMutex, &concurrentConnections]() {
      int numProcesses = countForMachine(mythNum, concurrentConnections);
      // numProcesses may be negative if a myth machine is currently down/dead.
      // Skip those.
      if (numProcesses >= 0) {
        cout << oslock << "myth" << mythNum << " has this many processes: " << numProcesses
             << endl << osunlock;
        lock_guard<mutex> lg(processCountsMutex);
        processCounts[mythNum] = numProcesses;
      }
    }));
  }
  for (thread &t : threads) {
    t.join();
  }
  return processCounts;
}
```
Questions for discussion, after the bugs are fixed
- How do you decide how many threads to have running at a time? Are more
  threads always better?
- It depends on the situation. If your threads are doing something that requires a lot of CPU time, then numThreads should equal numCpuCores. Having one thread per CPU core would be optimal, since each thread should be using 100% of the CPU core. Adding more threads would simply introduce more scheduling overhead. However, if your threads are I/O bound (doing very little on the CPU, spending most of the time waiting for user input or the filesystem or the network), then you can have many more threads than CPUs without using all of your CPU resources, and doing so might allow you to do much more work in parallel.
- When should a variable be passed by reference? When should it be passed by
value?
- A variable should be passed by reference when you want threads to share the same, single variable. (If the variable is being modified, accesses should be synchronized using a mutex.) It should be passed by value when you want threads to have separate copies, or when you want to “give” a variable to a thread. (If you pass a reference to a variable on your stack when invoking a thread, and then you return from the current function, that would delete the stack frame and free the variable’s memory, causing a use-after-free if the other thread continues using the variable.)
- When should the threads be joined?
- In this program, threads should be joined after all the threads are created. Joining too early serializes the code, causing the myth connections to run sequentially.
- When should the mutex be locked? When should it be unlocked? How would the
program behave if you locked/unlocked in different places?
  - The mutex should be locked immediately before updating `processCounts`,
    and unlocked immediately afterwards. Locking earlier is unnecessary but
    may not cause problems, as long as we don’t hold the lock while calling
    `getNumProcesses(mythNum)`, as doing so would serialize the connections.
- When should the semaphore be waited on? When should it be signaled? How would
the program behave if you waited/signaled in different places?
  - The semaphore should be waited on before calling `getNumProcesses`, and
    should be signaled immediately after `getNumProcesses(mythNum)` returns.
    It would be fine to wait on the semaphore before creating the thread, and
    this would actually be preferable if there were hundreds of myth machines
    to connect to; by doing this, we avoid creating lots of threads that sit
    around idly doing nothing for a long time until they can get a ball from
    the bucket. Signaling later wouldn’t break anything, but it’s not ideal,
    since it’s best to let other threads start connecting to other myth
    machines as early as possible.
- What debugging tricks did you find most helpful in the course of fixing this
code?
- Let us know!