Lab 5: Threads, Mutexes, and Semaphores

These questions were written by Jerry Cain and Ryan Eberhardt.

Before the end of lab, be sure to fill out the lab checkoff sheet here!

Problem 1: Threading Short Answer Questions

Here are some short answer questions about the specifics of threads and the synchronization primitives that help threads communicate.

Problem 2: Debugging Myth Busters

The myth cluster is composed of several individual computers, numbered myth51-myth66. When you run ssh myth.stanford.edu, a load balancer evaluates how busy each machine is, and it directs you to the machine with the least load. In order to do this, the load balancer needs to be able to efficiently survey how busy each myth machine is.

We have implemented some starter code that sequentially logs into each myth machine and counts the number of running processes. You can get the starter code by cloning the lab5 folder:

$ git clone /usr/class/cs110/repos/lab5/shared lab5
$ cd lab5
$ make

We have provided a basic sequential (no threads) implementation in myth-buster-sequential.cc. The countMythProcesses function builds a map from myth numbers (e.g. 56 for myth56) to the number of processes running on each machine:

static const int kMinMythMachine = 51;
static const int kMaxMythMachine = 66;
static map<int, int> countMythProcesses() {
    map<int, int> processCounts;
    for (int mythNum = kMinMythMachine; mythNum <= kMaxMythMachine; mythNum++) {
        // SSH into the myth machine and count the processes it's running
        int numProcesses = getNumProcesses(mythNum);
        // numProcesses will be negative if a myth machine is currently
        // down/dead. Skip those.
        if (numProcesses >= 0) {
            cout << "myth" << mythNum << " has this many processes: " << numProcesses << endl;
            processCounts[mythNum] = numProcesses;
        }
    }
    return processCounts;
}

This code works, but is much too slow, taking 12.5 seconds to run:

🍉 time ./myth-buster-sequential
myth51 has this many processes: 368
myth52 has this many processes: 347
myth53 has this many processes: 318
myth54 has this many processes: 353
myth55 has this many processes: 366
myth56 has this many processes: 325
myth57 has this many processes: 401
myth58 has this many processes: 363
myth59 has this many processes: 420
myth60 has this many processes: 332
myth61 has this many processes: 367
myth62 has this many processes: 324
myth63 has this many processes: 355
myth64 has this many processes: 337
myth65 has this many processes: 355
myth66 has this many processes: 332
Machine with fewest processes: myth53
Number of processes on least loaded machine: 318
./myth-buster-sequential  0.30s user 0.17s system 3% cpu 12.445 total

We want to implement a parallel version of countMythProcesses that uses multiple threads so you can connect to several myth machines in parallel and dramatically reduce the total runtime.

We have implemented a buggy threaded countMythProcesses in myth-buster-threaded.cc. Try running ./myth-buster-threaded, observe the broken behavior, and then fix each bug. Try to avoid figuring out the bugs by simply reading and staring at the code. Instead, see if you can use print statements or gdb to figure out what the program is doing incorrectly, and then update the code to fix the issue. We want you to practice figuring out what a program is doing (incorrectly) when it's not immediately obvious what's wrong with the code; this skill will serve you well in your assignments and in your own work after this class.

As an additional, optional exercise after you are finished debugging, see if you can implement your own threaded version from scratch! You can cp myth-buster-sequential.cc myth-buster-threaded.cc to replace our threaded implementation and get a blank canvas for your own implementation.

Debugging with threads

Three kinds of problems typically go wrong in a multithreaded program: deadlock/hangs, crashes, and incorrect behavior/output.

Debugging deadlock and crashes

I think the fastest way to debug a deadlock or crash is to run the program under gdb; when the deadlock or crash happens, use gdb to get a backtrace showing where the problem occurred.

For example, our myth-buster-threaded program hangs right off the bat. Where is it getting stuck? We can find out:

🍉 gdb ./myth-buster-threaded
(gdb) r
Starting program: /afs/ir.stanford.edu/class/archive/cs/cs110/cs110.1218/staff/master_repos/lab5/starter/myth-buster-threaded
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a42700 (LWP 654182)]
[Detaching after vfork from child process 654183]
myth51 has this many processes: 353
... more output omitted ...
[New Thread 0x7ffff7a42700 (LWP 654203)]
[Detaching after vfork from child process 654204]
..... program appears to be stuck doing nothing .....

Once the program has hung, we can press ctrl+c to pause it, then see where the main thread is stuck:

(gdb) back
#0  __pthread_clockjoin_ex (threadid=140737348118272, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>,
    block=<optimized out>) at pthread_join_common.c:145
#1  0x00007ffff7e9e047 in std::thread::join() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x000000000040257e in countMythProcesses () at myth-buster-threaded.cc:43
#3  0x0000000000402496 in main (argc=1, argv=0x7fffffffe0f8) at myth-buster-threaded.cc:60

The main thread is waiting in thread::join, i.e. waiting for a different thread to exit. That doesn't tell us much on its own, but we can see where the other threads are stuck. (You can also run thread apply all bt to print a backtrace for every thread at once.)

(gdb) info threads
  Id   Target Id                                            Frame
* 1    Thread 0x7ffff7a43740 (LWP 654178) "myth-buster-thr" __pthread_clockjoin_ex (threadid=140737348118272, thread_return=0x0,
    clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>) at pthread_join_common.c:145
  7    Thread 0x7ffff7a42700 (LWP 654203) "myth-buster-thr" futex_wait_cancelable (private=<optimized out>, expected=0,
    futex_word=0x7fffffffdf68) at ../sysdeps/nptl/futex-internal.h:183
(gdb) thread 7
[Switching to thread 7 (Thread 0x7ffff7a42700 (LWP 654203))]
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffffffdf68) at ../sysdeps/nptl/futex-internal.h:183
183	../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) back
#0  futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7fffffffdf68) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x41eec0, cond=0x7fffffffdf40) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7fffffffdf40, mutex=0x41eec0) at pthread_cond_wait.c:638
#3  0x00007ffff7e97e30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00000000004060a7 in std::_V2::condition_variable_any::wait<std::mutex> (this=0x7fffffffdf40, __lock=...)
    at /usr/include/c++/5/condition_variable:250
#5  0x0000000000405948 in std::_V2::condition_variable_any::wait<std::mutex, semaphore::wait()::<lambda()> >(std::mutex &, semaphore::<lambda()>) (this=0x7fffffffdf40, __lock=..., __p=...) at /usr/include/c++/5/condition_variable:259
#6  0x00000000004056e0 in semaphore::wait (this=0x7fffffffdf10) at semaphore.cc:21
#7  0x0000000000402b3e in countMythProcesses()::$_1::operator()() const (this=0x41eef8) at myth-buster-threaded.cc:35
#8  0x0000000000402afd in std::__invoke_impl<void, countMythProcesses()::$_1>(std::__invoke_other, countMythProcesses()::$_1&&) (
    __f=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60
#9  0x0000000000402a8d in std::__invoke<countMythProcesses()::$_1>(countMythProcesses()::$_1&&) (__fn=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:95
#10 0x0000000000402a65 in std::thread::_Invoker<std::tuple<countMythProcesses()::$_1> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (
    this=0x41eef8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/thread:264
#11 0x0000000000402a35 in std::thread::_Invoker<std::tuple<countMythProcesses()::$_1> >::operator()() (this=0x41eef8)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/thread:271
#12 0x000000000040297e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<countMythProcesses()::$_1> > >::_M_run() (
    this=0x41eef0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/thread:215
#13 0x00007ffff7e9dde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff7c43609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007ffff7b6a293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

There is a lot of output here, but if we sift through the backtrace and look at frame #6, we can see that this thread appears to be stuck on a semaphore::wait call! At this point, we could add additional print statements around any semaphore::wait or semaphore::signal calls to get a better sense of how the semaphore is being (mis)used and why this hang might be happening.

Debugging incorrect behavior/output

While you can certainly step through threaded code line by line in gdb, I personally find it easier to debug with print statements. Running under gdb can change the timing of execution, and if you have race conditions that depend on particular scheduling patterns, you might not be able to reproduce those bugs under gdb. Also, constantly switching between threads can make it hard to follow what is happening in the program.

What should you print? Anything that might make it easier to understand what your program is doing! The caveat is that you should write descriptive print statements, so that it’s easy for you to sift through a wall of debugging output. Outputs like getNumProcesses and wait will be hard to interpret. Outputs like Thread calling getNumProcesses for myth56... and myth56 thread calling concurrentConnections.wait()... may be easier to understand.

Questions for discussion, after the bugs are fixed