Lab 4: Assignment 4 Redux, Multithreading
This handout was adapted from Jerry Cain’s Spring 2018 offering.
I’ve created a Slack channel for Lab 4 discussion (aptly named #lab4), and all students (but particularly remote students) are encouraged to share their ideas there.
Problem 1: Assignment 2 Revisited
Here is a collection of short answer questions drilling your understanding of `subprocess` and `farm`. It's not uncommon for students to get working solutions to assignments and still not be entirely clear why they work. These questions are here to force you to think big picture and understand the systems concepts I feel are important.
- Your Assignment 2 implementation of `subprocess` required two pipes: one to foster a parent-to-child communication channel, and a second to foster a child-to-parent communication channel. Clearly explain why a single pipe shouldn't be used to support both communication channels.
- You've seen `dprintf` in lecture and in the `assign3` handout, and it presumably contributed to many `farm` implementations. Explain why there's a `dprintf` function, but there's no analogous `dscanf` function. Hint: Think about why `dprintf(fd, "%s %d\n", str, i)` would be easy to manage whereas `dscanf(fd, "%s %d\n", str, &i)` wouldn't be. Read the first few lines of the `man` pages for the traditional `fprintf` and `fscanf` functions to understand how they operate.
- Consider the implementation of `spawnAllWorkers` below. Even though it rarely causes problems, the highlighted line technically contains a race condition. Briefly describe the race condition, and explain how to fix it.

```cpp
struct worker {
  worker() {}
  worker(char *argv[], const set<int>& openfds = set<int>())
    : sp(subprocess(argv, true, false, openfds)), available(false) {}
  subprocess_t sp;
  bool available;
};

static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable;
static const char *kWorkerArguments[] = { "./factor.py", "--self-halting", NULL };

static void spawnAllWorkers() {
  cout << "There are this many CPUs: " << kNumCPUs << ", numbered 0 through "
       << kNumCPUs - 1 << "." << endl;
  set<int> openfds;
  for (size_t i = 0; i < workers.size(); i++) {
    workers[i] = worker(const_cast<char **>(kWorkerArguments), openfds); // highlighted line
    openfds.insert(workers[i].sp.supplyfd);
    assignToCPU(workers[i].sp.pid, i); // implementation omitted, irrelevant
  }
}

int main(int argc, char *argv[]) {
  signal(SIGCHLD, markWorkersAsAvailable); // markWorkersAsAvailable is correct
  spawnAllWorkers();
  // other functions, assume all correct
  return 0;
}
```
While implementing the `farm` program for `assign2`, you were expected to implement a `getAvailableWorker` function to effectively block `farm` until at least one worker was available. My own solution relied on a helper function I called `waitForAvailableWorker`, which I present below. After analyzing my own solution, answer the following questions:

- Assuming no signals are blocked at the time `waitForAvailableWorker` is called, clearly identify when `SIGCHLD` is blocked and when it is not.
- Had I accidentally passed in `&additions` to the `sigsuspend` call instead of `&existing`, the farm could have deadlocked. Explain why.
- Had I accidentally omitted the `sigaddset` call and not blocked `SIGCHLD`, `farm` could have deadlocked. Explain how.
- In past quarters, I saw a bunch of students who lifted the block on `SIGCHLD` before the two lines in bold instead of after. As it turns out, executing `numWorkersAvailable--` after the block is lifted can cause problems, but executing `workers[i].available = false` actually can't. Explain why the placement of the `--` is more sensitive to race conditions than the Boolean assignment is.
```cpp
static sigset_t waitForAvailableWorker() {
  sigset_t existing, additions;
  sigemptyset(&additions);
  sigaddset(&additions, SIGCHLD);
  sigprocmask(SIG_BLOCK, &additions, &existing);
  while (numWorkersAvailable == 0) sigsuspend(&existing);
  return existing;
}

static size_t getAvailableWorker() {
  sigset_t existing = waitForAvailableWorker();
  size_t i;
  for (i = 0; !workers[i].available; i++);
  assert(i < workers.size());
  numWorkersAvailable--;                     // first of the two lines in bold
  workers[i].available = false;              // second of the two lines in bold
  sigprocmask(SIG_SETMASK, &existing, NULL); // restore original block set
  return i;
}
```
Problem 2: Threading Short Answer Questions
Here are some more short answer questions about the specifics of threads and the directives that help threads communicate.
- Is `i--` thread safe on a single core machine? Why or why not?
- What's the difference between a `mutex` and a `semaphore` with an initial value of 1? Can one be substituted for the other?
- What is the `lock_guard` class used for, and why is it useful?
- What is busy waiting? Is it ever a good idea? Does your answer to the good-idea question depend on the number of CPUs your multithreaded application has access to?
- As it turns out, the semaphore's constructor allows a negative number to be passed in, as with `semaphore s(-11)`. Identify a scenario where -11 might be a sensible initial value.
- What would the implementation of `semaphore::signal(size_t increase = 1)` need to look like if we wanted to allow a `semaphore`'s encapsulated value to be promoted by the `increase` amount? Note that `increase` defaults to 1, so this version could just replace the standard `semaphore::signal` that's officially exported by the `semaphore` abstraction.
- What's the multiprocessing equivalent of the `mutex`?
- What's the multiprocessing equivalent of the `condition_variable_any`?
- The `semaphore` implementation we produced in lecture is repeated below. Many students have asked whether the very last line of `semaphore::signal` could have called `notify_one` instead of `notify_all`. Does it matter? Explain.
```cpp
semaphore::semaphore(int value) : value(value) {}

void semaphore::wait() {
  lock_guard<mutex> lg(m);
  cv.wait(m, [this]{ return value > 0; });
  value--;
}

void semaphore::signal() {
  lock_guard<mutex> lg(m);
  value++;
  if (value == 1) cv.notify_all();
}
```
Problem 3: Threads vs Processes
To provide some answers about the pros and cons of threads and processes, we’ll lead you through a collection of short answer questions about multiprocessing and multithreading that focuses more on the big picture.
- What does it mean when we say that a process has a private address space?
- What are the advantages of a private address space?
- What are the disadvantages?
- What programming directives have we used in prior discussion section handouts to circumvent address space privacy?
- When architecting a larger program like `farm` or `stsh` that relies on multiprocessing, what did we need to do to exchange information across process boundaries?
- Can a process be used to execute multiple executables? Restated, can it `execvp` twice to run multiple programs?
- Threads are often called lightweight processes. In what sense are they processes? And why the lightweight distinction?
- Threads are often called virtual processes as well. In what sense are threads an example of virtualization?
- Threads running within the same process all share the same address space. What are the advantages and disadvantages of allowing threads to access pretty much all of virtual memory?
Each thread within a larger process is given a thread id, also called a tid. In fact, the thread id concept is just an extension of the process id. For singly threaded processes, the pid and the main thread’s tid are precisely the same. If a process with pid 12345 creates three additional threads beyond the main thread (and no other processes or threads within other processes are created), then the tid of the main thread would be 12345, and the thread ids of the three other threads would be 12346, 12347, and 12348.
- What are the advantages of leveraging the pid abstraction for thread ids?
- What happens if you pass a thread id that isn't a process id to `waitpid`?
- What happens if you pass a thread id to `sched_setaffinity`?
- What are the advantages of requiring that a thread always be assigned to the same CPU?
In some situations, the decision to rely on multithreading instead of multiprocessing is dictated solely by whether the code to be run apart from the main thread is available in executable or in library form. But in other situations, we have a choice.
- Why might you prefer multithreading over multiprocessing if both are reasonably good options?
- Why might you prefer multiprocessing over multithreading if both are reasonably good options?
- What happens if a thread within a larger process calls `fork`?
- What happens if a thread within a larger process calls `execvp`?
And some final questions about how `farm` and `stsh` could be implemented:

- Assume you know nothing about multiprocessing but needed to emulate the functionality of `farm`. How could you use multithreading to design an equally parallel solution without ever relying on multiprocessing?
- Could multithreading contribute to the implementation of `stsh` in any meaningful way? Why or why not?