Lab 4 Solutions
This handout was adapted from Jerry Cain’s Spring 2018 offering.
I’ve created a Slack channel for Lab 4 discussion (aptly named #lab4), and all students (but particularly remote students) are encouraged to share their ideas there.
Problem 1: Assignment 2 Revisited
Here is a collection of short-answer questions drilling your understanding of `subprocess` and `farm`. It’s not uncommon for students to get working solutions to assignments and still not be entirely clear why they work. These questions are here to force you to think big picture and understand the systems concepts I feel are important.
- Your Assignment 2 implementation of `subprocess` required two pipes – one to foster a parent-to-child communication channel, and a second to foster a child-to-parent communication channel. Clearly explain why a single pipe shouldn’t be used to support both communication channels.
  - Because both child and parent would need to write to `fds[1]`, and there’d be no generic way to codify whether material in the pipe is intended for the child or the parent, so one could accidentally read what’s intended for the other.
- You’ve seen `dprintf` in lecture and in the `assign3` handout, and it presumably contributed to many `farm` implementations. Explain why there’s a `dprintf` function, but there’s no analogous `dscanf` function. Hint: Think about why `dprintf(fd, "%s %d\n", str, i)` would be easy to manage whereas `dscanf(fd, "%s %d\n", str, &i)` wouldn’t be. Read the first few lines of the man pages for the traditional `fprintf` and `fscanf` functions to understand how they operate.
  - With `dprintf`, the entire string can be constructed (and its length computed) before the underlying `write` calls. `dscanf` might need to read an extra character (e.g. the space in `"1234 "`) to confirm a placeholder like `%d` has been matched, and there’s no way to revert that extra read.
- Consider the implementation of `spawnAllWorkers` below. Even though it rarely causes problems, the line marked with a comment technically contains a race condition. Briefly describe the race condition, and explain how to fix it.

  ```c++
  struct worker {
    worker() {}
    worker(char *argv[], const set<int>& openfds = set<int>())
      : sp(subprocess(argv, true, false, openfds)), available(false) {}
    subprocess_t sp;
    bool available;
  };

  static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
  static vector<worker> workers(kNumCPUs);
  static size_t numWorkersAvailable;
  static const char *kWorkerArguments[] = { "./factor.py", "--self-halting", NULL };

  static void spawnAllWorkers() {
    cout << "There are this many CPUs: " << kNumCPUs << ", numbered 0 through "
         << kNumCPUs - 1 << "." << endl;
    set<int> openfds;
    for (size_t i = 0; i < workers.size(); i++) {
      workers[i] = worker(const_cast<char **>(kWorkerArguments), openfds); // race condition here
      openfds.insert(workers[i].sp.supplyfd);
      assignToCPU(workers[i].sp.pid, i); // implementation omitted, irrelevant
    }
  }

  int main(int argc, char *argv[]) {
    signal(SIGCHLD, markWorkersAsAvailable); // markWorkersAsAvailable is correct
    spawnAllWorkers();
    // other functions, assume all correct
    return 0;
  }
  ```

  - The right-hand side constructs a temporary that launches a process that may self-halt before the contents of the temporary are copied into `workers[i]`. If that happens, the `SIGCHLD` handler is invoked and crawls over an array that doesn’t have the pid yet, and sadness ensues.
  - The solution is to block `SIGCHLD` before the marked line and then unblock it after.
- While implementing the `farm` program for `assign2`, you were expected to implement a `getAvailableWorker` function to effectively block `farm` until at least one worker was available. My own solution relied on a helper function I called `waitForAvailableWorker`, which I present below. After analyzing my own solution, answer the following questions:

  ```c++
  static sigset_t waitForAvailableWorker() {
    sigset_t existing, additions;
    sigemptyset(&additions);
    sigaddset(&additions, SIGCHLD);
    sigprocmask(SIG_BLOCK, &additions, &existing);
    while (numWorkersAvailable == 0) sigsuspend(&existing);
    return existing;
  }

  static size_t getAvailableWorker() {
    sigset_t existing = waitForAvailableWorker();
    size_t i;
    for (i = 0; !workers[i].available; i++);
    assert(i < workers.size());
    numWorkersAvailable--;          // this line and the next were originally in bold
    workers[i].available = false;
    sigprocmask(SIG_SETMASK, &existing, NULL); // restore original block set
    return i;
  }
  ```

  - Assuming no signals are blocked at the time `waitForAvailableWorker` is called, clearly identify when `SIGCHLD` is blocked and when it is not.
    - TL;DR: `SIGCHLD` isn’t blocked prior to the `sigprocmask` call or within the call to `sigsuspend`. It’s blocked everywhere else.
    - More detail: The first three lines create a singleton mask containing just `SIGCHLD`, and the fourth line blocks it. The `while` loop test is evaluated while the `SIGCHLD` block is in place, which is exactly what we want, because we don’t want its evaluation to be interrupted by the execution of `markWorkersAsAvailable`. If the test passes, the call to `sigsuspend` simultaneously lifts the block on `SIGCHLD` (remember: the `existing` mask is empty) and puts the process to sleep until some (or rather, any) signal is received, at which point any installed handler (presumably the `SIGCHLD` handler, since we expect `SIGCHLD` to be the signal that comes in) executes with high priority. After any handler exits, `sigsuspend` returns while simultaneously restoring the mask that was in place before `sigsuspend` was called, and the test is re-evaluated with the block on `SIGCHLD` back in place. That process repeats until the `while` loop test fails, at which point `waitForAvailableWorker` returns, leaving the block on `SIGCHLD` in place.
  - Had I accidentally passed in `&additions` to the `sigsuspend` call instead of `&existing`, the farm could have deadlocked. Explain why.
    - `numWorkersAvailable == 0` could pass, and then `sigsuspend` would force `farm` to deadlock, as only `SIGCHLD` signals are coming in and capable of changing `numWorkersAvailable`, and they’re blocked.
  - Had I accidentally omitted the `sigaddset` call and not blocked `SIGCHLD`, `farm` could have deadlocked. Explain how.
    - `numWorkersAvailable == 0` passes, `farm` is swapped off the CPU, all `kNumCPUs` workers self-halt, all `kNumCPUs` `SIGCHLD`s are handled by one `SIGCHLD` handler call, `farm` descends into `sigsuspend`, and no additional `SIGCHLD`s ever arrive to wake `farm` up.
  - In past quarters, I saw a bunch of students who lifted the block on `SIGCHLD` before the two marked lines instead of after. As it turns out, executing `numWorkersAvailable--` after the block is lifted can cause problems, but executing `workers[i].available = false` actually can’t. Explain why the placement of the `--` is more sensitive to race conditions than the Boolean assignment is.
    - Execution of the `--` could be interrupted and confused by execution of the `++` within the `SIGCHLD` handler.
    - When the Boolean assignment is executed, the relevant worker is halted, so an interruption by the `SIGCHLD` handler would hop over `workers[i]`, as its pid can’t possibly be returned by the handler’s `waitpid` call until `workers[i]` is continued.
Problem 2: Threading Short Answer Questions
Here are some more short answer questions about the specifics of threads and the directives that help threads communicate.
- Is `i--` thread safe on a single core machine? Why or why not?
  - `i--` is not thread safe if it expands to two or more assembly code instructions, as it does on x86_64. For a local variable not already in a register, the generated code might look like this:

    ```asm
    mov (%rbp), %rax
    sub $1, %rax
    mov %rax, (%rbp)
    ```

    If the thread gets swapped off the processor after the first or second of those three instructions and another thread is scheduled and changes `i`, the original thread will eventually resume and continue on with stale data.
  - Some architectures *are* capable of doing an in-memory decrement (or, more broadly, a memory-memory instruction) in one instruction, so `i--` could compile to one assembly code instruction on some architectures if `i` is a global variable, and in a single-core world only one thing can happen at a time. But in general, you write code to be portable and architecture-agnostic, so you should operate as if `i--`, even where `i` is a global, is thread-unsafe.
- What’s the difference between a `mutex` and a `semaphore` with an initial value of 1? Can one be substituted for the other?
  - The `semaphore` could be used in place of the `mutex`, but not vice versa. First off, the thread that acquires the lock on a `mutex` must be the one to unlock it (by C++ specification), whereas one thread can `signal` a `semaphore` while a second thread waits on it. There’s also nothing to prevent a `semaphore` initialized to 1 from being signaled up to an even higher value (like 2, or 2 million), whereas a `mutex` can really only be in one of two states.
- What is the `lock_guard` class used for, and why is it useful?
  - It’s used to layer over some kind of mutex (typically a `mutex`, but possibly some other mutex class like `timed_mutex` or `recursive_mutex`) so that it’s locked when it needs to be locked (via the `lock_guard` constructor) and unlocked when the `lock_guard` goes out of scope (via its destructor). It’s useful because it guarantees that a mutex locked through a `lock_guard` will be unlocked no matter how the function, method, or enclosing scope exits. In particular, a function might have two or more exit paths (some early returns, some exceptions being thrown, and so forth).
- What is busy waiting? Is it ever a good idea? Does your answer to the good-idea question depend on the number of CPUs your multithreaded application has access to?
  - Busy waiting is consuming processor time waiting for some condition to change, as with `while (numWorkersAvailable == 0) {;}`. On a single-CPU machine, it makes sense for a process or thread that would otherwise busy wait to yield the processor, since the condition can’t possibly change until some other thread gets the processor and makes changes to the variables that are part of the condition. That’s what `sigsuspend` (for processes) and `condition_variable_any::wait` (for threads) are for. In a few scenarios where multithreaded code is running on a machine with multiple CPUs, it’s okay to busy wait *if and only if* you know with high probability that another thread is running at the same time (on another CPU) and will quickly invert the condition that’s causing the first thread to busy wait.
- As it turns out, the semaphore’s constructor allows a negative number to be passed in, as with `semaphore s(-11)`. Identify a scenario where -11 might be a sensible initial value.
  - Say you have 24 threads, and you need to wait for just 12 of them to finish (any 12 of the 24). You could create a semaphore initialized to -11, signal that semaphore in each thread on completion, and then wait on the semaphore in the main thread. The `wait()` call will return once the semaphore value reaches 1, which happens when 12 of the threads signal it.
- What would the implementation of `semaphore::signal(size_t increase = 1)` need to look like if we wanted to allow a `semaphore`’s encapsulated value to be promoted by the `increase` amount? Note that `increase` defaults to 1, so this version could just replace the standard `semaphore::signal` that’s officially exported by the `semaphore` abstraction.
  - The key line in `signal` would need to replace the `++` with a `+= increase`. It would also need to decide whether the condition variable needs to be notified by checking whether the encapsulated value pre-increment was less than or equal to 0 and the value post-increment is positive. If so, then `notify_all` needs to be invoked.
- What’s the multiprocessing equivalent of the `mutex`?
  - It’s not a perfect parallel, but if I had to choose one, it’d be signal masks. We used signal masks to remove the possibility of interruption by signal, which is the only thing that can otherwise introduce a race condition into a sequential program. (Another answer is something I referenced in an earlier solution, even though I didn’t teach it in CS110. Investigate the `flock` function and think about how it could be used to prevent interprocess race conditions on shared memory mapped segments.)
- What’s the multiprocessing equivalent of the `condition_variable_any`?
  - `sigsuspend` is the equivalent of `condition_variable_any::wait`, and a signal outside the `sigsuspend` signal mask is the equivalent of `condition_variable_any::notify_one` or `notify_all`.
- The `semaphore` implementation we produced in lecture is repeated below. Many students have asked whether the very last line of `semaphore::signal` could have called `notify_one` instead of `notify_all`. Does it matter? Explain.

  ```c++
  semaphore::semaphore(int value) : value(value) {}

  void semaphore::wait() {
    lock_guard<mutex> lg(m);
    cv.wait(m, [this]{ return value > 0; });
    value--;
  }

  void semaphore::signal() {
    lock_guard<mutex> lg(m);
    value++;
    if (value == 1) cv.notify_all();
  }
  ```

  - It definitely matters. Assume `signal` is implemented in terms of `notify_one` instead of `notify_all`, and consider a four-thread test program with a single semaphore `s` (initialized to 0), threads 1 and 2 calling `s.wait()`, and threads 3 and 4 calling `s.signal()`. The following scheduling pattern – one that’s very possible – leads to thread 2 blocking forever:
    - Threads 1 and 2 each progress and block on `s.wait()`.
    - Thread 3 signals `s`, which notifies `cv` to wake thread 1 (but not thread 2) up.
    - Thread 4 signals `s`, without notifying `cv` (the value goes from 1 to 2, so the `value == 1` test fails).
    - Thread 1 passes through `s.wait()`, but thread 2 is still asleep under `cv`, even though the semaphore value is 1.
Problem 3: Threads vs Processes
To provide some answers about the pros and cons of threads and processes, we’ll lead you through a collection of short answer questions about multiprocessing and multithreading that focuses more on the big picture.
- What does it mean when we say that a process has a private address space?
  - It means that its range of addresses is, by default, private to the owning process, and that it’s impossible for another arbitrary, unprivileged process to access any of it.
- What are the advantages of a private address space?
- The fact that it’s private, of course. Most other programs can’t accidentally or intentionally examine another process’s data.
- What are the disadvantages?
- The fact that it’s private, of course. Address space privacy makes it exceptionally difficult to exchange data with other processes, and works against any efforts to breezily distribute work over multiple CPUs using multiprocessing.
- What programming directives have we used in prior discussion section handouts to circumvent address space privacy?
  - Signals and pipes can be used to communicate between processes despite address space privacy. We didn’t talk about it in class, but `mmap` can also be used to create segments of memory shared between multiple processes.
- When architecting a larger program like `farm` or `stsh` that relies on multiprocessing, what did we need to do to exchange information across process boundaries?
  - `farm` relied on pipes to forward data to each of the workers, and it relied on `SIGTSTP`, `SIGCHLD`, `SIGCONT`, `SIGKILL`, and a `SIGCHLD` handler to manage synchronization issues. And in some sense, `farm` relied on `execvp`’s string arguments to influence what each of the workers should be running.
  - `stsh` relied on pretty much the same things, albeit for different reasons. `stsh` relied on signals and signal handlers to manage job control, on terminal control transfer to be clear about who’s responding to keyboard events and any signals associated with them, and on pipes to foster communication between neighboring processes in a pipeline.
- Can a process be used to execute multiple executables? Restated, can it `execvp` twice to run multiple programs?
  - Nope, not in general. A program like `stsh` can fork off additional processes, each of which `execvp`s, but that’s not the same thing. The bottom line is that a process’s address space can’t be reused to run multiple executables. It can only be transformed once.
- Threads are often called lightweight processes. In what sense are they processes? And why the lightweight distinction?
  - Each thread of execution runs as if it’s a miniature program. Threads have their own stack, and they have access to globals, dynamic memory allocation, system calls, and so forth. They’re relatively lightweight compared to true processes, because you don’t need to create a new process when creating a thread. The thread manager cordons off a portion of the full stack segment for the new thread’s stack, but otherwise the thread piggybacks off the existing text, heap, and data segments previously established by `fork` and `execvp`. (For more info, go ahead and read the man page for `clone`.)
- Threads are often called virtual processes as well. In what sense are threads an example of virtualization?
- Virtualization is an abstraction mechanism used to make a single resource appear to be many or to make many resources appear to be one. In this case, the process is subdivided to house many lightweight processes, so that one process is made to look like many.
- Threads running within the same process all share the same address space. What are the advantages and disadvantages of allowing threads to access pretty much all of virtual memory?
- It’s a double-edged sword, as you’re all learning with Assignments 5 and 6. Because all threads have access to the same virtual address space, it’s relatively trivial to share information. However, because two or more threads might want to perform surgery on a shared piece of data, directives must be implanted into thread routines to ensure data and resources are shared without inspiring data corruption via race conditions, or deadlock via resource contention. That’s a very difficult thing to consistently get right.
Each thread within a larger process is given a thread id, also called a tid. In fact, the thread id concept is just an extension of the process id. For singly threaded processes, the pid and the main thread’s tid are precisely the same. If a process with pid 12345 creates three additional threads beyond the main thread (and no other processes or threads within other processes are created), then the tid of the main thread would be 12345, and the thread ids of the three other threads would be 12346, 12347, and 12348.
- What are the advantages of leveraging the pid abstraction for thread ids?
- To the extent that the OS can view threads as processes, the OS can provide services on a per-thread basis instead of just a per-process one.
- What happens if you pass a thread id that isn’t a process id to `waitpid`?
  - It’s poorly documented, but the takeaway here is that `waitpid` does not consider it an error if a thread id is passed to it. `waitpid`’s man page (https://linux.die.net/man/2/waitpid) includes some information about how `waitpid` and threads interact with one another.
- What happens if you pass a thread id to `sched_setaffinity`?
  - It actually works, and it informs the OS that the identified thread should only be run on those CPUs included in the provided `cpu_set_t`.
- What are the advantages of requiring that a thread always be assigned to the same CPU?
- Each CPU typically has a dedicated L1 cache that stores data fetched by instructions run on that CPU. CPU-specific L1 caches are more likely to retain content relevant to the thread if the same thread keeps coming back instead of being arbitrarily assigned to other CPUs.
In some situations, the decision to rely on multithreading instead of multiprocessing is dictated solely by whether the code to be run apart from the main thread is available in executable or in library form. But in other situations, we have a choice.
- Why might you prefer multithreading over multiprocessing if both are reasonably good options?
- Ease of data exchange and code sharing.
- Why might you prefer multiprocessing over multithreading if both are reasonably good options?
- Protection, privacy, insulation from other processes errors.
- What happens if a thread within a larger process calls `fork`?
  - On Linux, the process space is cloned, but the only surviving thread in the child is the one that called `fork`. In general, you don’t want a single thread of many to call `fork` unless you have an exquisite reason to do so.
- What happens if a thread within a larger process calls `execvp`?
  - It transforms the entire process to run a new executable. The fact that the pre-`execvp` process involved threading is irrelevant once `execvp` is invoked. It’s that drastic.
And some final questions about how farm and stsh could be implemented:
- Assume you know nothing about multiprocessing but needed to emulate the functionality of `farm`. How could you use multithreading to design an equally parallel solution without ever relying on multiprocessing?
  - You’ll see in Assignment 4 that a `ThreadPool` is a generalization of what we need here.
  - Spawn one thread for each CPU, and set the thread routine to the code that computes and prints a factorization (using `oslock` and `osunlock`, of course). Note that all workers are initially available by storing something on behalf of each thread in the parent (array indices, thread ids, whatever you want).
  - Implement the `main` thread to continue on and read all numbers to be factored, forwarding numbers on to available workers (I’m being intentionally vague here so as to not steal the thunder of the `ThreadPool`).
  - As workers are assigned numbers to be factored, have them note they’re occupied, and as each worker finishes printing its factorization, have it mark itself as available.
  - Use `semaphore`s to coordinate rendezvous communication between the orchestrator and each worker. Rely on `mutex`es to guard shared data structures (e.g. the set of available workers) from race conditions.
  - When all numbers have been factored and all threads are available, prompt each thread to gracefully exit, `join` on all of those threads, and exit the program.
- Could multithreading contribute to the implementation of `stsh` in any meaningful way? Why or why not?
  - Not really. A shell is all about process control – `fork`ing and `execvp`-ing – and there are no in-shell algorithms that are intensely CPU-, I/O-, or network-bound, so introducing parallelism within `stsh` itself is arguably unnecessary.