Lab 4 Solutions
This handout was adapted from Jerry Cain’s Spring 2018 offering.
I’ve created a Slack channel for Lab 4 discussion (aptly named #lab4), and all students (but particularly remote students) are encouraged to share their ideas there.
Problem 1: Assignment 2 Revisited
Here is a collection of short answer questions drilling your understanding of `subprocess` and `farm`. It’s not uncommon for students to get working solutions to assignments and still not be entirely clear why they work. These questions are here to force you to think big picture and understand the systems concepts I feel are important.
- Your Assignment 2 implementation of `subprocess` required two pipes – one to foster a parent-to-child communication channel, and a second to foster a child-to-parent communication channel. Clearly explain why a single pipe shouldn’t be used to support both communication channels.
  - Because both child and parent would need to write to `fds[1]`, and there’d be no generic way to codify whether material in the pipe is intended for the child or the parent, so one could accidentally read what’s intended for the other.
- You’ve seen `dprintf` in lecture and in the `assign3` handout, and it presumably contributed to many `farm` implementations. Explain why there’s a `dprintf` function, but there’s no analogous `dscanf` function. Hint: Think about why `dprintf(fd, "%s %d\n", str, i)` would be easy to manage whereas `dscanf(fd, "%s %d\n", str, &i)` wouldn’t be. Read the first few lines of the `man` pages for the traditional `fprintf` and `fscanf` functions to understand how they operate.
  - With `dprintf`, the entire string can be constructed (and its length computed) before the underlying `write` calls. `dscanf` might need to read an extra character (e.g. the space in `"1234 "`) to confirm a placeholder like `%d` has been matched, and there’s no way to revert that extra read.
Consider the implementation of `spawnAllWorkers` below. Even though it rarely causes problems, the highlighted line below technically contains a race condition. Briefly describe the race condition, and explain how to fix it.

```cpp
struct worker {
  worker() {}
  worker(char *argv[], const set<int>& openfds = set<int>())
      : sp(subprocess(argv, true, false, openfds)), available(false) {}
  subprocess_t sp;
  bool available;
};

static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable;
static const char *kWorkerArguments[] = {"./factor.py", "--self-halting", NULL};

static void spawnAllWorkers() {
  cout << "There are this many CPUs: " << kNumCPUs << ", numbered 0 through "
       << kNumCPUs - 1 << "." << endl;
  set<int> openfds;
  for (size_t i = 0; i < workers.size(); i++) {
    workers[i] = worker(const_cast<char **>(kWorkerArguments), openfds); // highlighted line
    openfds.insert(workers[i].sp.supplyfd);
    assignToCPU(workers[i].sp.pid, i); // implementation omitted, irrelevant
  }
}

int main(int argc, char *argv[]) {
  signal(SIGCHLD, markWorkersAsAvailable); // markWorkersAsAvailable is correct
  spawnAllWorkers();
  // other functions, assume all correct
  return 0;
}
```

  - The right-hand side constructs a temporary that launches a process that self-halts before the content of the temporary is copied into `workers[i]`. If that happens, the `SIGCHLD` handler is invoked and crawls over an array that doesn’t have the pid yet, and sadness ensues.
  - The solution is to block `SIGCHLD` before the highlighted line and then unblock it after.
While implementing the `farm` program for `assign2`, you were expected to implement a `getAvailableWorker` function to effectively block `farm` until at least one worker was available. My own solution relied on a helper function I called `waitForAvailableWorker`, which I present below. After analyzing my own solution, answer the following questions:

- Assuming no signals are blocked at the time `waitForAvailableWorker` is called, clearly identify when `SIGCHLD` is blocked and when it is not.
  - TL;DR: `SIGCHLD` isn’t blocked prior to the `sigprocmask` call, nor within the call to `sigsuspend`. It’s blocked everywhere else.
  - More detail: The first three lines create a singleton mask containing just `SIGCHLD`, and the fourth line blocks it. The `while` loop test is evaluated while the `SIGCHLD` block is in place, which is exactly what we want, because we don’t want its evaluation to be interrupted by the execution of `markWorkersAsAvailable`. If the test passes, the call to `sigsuspend` simultaneously lifts the block on `SIGCHLD` (remember: `existing` is empty) and puts the process to sleep until some (or rather, any) signal is received, at which point any installed handler (presumably the `SIGCHLD` handler, since we expect `SIGCHLD` to be the signal that comes in) executes with high priority. After any handler exits, `sigsuspend` returns while simultaneously restoring the mask that was in place before `sigsuspend` was called, and the test is re-evaluated with a block on `SIGCHLD`. That process repeats until the `while` loop test fails, at which point `waitForAvailableWorker` returns, leaving the block on `SIGCHLD` in place.
- Had I accidentally passed in `&additions` to the `sigsuspend` call instead of `&existing`, the farm could have deadlocked. Explain why.
  - `numWorkersAvailable == 0` could pass, and then `sigsuspend` would force `farm` to deadlock, as only `SIGCHLD` signals are coming in and capable of changing `numWorkersAvailable`, and they’re blocked.
- Had I accidentally omitted the `sigaddset` call and not blocked `SIGCHLD`, `farm` could have deadlocked. Explain how.
  - `numWorkersAvailable == 0` passes, `farm` is swapped off the CPU, all `kNumCPUs` workers self-halt, all `kNumCPUs` `SIGCHLD`s are handled by one `SIGCHLD` handler call, `farm` descends into `sigsuspend`, and no additional `SIGCHLD`s ever arrive to wake `farm` up.
- In past quarters, I saw a bunch of students who lifted the block on `SIGCHLD` before the two lines in bold instead of after. As it turns out, executing `numWorkersAvailable--` after the block is lifted can cause problems, but executing `workers[i].available = false` actually can’t. Explain why the placement of the `--` is more sensitive to race conditions than the Boolean assignment is.
  - Execution of `--` could be interrupted and confused by execution of `++` within the `SIGCHLD` handler.
  - When the Boolean assignment is executed, the relevant worker is halted, so interruption by the `SIGCHLD` handler would hop over `workers[i]`, as its pid can’t possibly have been returned by the handler’s `waitpid` call until `workers[i]` is continued.
```cpp
static sigset_t waitForAvailableWorker() {
  sigset_t existing, additions;
  sigemptyset(&additions);
  sigaddset(&additions, SIGCHLD);
  sigprocmask(SIG_BLOCK, &additions, &existing);
  while (numWorkersAvailable == 0) sigsuspend(&existing);
  return existing;
}

static size_t getAvailableWorker() {
  sigset_t existing = waitForAvailableWorker();
  size_t i;
  for (i = 0; !workers[i].available; i++);
  assert(i < workers.size());
  numWorkersAvailable--;         // first of the two "bold" lines
  workers[i].available = false;  // second of the two "bold" lines
  sigprocmask(SIG_SETMASK, &existing, NULL); // restore original block set
  return i;
}
```
Problem 2: Threading Short Answer Questions
Here are some more short answer questions about the specifics of threads and the directives that help threads communicate.
Is `i--` thread safe on a single-core machine? Why or why not?

- `i--` is not thread safe if it expands to two or more assembly code instructions, as it does on x86_64. For a local variable not already in a register, the generated code might look like this:

  ```asm
  mov (%rbp), %rax
  sub $1, %rax
  mov %rax, (%rbp)
  ```

  If the thread gets swapped off the processor after the first or second of those three instructions and another thread is scheduled and changes `i`, the original thread will eventually resume and continue on with stale data.
- Some architectures *are* capable of doing an in-memory decrement (or, more broadly, a memory-memory instruction) in one instruction, so on some architectures `i--` could compile to a single assembly code instruction if `i` is a global variable, and on a single core, where only one thing can happen at a time, that one instruction couldn’t be interrupted midway. But in general, you write code to be portable and architecture-agnostic, so you would need to operate as if `i--`, even where `i` is a global, is thread-unsafe.
What’s the difference between a `mutex` and a `semaphore` with an initial value of 1? Can one be substituted for the other?

- The `semaphore` could be used in place of the `mutex`, but not vice versa. First off, the thread that acquires the lock on a `mutex` must be the one to unlock it (by C++ specification), whereas one thread can `signal` a `semaphore` while a second thread waits on it. There’s also nothing to prevent a `semaphore` initialized to 1 from being signaled up to an even higher number (like 2, or 2 million), whereas a `mutex` can really only be in one of two states.
What is the `lock_guard` class used for, and why is it useful?

- It’s used to layer over some kind of mutex (typically a `mutex`, but possibly some other mutex class like `timed_mutex` or `recursive_mutex`) so that it’s locked when it needs to be locked (via the `lock_guard` constructor) and unlocked when the `lock_guard` goes out of scope (via its destructor). It’s useful because it guarantees that a mutex that’s locked through a `lock_guard` will be unlocked no matter how the function, method, or enclosing scope ends. In particular, a function might have two or more exit paths (some early returns, some exceptions being thrown, and so forth).
What is busy waiting? Is it ever a good idea? Does your answer to the good-idea question depend on the number of CPUs your multithreaded application has access to?

- Busy waiting is consuming processor time waiting for some condition to change, as with `while (numWorkersAvailable == 0) {;}`. On a single-CPU machine, it makes sense for a process or thread that would otherwise busy wait to yield the processor, since the condition can’t possibly change until some other thread gets the processor and makes changes to the variables that are part of the condition. That’s what `sigsuspend` (for processes) and `condition_variable_any::wait` (for threads) are for. In a few scenarios where multithreaded code is running on a machine with multiple CPUs, it’s okay to busy wait if and only if you know with high probability that another thread is running at the same time (on another CPU) and will invert the condition that’s causing the first thread to busy wait.
As it turns out, the semaphore’s constructor allows a negative number to be passed in, as with `semaphore s(-11)`. Identify a scenario where -11 might be a sensible initial value.

- Say you have 24 threads, and you need to wait for just 12 of them to finish (any 12 of the 24). You could create a semaphore initialized to -11, signal that semaphore in each thread on completion, and then wait on the semaphore in the main thread. The `wait()` call will return once the semaphore value reaches 1, which happens when 12 of the threads have signaled it.
What would the implementation of `semaphore::signal(size_t increase = 1)` need to look like if we wanted to allow a `semaphore`’s encapsulated value to be promoted by the `increase` amount? Note that `increase` defaults to 1, so this version could just replace the standard `semaphore::signal` that’s officially exported by the `semaphore` abstraction.

- The key line in `signal` would need to replace the `++` with a `+= increase`. It would also need to decide whether the condition variable needs to be notified by checking to see if the encapsulated value pre-increment was less than or equal to 0 and the value post-increment is positive. If so, then `notify_all` needs to be invoked.
What’s the multiprocessing equivalent of the `mutex`?

- It’s not a perfect parallel, but if I had to choose one, it’d be signal masks. We used signal masks to remove the possibility of interruption by signal, which is the only thing that can otherwise introduce a race condition into a sequential program. (Another answer is something I referenced in an earlier solution, even though I didn’t teach it in CS110. Investigate the `flock` function and think about how it could be used to prevent interprocess race conditions on shared memory mapped segments.)
What’s the multiprocessing equivalent of the `condition_variable_any`?

- `sigsuspend` is the equivalent of `condition_variable_any::wait`, and a signal outside the `sigsuspend` signal mask is the equivalent of `condition_variable_any::notify_[one|all]`.
The `semaphore` implementation we produced in lecture is repeated below. Many students have asked whether the very last line of `semaphore::signal` could have called `notify_one` instead of `notify_all`. Does it matter? Explain.

```cpp
semaphore::semaphore(int value) : value(value) {}

void semaphore::wait() {
  lock_guard<mutex> lg(m);
  cv.wait(m, [this]{ return value > 0; });
  value--;
}

void semaphore::signal() {
  lock_guard<mutex> lg(m);
  value++;
  if (value == 1) cv.notify_all();
}
```

- It definitely matters. Assume `signal` is implemented in terms of `notify_one` instead of `notify_all`, and consider a four-thread test program with a single semaphore `s` (initialized to 0), threads 1 and 2 calling `s.wait()`, and threads 3 and 4 calling `s.signal()`. The following scheduling pattern – one that’s very possible – leads to thread 2 blocking forever:
  - Threads 1 and 2 each progress and block on `s.wait()`.
  - Thread 3 signals `s`, which notifies `cv` to wake thread 1 (but not 2) up.
  - Thread 4 signals `s`, without notifying `cv` (the value goes from 1 to 2, so the `value == 1` test fails).
  - Thread 1 passes through `s.wait()`, but thread 2 is still asleep under `cv`, even though the semaphore value is 1.
Problem 3: Threads vs Processes
To provide some answers about the pros and cons of threads and processes, we’ll lead you through a collection of short answer questions about multiprocessing and multithreading that focuses more on the big picture.
- What does it mean when we say that a process has a private address space?
- It means that its range of addresses is, by default, private to the owning process, and that it’s impossible for another arbitrary, unprivileged process to access any of it.
- What are the advantages of a private address space?
- The fact that it’s private, of course. Most other programs can’t accidentally or intentionally examine another process’s data.
- What are the disadvantages?
- The fact that it’s private, of course. Address space privacy makes it exceptionally difficult to exchange data with other processes, and works against any efforts to breezily distribute work over multiple CPUs using multiprocessing.
- What programming directives have we used in prior discussion section handouts to circumvent address space privacy?
  - Signals and pipes can be used to communicate between processes despite address space privacy. We didn’t talk about it in class, but `mmap` can also be used to create segments of memory shared between multiple processes.
- When architecting a larger program like `farm` or `stsh` that relies on multiprocessing, what did we need to do to exchange information across process boundaries?
  - `farm` relied on pipes to forward data to each of the workers, and it relied on `SIGTSTP`, `SIGCHLD`, `SIGCONT`, `SIGKILL`, and a `SIGCHLD` handler to manage synchronization issues. And in some sense, `farm` relied on `execvp`’s string arguments to influence what each of the workers should be running. `stsh` relied on pretty much the same things, albeit for different reasons: it relied on signals and signal handlers to manage job control, on terminal control transfer to be clear about who’s responding to keyboard events and any signals associated with them, and on pipes to foster communication between neighboring processes in a pipeline.
- Can a process be used to execute multiple executables? Restated, can it `execvp` twice to run multiple programs?
  - Nope, not in general. A program like `stsh` can fork off additional processes, each of which `execvp`s, but that’s not the same thing. The bottom line is that process spaces can’t be reused to run multiple executables; they can only be transformed once.
- Threads are often called lightweight processes. In what sense are they processes? And why the lightweight distinction?
  - Each thread of execution runs as if it’s a miniature program. Threads have their own stack, and they have access to globals, dynamic memory allocation, global data, system calls, and so forth. They’re relatively lightweight compared to true processes, because you don’t need to create a new process when creating a thread. The thread manager cordons off a portion of the full stack segment for the new thread’s stack, but otherwise the thread piggybacks off existing text, heap, and data segments previously established by `fork` and `execvp`. (For more info, go ahead and read the man page for `clone`.)
- Threads are often called virtual processes as well. In what sense are threads an example of virtualization?
- Virtualization is an abstraction mechanism used to make a single resource appear to be many or to make many resources appear to be one. In this case, the process is subdivided to house many lightweight processes, so that one process is made to look like many.
- Threads running within the same process all share the same address space. What are the advantages and disadvantages of allowing threads to access pretty much all of virtual memory?
- It’s a double-edged sword, as you’re all learning with Assignments 5 and 6. Because all threads have access to the same virtual address space, it’s relatively trivial to share information. However, because two or more threads might want to perform surgery on a shared piece of data, directives must be implanted into thread routines to ensure data and resources are shared without inspiring data corruption via race conditions, or deadlock via resource contention. That’s a very difficult thing to consistently get right.
Each thread within a larger process is given a thread id, also called a tid. In fact, the thread id concept is just an extension of the process id. For singly threaded processes, the pid and the main thread’s tid are precisely the same. If a process with pid 12345 creates three additional threads beyond the main thread (and no other processes or threads within other processes are created), then the tid of the main thread would be 12345, and the thread ids of the three other threads would be 12346, 12347, and 12348.
- What are the advantages of leveraging the pid abstraction for thread ids?
- To the extent that the OS can view threads as processes, the OS can provide services on a per-thread basis instead of just a per-process one.
- What happens if you pass a thread id that isn’t a process id to `waitpid`?
  - It’s poorly documented, but the takeaway here is that `waitpid` does not consider it to be an error if a thread id is passed to it. `waitpid`’s man page (https://linux.die.net/man/2/waitpid) includes some information about how `waitpid` and threads interact with one another.
- What happens if you pass a thread id to `sched_setaffinity`?
  - It actually works, and informs the OS that the identified thread should only be run on those CPUs included in the provided `cpu_set_t`.
- What are the advantages of requiring that a thread always be assigned to the same CPU?
- Each CPU typically has a dedicated L1 cache that stores data fetched by instructions run on that CPU. CPU-specific L1 caches are more likely to retain content relevant to the thread if the same thread keeps coming back instead of being arbitrarily assigned to other CPUs.
In some situations, the decision to rely on multithreading instead of multiprocessing is dictated solely by whether the code to be run apart from the main thread is available in executable or in library form. But in other situations, we have a choice.
- Why might you prefer multithreading over multiprocessing if both are reasonably good options?
- Ease of data exchange and code sharing.
- Why might you prefer multiprocessing over multithreading if both are reasonably good options?
- Protection, privacy, insulation from other processes errors.
- What happens if a thread within a larger process calls `fork`?
  - On Linux, the process space is cloned, but the only surviving thread in the clone is the one that called `fork`. In general, you don’t want a single thread of many to call `fork` unless you have an exquisite reason to do so.
- What happens if a thread within a larger process calls `execvp`?
  - It transforms the entire process to run a new executable. The fact that the pre-`execvp` process involved threading is irrelevant once `execvp` is invoked. It’s that drastic.
And some final questions about how `farm` and `stsh` could be implemented:
- Assume you know nothing about multiprocessing but needed to emulate the functionality of `farm`. How could you use multithreading to design an equally parallel solution without ever relying on multiprocessing?
  - You’ll see in Assignment 4 that a `ThreadPool` is a generalization of what we need here.
  - Spawn one thread for each CPU, and set the thread routine to the code that computes and prints a factorization (using `oslock` and `osunlock`, of course). Note that all workers are initially available; track availability by storing something on behalf of each thread in the parent (array indices, thread ids, whatever you want).
  - Implement the `main` thread to continue and read all numbers to be factored, forwarding numbers on to available workers (I’m being intentionally vague here so as not to steal the thunder of the `ThreadPool`).
  - As workers are assigned numbers to be factored, have them note they’re occupied, and as each worker finishes printing its factorization, have it mark itself as available.
  - Use `semaphore`s to coordinate rendezvous communication between the orchestrator and each worker. Rely on `mutex`es to guard shared data structures (e.g. the set of available workers) from race conditions.
  - When all numbers have been factored and all threads are available, prompt each thread to gracefully exit, `join` on all of those threads, and exit the program.
- Could multithreading contribute to the implementation of `stsh` in any meaningful way? Why or why not?
  - Not really. A shell is all about process control – `fork`ing and `execvp`-ing – and there are no in-shell algorithms that are intensely CPU-, I/O-, or network-bound, so introducing parallelism within `stsh` itself is arguably unnecessary.