Assignment 2: All Things Multiprocessing
This handout was adapted from Jerry Cain’s Spring 2018 offering.
We’ve progressed through a good amount of material involving processes and process communication. Rather than building one large program, I’d like you to code up a few different things with the idea that you’ll learn more by tackling multiple problems and leveraging your understanding of the material in multiple domains. All of these programs should be coded directly within a single repository, which you can get by typing the following:
git clone /usr/class/cs110/repos/assign2/$USER assign2
There are three problems in total; the first two exercise your understanding of pipes, and the last one also requires an understanding of signals.
Due Date: Friday, July 13, 2018 at 11:59pm
A note on assignment timing
This is supposed to be a one-week assignment starting Friday, 7/6. I am releasing it a little bit early, because I know a handful of SCPD students have some time off before Friday and would like to work on the assignment before they're busy with work again. However, you should not feel pressured to start early. In fact, I think it might be better to not start early, because there is a bit of lecture material (surrounding the `dup2` system call) that I did not get to cover on Monday, and our discussion will probably help you a lot on Problem 1. If you do decide to start early, you should read through the last example from Monday's lecture notes. I apologize for the mess, but since I mentioned in lecture that I would release the assignment early (before I realized I was going to run out of time), I'm still releasing it early.
Problem 1: Implementing a pipeline in C
Your first task is to implement the `pipeline` function. This `pipeline` function accepts two argument vectors and, assuming both vectors are legit, spawns off twin processes with the added bonus that the standard output of the first is directed to the standard input of the second. Here's the interface you're coding to:
void pipeline(char *argv1[], char *argv2[], pid_t pids[]);
For simplicity, you can assume that all calls to `pipeline` are well-formed and will work as expected. In other words, `argv1` and `argv2` are each valid, `NULL`-terminated argument vectors, and `pids` is the base address of an array of length two. Further assume that all calls to `pipe`, `dup2`, `close`, `execvp`, and so forth succeed, so that you needn't do any error checking whatsoever. `pipeline` should return without waiting for either of the child processes to finish (i.e. your `pipeline` implementation should not call `waitpid` anywhere); the ids of the daisy-chained processes are dropped into `pids[0]` and `pids[1]`. Also, ensure that the two processes are running in parallel as much as possible, so that `pipeline({"sleep", "10", NULL}, {"sleep", "10", NULL}, pids)` takes about 10 seconds instead of 20.
You should place your implementation of `pipeline` in `pipeline.c`. There is no sanitycheck for `pipeline`, but you can rely on `pipeline-test.c` and the `pipeline-test` executable it compiles to exercise your implementation. The `pipeline-test.c` test harness I start you off with is small, so you should add more tests of your own to prove that your `pipeline` is coded to specification. (We will not grade your `pipeline-test.c` – this is purely for your own testing purposes.)
Note that this first problem is standalone and doesn’t contribute to anything else that follows (although the concept of a pipeline will come back in Assignment 3).
Problem 2: Implementing subprocess in C++
Your next task is to implement an even more flexible `subprocess` than the one implemented in lecture. The most important part of the `subprocess.h` interface file is right here:
static const int kNotInUse = -1;

struct subprocess_t {
  pid_t pid;
  int supplyfd;
  int ingestfd;
};

/**
 * Function: subprocess
 * --------------------
 * Creates a new process running the executable identified via argv[0].
 *
 * argv: the NULL-terminated argument vector that should be passed
 *       to the new process's main function
 * supplyChildInput: true if the parent process would like to pipe content
 *       to the new process's stdin, false otherwise
 * ingestChildOutput: true if the parent would like the child's stdout to be
 *       pushed to the parent, false otherwise
 * openfds: a set<int> of additional file descriptors known to be open in the
 *       parent that should be closed in the child process.
 *       This set should never include file descriptors 0, 1, or 2.
 */
subprocess_t subprocess(char *argv[],
                        bool supplyChildInput, bool ingestChildOutput,
                        const std::set<int>& openfds = std::set<int>()) throw (SubprocessException);
Read through the `subprocess.h` documentation to see how this new `subprocess` should work, and place your implementation in `subprocess.cc`. Should any of the system calls needed to implement your `subprocess` routine fail (either in the parent or in the child), you should throw a `SubprocessException` around an actionable error message. Inspect the `subprocess-exceptions.h` file for the full, inlined definition.

Use the test harness supplied by `subprocess-test.cc` to exercise your implementation, and by all means add to the `subprocess-test.cc` file to ensure that your implementation is bulletproof. When looking at `subprocess-test.cc`, you'll also get a little bit of a reminder of how `try`/`catch` blocks work.
You'll want to add your own tests to `subprocess-test.cc` to ensure that all the (`true`, `true`), (`true`, `false`), (`false`, `true`), and (`false`, `false`) combinations for (`supplyChildInput`, `ingestChildOutput`) work as expected.
Note that your implementation here is formally C++, since the larger exercise that follows this one is also to be written in C++, and it needs to link against your `subprocess` implementation without drama. We're switching to C++ pretty much from this problem forward, because C++ provides better support for strings and generics than C does. C++ also provides native support for some threading and concurrency directives we'll be relying on in a few weeks, and I'd rather ease you into the language now than do so when we branch into the multithreading topic. Truth be told, your C++ implementation of `subprocess` will look much as it would have in pure C, save for the fact that you're throwing C++ exceptions to identify errors.
Your fully functional `subprocess` routine is used by the starter code I've given you for the final exercise (the one requiring you to implement the prime factorization `farm`). Do not start `farm` until you are confident in your `subprocess` implementation.
Problem 3: Implementing farm in C++
Your final challenge is to harness the power of a computer's multiple cores to manage a collection of executables, each running in parallel to contribute its share to a larger result. For the purposes of this problem, we're going to contrive a scenario where the computation of interest – the prime factorization of arbitrarily large numbers – is complex enough that some factorizations take multiple seconds or even minutes to compute. The factorization algorithm itself isn't the focus here, save for the fact that it's potentially time consuming, and that should we need to compute multiple prime factorizations, we should leverage the computing resources of our trusty `myth` cluster to multiprocess and generate output more quickly.
Consider the following Python program called `factor.py`:
self_halting = len(sys.argv) > 1 and sys.argv[1] == '--self-halting'
pid = os.getpid()
while True:
    if self_halting: os.kill(pid, signal.SIGSTOP)
    try: num = int(raw_input())  # raw_input blocks, eventually returns a single line from stdin
    except EOFError: break       # raw_input throws an EOFError when EOF is detected
    start = time.time()
    response = factorization(num)
    stop = time.time()
    print ' %s [pid: %d, time: %g seconds]' % (response, pid, stop - start)
You really don’t need to know Python to understand how it works, because every line of this particular program has a clear C or C++ analog. The primary things I’ll point out are:
- Python's `print` operates just like C's `printf` (and it's even process-safe).
- `raw_input` reads and returns a single line of text from standard input, blocking indefinitely until a line is supplied (chomping the `'\n'`) or until end-of-file is detected.
- `factorization` is something I wrote; it takes an integer (e.g. `12345678`) and returns the prime factorization (e.g. `12345678 = 2 * 3 * 3 * 47 * 14593`) as a string. You'll see it when you open up `factor.py` in your favorite text editor.
- The `os.kill` line prompts the script to stop itself (but only if the script is invoked with the `'--self-halting'` flag) and wait to be restarted via `SIGCONT`.

The following should convince you our script does what you'd expect:
$ printf "1234567\n12345678\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28598, time: 0.0561171 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28598, time: 0.512921 seconds]
$ time printf "1234567\n12345678\n123456789\n1234567890\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28601, time: 0.0521989 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28601, time: 0.517912 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28601, time: 5.18094 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 28601, time: 51.763 seconds]
real 0m57.535s
user 0m57.516s
sys 0m0.004s
$ printf "1001\n10001\n" | ./factor.py --self-halting &
[1] 28624
$ kill -CONT %1
1001 = 7 * 11 * 13 [pid: 28625, time: 0.000285149 seconds]
$ kill -CONT %1
10001 = 73 * 137 [pid: 28625, time: 0.00222802 seconds]
$ kill -CONT %1
$ kill -CONT %1
-bash: kill: (28624) - No such process
$ time printf "123456789\n123456789\n" | ./factor.py
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1199 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1183 seconds]
real 0m10.260s
user 0m10.248s
sys 0m0.008s
This last test may look silly, but it certainly verifies that one process is performing the same factorization twice, in sequence, so that the overall running time is roughly twice the time it takes to compute the factorization the first time (no caching here, so the second factorization does it all over again).
My `factorization` function runs in O(n) time, so it's very slow for some large inputs. Should you need to compute the prime factorizations of many large numbers, the `factor.py` script would get the job done, but it may take a while. If, however, you're `ssh`'ed into a machine that has multiple processors and/or multiple cores (each of the `myth`s has eight cores), you can write a program that manages several processes running `factor.py` and tracks which processes are idle and which processes are deep in thoughtful number theory.
You're going to write a program – a C++ program called `farm` – that can run on the `myth`s to leverage the fact that you have eight cores at your fingertips. `farm` will spawn several workers – one for each core, each running a self-halting instance of `factor.py` – read an unbounded number of positive integers (one per line, no error checking required), forward each integer on to an idle worker (blocking until one or more workers is ready to read the number), and allow all of the workers to cooperatively publish all prime factorizations to standard output (without worrying about the order in which they're printed). To illustrate how `farm` should work, check out the following test case:
$ time printf "1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n" | ./farm
There are this many CPUs: 8, numbered 0 through 7.
Worker 4245 is set to run on CPU 0.
Worker 4246 is set to run on CPU 1.
Worker 4247 is set to run on CPU 2.
Worker 4248 is set to run on CPU 3.
Worker 4249 is set to run on CPU 4.
Worker 4250 is set to run on CPU 5.
Worker 4251 is set to run on CPU 6.
Worker 4252 is set to run on CPU 7.
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4249, time: 95.5286 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4252, time: 95.5527 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4245, time: 95.5824 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4247, time: 95.5851 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4248, time: 95.6578 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4250, time: 95.6627 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4251, time: 95.6666 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4246, time: 96.254 seconds]
real 1m36.285s
user 12m42.668s
sys 0m0.128s
Note that each of eight processes took about the same amount of time to compute the identical prime factorization, but because each was assigned to a different core, the real (aka perceived) time is basically the time it took to compute the factorization just once. How’s that for parallelism!
Note that prime factorizations aren’t required to be published in order – that makes this all a little easier – and repeat requests for the same prime factorization are all computed from scratch.
Your `farm.cc` implementation will make use of the following C++ record, global constants, and global variables:

struct worker {
  worker() {}
  worker(char *argv[], const set<int>& openfds) :
    sp(subprocess(argv, true, false, openfds)), available(false) {}
  subprocess_t sp;
  bool available;
};

static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable;
The `main` function we give you includes stubs for all of the helper functions that decompose it, and that `main` function looks like this:
int main(int argc, char *argv[]) {
  signal(SIGCHLD, markWorkersAsAvailable);
  spawnAllWorkers();
  broadcastNumbersToWorkers();
  waitForAllWorkers();
  closeAllWorkers();
  return 0;
}
This final problem can be tricky, but it’s perfectly manageable provided you follow this road map:
- First implement `spawnAllWorkers`, which spawns a self-halting `factor.py` process for each core and updates the global `workers` vector so that each worker contains the relevant `subprocess_t` allowing `farm.cc` to monitor it and pipe numbers to it. You can assign a process to always execute on a particular core by leveraging functionality outlined in the `CPU_SET` and `sched_setaffinity` man pages (i.e. type in `man CPU_SET` to learn about the `cpu_set_t` type, the `CPU_ZERO` and `CPU_SET` macros, and the `sched_setaffinity` function).
- Implement the `markWorkersAsAvailable` handler, which gets invoked to handle all pending `SIGCHLD` signals. Call `waitpid` to surface the pid of the child that recently self-halted, and mark it as available.
- Implement a `getAvailableWorker` helper function, which you'll use to decompose the `broadcastNumbersToWorkers` function in the next step. You should never busy wait. Instead, investigate `sigsuspend` (by typing `man sigsuspend`) as a way of blocking indefinitely until at least one worker is available.
- Flesh out the implementation of `broadcastNumbersToWorkers`. I'm giving you a tiny hint here – that `broadcastNumbersToWorkers` keeps on looping until EOF is detected. You can restart a stopped process by sending it a `SIGCONT` signal.
static void broadcastNumbersToWorkers() {
  while (true) {
    string line;
    getline(cin, line);
    if (cin.fail()) break;
    size_t endpos;
    /* long long num = */ stoll(line, &endpos);
    if (endpos != line.size()) break;
    // you shouldn't need all that many lines of additional code
  }
}
- Implement `waitForAllWorkers`, which does more or less what it says – it waits for all workers to self-halt and become available.
- Last but not least, implement the `closeAllWorkers` routine to uninstall the `SIGCHLD` handler and restore the default (investigate the `SIG_DFL` constant), coax all workers to exit by sending them EOFs, and then wait for them all to exit.
Your implementation should not make any invalid memory accesses or cause any segfaults, and nothing you write should orphan any memory.
Submitting your work
Once you're done, you should run `./tools/sanitycheck` on all of your work as you normally would and then run `./tools/submit`.
Extra credit
If you submitted the Week 2 survey by Sunday, July 8, 11:59pm, run the following in the root of your assign2 directory:
echo "I completed the Week 2 survey." >> NOTE_TO_GRADER
git add NOTE_TO_GRADER
If you `ls`, you should see `NOTE_TO_GRADER` alongside `pipeline.c`, `subprocess.cc`, etc. Then, rerun `./tools/submit`.
We aren’t verifying that you actually did submit the survey, but please uphold the Honor Code and be truthful in your response.