Assignment 2: All Things Multiprocessing
This handout was adapted from Jerry Cain’s Spring 2018 offering.
We’ve progressed through a good amount of material involving processes and process communication. Rather than building one large program, I’d like you to code up a few different things with the idea that you’ll learn more by tackling multiple problems and leveraging your understanding of the material in multiple domains. All of these programs should be coded directly within a single repository, which you can get by typing the following:
git clone /usr/class/cs110/repos/assign2/$USER assign2
There are three problems in total; the first two exercise your understanding of pipes, and the last one also requires an understanding of signals.
Due Date: Friday, July 13, 2018 at 11:59pm
A note on assignment timing
This is supposed to be a one-week assignment starting Friday, 7/6. I am releasing it a little bit early, because I know a handful of SCPD students have some time off before Friday and would like to work on the assignment before they're busy with work again. However, you should not feel pressured to start early. In fact, I think it might be better to not start early, because there is a bit of lecture material (surrounding the `dup2` system call) that I did not get to cover on Monday, and our discussion will probably help you a lot on Problem 1. If you do decide to start early, you should read through the last example from Monday's lecture notes. I apologize for the mess, but since I mentioned in lecture that I would release the assignment early (before I realized I was going to run out of time), I'm still releasing it early.
Problem 1: Implementing a pipeline in C
Your first task is to implement the `pipeline` function. This `pipeline` function accepts two argument vectors and, assuming both vectors are legit, spawns off twin processes with the added bonus that the standard output of the first is directed to the standard input of the second. Here's the interface you're coding to:
void pipeline(char *argv1[], char *argv2[], pid_t pids[]);
For simplicity, you can assume that all calls to `pipeline` are well-formed and will work as expected. In other words, `argv1` and `argv2` are each valid, `NULL`-terminated argument vectors, and `pids` is the base address of an array of length two. Further assume that all calls to `pipe`, `dup2`, `close`, `execvp`, and so forth succeed, so that you needn't do any error checking whatsoever. `pipeline` should return without waiting for either of the child processes to finish (i.e. your `pipeline` implementation should not call `waitpid` anywhere); the ids of the daisy-chained processes are dropped into `pids[0]` and `pids[1]`. Also, ensure that the two processes are running in parallel as much as possible, so that `pipeline({"sleep", "10", NULL}, {"sleep", "10", NULL}, pids)` takes about 10 seconds instead of 20.
You should place your implementation of `pipeline` in `pipeline.c`. There is no sanitycheck for `pipeline`, but you can rely on `pipeline-test.c` and the `pipeline-test` executable it compiles to exercise your implementation. The `pipeline-test.c` test harness I start you off with is small, so you should add more tests of your own to prove that your `pipeline` is coded to specification. (We will not grade your `pipeline-test.c` – this is purely for your own testing purposes.)
Note that this first problem is standalone and doesn’t contribute to anything else that follows (although the concept of a pipeline will come back in Assignment 3).
Problem 2: Implementing subprocess in C++
Your next task is to implement an even more flexible `subprocess` than the one implemented in lecture. The most important part of the `subprocess.h` interface file is right here:
static const int kNotInUse = -1;

struct subprocess_t {
  pid_t pid;
  int supplyfd;
  int ingestfd;
};

/**
 * Function: subprocess
 * --------------------
 * Creates a new process running the executable identified via argv[0].
 *
 * argv: the NULL-terminated argument vector that should be passed
 *       to the new process's main function
 * supplyChildInput: true if the parent process would like to pipe content
 *       to the new process's stdin, false otherwise
 * ingestChildOutput: true if the parent would like the child's stdout to be
 *       pushed to the parent, false otherwise
 * openfds: a set<int> of additional file descriptors known to be open in the
 *       parent that should be closed in the child process.
 *       This set should never include file descriptors 0, 1, or 2.
 */
subprocess_t subprocess(char *argv[],
                        bool supplyChildInput, bool ingestChildOutput,
                        const std::set<int>& openfds = std::set<int>()) throw (SubprocessException);
Read through the `subprocess.h` documentation to see how this new `subprocess` should work, and place your implementation in `subprocess.cc`. Should any of the system calls needed to implement your `subprocess` routine fail (either in the parent or in the child), you should throw a `SubprocessException` around an actionable error message. Inspect the `subprocess-exceptions.h` file for the full, inlined definition.

Use the test harness supplied by `subprocess-test.cc` to exercise your implementation, and by all means add to the `subprocess-test.cc` file to ensure that your implementation is bulletproof. When looking at `subprocess-test.cc`, you'll also get a little bit of a reminder of how `try`/`catch` blocks work.
You'll want to add your own tests to `subprocess-test.cc` to ensure that all the (`true`, `true`), (`true`, `false`), (`false`, `true`), and (`false`, `false`) combinations for (`supplyChildInput`, `ingestChildOutput`) work as expected.
Note that your implementation here is formally C++, since the larger exercise that follows this one is also to be written in C++, and it needs to link against your `subprocess` implementation without drama. We're switching to C++ pretty much from this problem forward, because C++ provides better support for strings and generics than C does. C++ also provides native support for some threading and concurrency directives we'll be relying on in a few weeks, and I'd rather ease you into the language now than do so when we branch into the multithreading topic. Truth be told, your C++ implementation of `subprocess` will look much as it would have in pure C, save for the fact that you're throwing C++ exceptions to identify errors.
Your fully functional `subprocess` routine is used by the starter code I've given you for the final exercise (the one requiring you to implement the prime factorization `farm`). Do not start `farm` until you are confident in your `subprocess` implementation.
Problem 3: Implementing farm in C++
Your final challenge is to harness the power of a computer's multiple cores to manage a collection of executables, each running in parallel to contribute its share to a larger result. For the purposes of this problem, we're going to contrive a scenario where the computation of interest – the prime factorization of arbitrarily large numbers – is complex enough that some factorizations take multiple seconds or even minutes to compute. The factorization algorithm itself isn't the focus here, save for the fact that it's potentially time consuming, and that should we need to compute multiple prime factorizations, we should leverage the computing resources of our trusty `myth` cluster to multiprocess and generate output more quickly.
Consider the following Python program called `factor.py`:
self_halting = len(sys.argv) > 1 and sys.argv[1] == '--self-halting'
pid = os.getpid()
while True:
    if self_halting: os.kill(pid, signal.SIGSTOP)
    try: num = int(raw_input())  # raw_input blocks, eventually returns a single line from stdin
    except EOFError: break       # raw_input throws an EOFError when EOF is detected
    start = time.time()
    response = factorization(num)
    stop = time.time()
    print ' %s [pid: %d, time: %g seconds]' % (response, pid, stop - start)
You really don’t need to know Python to understand how it works, because every line of this particular program has a clear C or C++ analog. The primary things I’ll point out are:
- Python's `print` operates just like C's `printf` (and it's even process-safe).
- `raw_input` reads and returns a single line of text from standard input, blocking indefinitely until a line is supplied (chomping the `'\n'`) or until end-of-file is detected.
- `factorization` is something I wrote; it takes an integer (e.g. `12345678`) and returns the prime factorization (e.g. `12345678 = 2 * 3 * 3 * 47 * 14593`) as a string. You'll see it when you open up `factor.py` in your favorite text editor.
- The `os.kill` line prompts the script to stop itself (but only if the script is invoked with the `'--self-halting'` flag) and wait to be restarted via `SIGCONT`.

The following should convince you our script does what you'd expect:
$ printf "1234567\n12345678\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28598, time: 0.0561171 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28598, time: 0.512921 seconds]
$ time printf "1234567\n12345678\n123456789\n1234567890\n" | ./factor.py
1234567 = 127 * 9721 [pid: 28601, time: 0.0521989 seconds]
12345678 = 2 * 3 * 3 * 47 * 14593 [pid: 28601, time: 0.517912 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28601, time: 5.18094 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 28601, time: 51.763 seconds]
real 0m57.535s
user 0m57.516s
sys 0m0.004s
$ printf "1001\n10001\n" | ./factor.py --self-halting &
[1] 28624
$ kill -CONT %1
1001 = 7 * 11 * 13 [pid: 28625, time: 0.000285149 seconds]
$ kill -CONT %1
10001 = 73 * 137 [pid: 28625, time: 0.00222802 seconds]
$ kill -CONT %1
$ kill -CONT %1
-bash: kill: (28624) - No such process
$ time printf "123456789\n123456789\n" | ./factor.py
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1199 seconds]
123456789 = 3 * 3 * 3607 * 3803 [pid: 28631, time: 5.1183 seconds]
real 0m10.260s
user 0m10.248s
sys 0m0.008s
This last test may look silly, but it certainly verifies that one process is performing the same factorization twice, in sequence, so that the overall running time is roughly twice the time it takes to compute the factorization the first time (no caching here, so the second factorization does it all over again).
My `factorization` function runs in O(n) time, so it's very slow for some large inputs. Should you need to compute the prime factorizations of many large numbers, the `factor.py` script would get the job done, but it may take a while. If, however, you're `ssh`'ed into a machine that has multiple processors and/or multiple cores (each of the `myth`s has eight cores), you can write a program that manages several processes running `factor.py` and tracks which processes are idle and which processes are deep in thoughtful number theory.
You're going to write a program – a C++ program called `farm` – that can run on the `myth`s to leverage the fact that you have eight cores at your fingertips. `farm` will spawn several workers – one for each core, each running a self-halting instance of `factor.py` – read an unbounded number of positive integers (one per line, no error checking required), forward each integer on to an idle worker (blocking until one or more workers is ready to read the number), and allow all of the workers to cooperatively publish all prime factorizations to standard output (without worrying about the order in which they're printed). To illustrate how `farm` should work, check out the following test case:
$ time printf "1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n1234567890\n" | ./farm
There are this many CPUs: 8, numbered 0 through 7.
Worker 4245 is set to run on CPU 0.
Worker 4246 is set to run on CPU 1.
Worker 4247 is set to run on CPU 2.
Worker 4248 is set to run on CPU 3.
Worker 4249 is set to run on CPU 4.
Worker 4250 is set to run on CPU 5.
Worker 4251 is set to run on CPU 6.
Worker 4252 is set to run on CPU 7.
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4249, time: 95.5286 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4252, time: 95.5527 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4245, time: 95.5824 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4247, time: 95.5851 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4248, time: 95.6578 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4250, time: 95.6627 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4251, time: 95.6666 seconds]
1234567890 = 2 * 3 * 3 * 5 * 3607 * 3803 [pid: 4246, time: 96.254 seconds]
real 1m36.285s
user 12m42.668s
sys 0m0.128s
Note that each of eight processes took about the same amount of time to compute the identical prime factorization, but because each was assigned to a different core, the real (aka perceived) time is basically the time it took to compute the factorization just once. How’s that for parallelism!
Note that prime factorizations aren’t required to be published in order – that makes this all a little easier – and repeat requests for the same prime factorization are all computed from scratch.
Your `farm.cc` implementation will make use of the following C++ record, global constants, and global variables:

struct worker {
  worker() {}
  worker(char *argv[], const set<int>& openfds) :
    sp(subprocess(argv, true, false, openfds)), available(false) {}
  subprocess_t sp;
  bool available;
};

static const size_t kNumCPUs = sysconf(_SC_NPROCESSORS_ONLN);
static vector<worker> workers(kNumCPUs);
static size_t numWorkersAvailable;
The `main` function we give you includes stubs for all of the helper functions that decompose it, and that `main` function looks like this:
int main(int argc, char *argv[]) {
  signal(SIGCHLD, markWorkersAsAvailable);
  spawnAllWorkers();
  broadcastNumbersToWorkers();
  waitForAllWorkers();
  closeAllWorkers();
  return 0;
}
This final problem can be tricky, but it’s perfectly manageable provided you follow this road map:
- First implement `spawnAllWorkers`, which spawns a self-halting `factor.py` process for each core and updates the global `workers` vector so that each worker contains the relevant `subprocess_t` allowing `farm.cc` to monitor it and pipe numbers to it. You can assign a process to always execute on a particular core by leveraging functionality outlined in the `CPU_SET` and `sched_setaffinity` man pages (i.e. type in `man CPU_SET` to learn about the `cpu_set_t` type, the `CPU_ZERO` and `CPU_SET` macros, and the `sched_setaffinity` function).
- Implement the `markWorkersAsAvailable` handler, which gets invoked to handle all pending `SIGCHLD` signals. Call `waitpid` to surface the pid of the child that recently self-halted, and mark it as available.
- Implement a `getAvailableWorker` helper function, which you'll use to decompose the `broadcastNumbersToWorkers` function in the next step. You should never busy wait. Instead, investigate `sigsuspend` (by typing `man sigsuspend`) as a way of blocking indefinitely until at least one worker is available.
- Flesh out the implementation of `broadcastNumbersToWorkers`. I'm giving you a tiny hint here – that `broadcastNumbersToWorkers` keeps on looping until EOF is detected. You can restart a stopped process by sending it a `SIGCONT` signal.
static void broadcastNumbersToWorkers() {
  while (true) {
    string line;
    getline(cin, line);
    if (cin.fail()) break;
    size_t endpos;
    /* long long num = */ stoll(line, &endpos);
    if (endpos != line.size()) break;
    // you shouldn't need all that many lines of additional code
  }
}
- Implement `waitForAllWorkers`, which does more or less what it says – it waits for all workers to self-halt and become available.
- Last but not least, implement the `closeAllWorkers` routine to uninstall the `SIGCHLD` handler and restore the default (investigate the `SIG_DFL` constant), coax all workers to exit by sending them EOFs, and then wait for them all to exit.
Your implementation should not make any invalid memory accesses or cause any segfaults, and nothing you write should orphan any memory.
Submitting your work
Once you're done, you should run `./tools/sanitycheck` on all of your work as you normally would and then run `./tools/submit`.
Extra credit
If you submitted the Week 2 survey by Sunday, July 8, 11:59pm, run the following in the root of your assign2 directory:
echo "I completed the Week 2 survey." >> NOTE_TO_GRADER
git add NOTE_TO_GRADER
If you `ls`, you should see `NOTE_TO_GRADER` alongside `pipeline.c`, `subprocess.cc`, etc. Then, rerun `./tools/submit`.
We aren’t verifying that you actually did submit the survey, but please uphold the Honor Code and be truthful in your response.