Lab 3: Sorting with Processes and Threads

This handout was adapted from Jerry Cain’s Spring 2018 offering.

I’ve created a Slack channel for Lab 3 discussion (aptly named #lab3), and all students (but particularly remote students) are encouraged to share their ideas there.

Getting started

Before starting, go ahead and clone the lab3 folder:

$ git clone /usr/class/cs110/repos/lab3/shared lab3
$ cd lab3
$ make

Problem 1: Analyzing parallel mergesort

Consider the architecturally interesting portion of the mergesort executable, which launches 128 peer processes to cooperatively sort an array of 128 randomly generated numbers. The implementations of createSharedArray and freeSharedArray are omitted for the time being.

static bool shouldKeepMerging(size_t start, size_t reach, size_t length) {
  // a worker merges only while its start index is aligned to the current
  // reach and the range it would merge still fits within the array
  return start % reach == 0 && reach <= length;
}

static void repeatedlyMerge(int numbers[], size_t length, size_t start) {
  int *base = numbers + start;
  for (size_t reach = 2; shouldKeepMerging(start, reach, length); reach *= 2) {
    raise(SIGTSTP); // self-halt until the orchestrator confirms the right half is sorted
    inplace_merge(base, base + reach/2, base + reach);
  }
  exit(0); // nothing left to merge at this start index
}

static void createMergers(int numbers[], pid_t workers[], size_t length) {
  for (size_t start = 0; start < length; start++) {
    workers[start] = fork();
    if (workers[start] == 0) // the child sorts from this start index and never returns
      repeatedlyMerge(numbers, length, start);
  }
}

static void orchestrateMergers(int numbers[], pid_t workers[], size_t length) {
  size_t step = 1;
  while (step <= length) {
    // wait for each relevant worker to stop (more merging ahead) or exit (all done);
    // WUNTRACED asks waitpid to return for stopped children, not just terminated ones
    for (size_t start = 0; start < length; start += step) 
      waitpid(workers[start], NULL, WUNTRACED);
    step *= 2;
    // resume the surviving workers, whose next merges can now safely proceed
    for (size_t start = 0; start < length; start += step) 
      kill(workers[start], SIGCONT);
  }
}

static void mergesort(int numbers[], size_t length) {
  pid_t workers[length];
  createMergers(numbers, workers, length);
  orchestrateMergers(numbers, workers, length);
}

static const size_t kNumElements = 128;
int main(int argc, char *argv[]) {
  for (size_t trial = 1; trial <= 10000; trial++) {
    int *numbers = createSharedArray(kNumElements);    
    mergesort(numbers, kNumElements);
    assert(is_sorted(numbers, numbers + kNumElements));
    freeSharedArray(numbers, kNumElements);
  }
  return 0;
}

The program presented above is a nod to concurrent programming and to the question of whether parallelism can reduce the asymptotic running time of an algorithm. (I use the term “asymptotic running time” rather loosely; it generally refers to behavior as the size of the input approaches infinity, and, usually, we don’t have infinite resources. But for the purposes of this problem, pretend we do.) We’ll lead you through a series of short questions, some easy, some less easy, to test your multiprocessing and signal chops and to understand why the “asymptotic” running time of an algorithm can sometimes be improved in a parallel programming world.
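To preview the punch line with concrete numbers: the worker responsible for index 0 performs the longest chain of merges, touching 2, then 4, then 8, and eventually all 128 elements, and 2 + 4 + 8 + 16 + 32 + 64 + 128 = 254. Assuming a merge across k elements costs on the order of k operations, the critical path through the computation is linear in the array length, whereas a sequential mergesort of n elements costs on the order of n log n operations. That’s the sense in which parallelism can improve the “asymptotic” running time.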

For reasons I’ll discuss shortly, the program above works because the address stored in the numbers variable is cloned across the 128 fork calls, and that virtual address maps to the same set of physical addresses in all 128 processes (which is different from what usually happens).

The program successfully sorts any array of length 128 by relying on 128 independent processes. (Again, assume infinite resources, so assume we have as many CPU cores as we do elements in the array.) In a nutshell, the above program works because each worker self-halts (via raise(SIGTSTP)) before every merge it performs, and the orchestrator resumes it (via SIGCONT) only once the neighboring range that merge consumes has itself been fully sorted; a worker whose start index isn’t aligned to the current reach has no more merging to do and simply exits.

For this lab problem, we want to lead you through a series of short answer questions to verify a deeper understanding of how the entire mergesort system works. Truth be told, the mergesort algorithm we’ve implemented is of more theoretical than practical interest. But it’s still a novel example of parallel programming that rings much more relevant and real-world than the Disneyland example I presented in lecture.

Use the following short answer questions to guide the discussion. (Note: this entire problem is based on a final exam question from a prior quarter.)

The createSharedArray function (defined in memory.h and memory.cc in your lab3 repo) sets aside space for an array of 128 (or, more generally, length) integers and seeds it with random numbers. It does so using the mmap function you’ve seen in Assignments 1 and 2; you also saw it a bunch of times while playing with strace during last week’s discussion section.

static int *createSharedArray(size_t length) {
    // request room for length ints in an anonymous mapping that forked children will share
    int *numbers =
        static_cast<int *>(mmap(NULL, length * sizeof(int), PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0));
    
    static RandomGenerator rgen;
    for (size_t i = 0; i < length; i++) 
        numbers[i] = rgen.getNextInt(kMinValue, kMaxValue);
    return numbers;
}

The mmap function takes the place of malloc here, because it sets up space not in the heap, but in an undisclosed segment that other processes can see and touch. MAP_ANONYMOUS means the mapping isn’t backed by any file, and MAP_SHARED means that writes to it are visible to every process sharing it, in particular to the children forked after the call.
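If the interplay between mmap and fork feels abstract, here’s a minimal sketch, separate from the lab code, that you can compile on its own to watch a MAP_SHARED | MAP_ANONYMOUS mapping survive a fork:

#include <sys/mman.h>   // mmap, munmap
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork
#include <cassert>

int main() {
  // one int of anonymous, shared memory: visible to this process and any children it forks
  int *shared = static_cast<int *>(mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                                        MAP_SHARED | MAP_ANONYMOUS, -1, 0));
  *shared = 110;
  pid_t pid = fork();
  if (pid == 0) {
    *shared = 42;        // the child writes to the very same physical page
    return 0;
  }
  waitpid(pid, NULL, 0);
  assert(*shared == 42); // passes: the parent sees the child's write
  munmap(shared, sizeof(int));
  return 0;
}

Had the int been allocated with malloc or new instead, fork’s copy-on-write semantics would have handed the child a private copy of the page, the parent would still see 110, and the assert would fail.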

Problem 2: Multithreaded quicksort

quicksort is an efficient, divide-and-conquer sorting algorithm whose traditional implementation looks like this:

static void quicksort(vector<int>& numbers, ssize_t start, ssize_t finish) {
  if (start >= finish) return;
  ssize_t mid = partition(numbers, start, finish);
  quicksort(numbers, start, mid - 1);
  quicksort(numbers, mid + 1, finish);
}

static void quicksort(vector<int>& numbers) {
  quicksort(numbers, 0, numbers.size() - 1);
}

The details of how partition works aren’t important. All you need to know is that a call to partition(numbers, start, finish) reorders the elements between numbers[start] and numbers[finish], inclusive, so that the numbers residing within indices start through mid - 1, inclusive, are less than or equal to the number at index mid, and all numbers residing within indices mid + 1 through finish, inclusive, are greater than or equal to the number at index mid. As a result of this reorganization, we know that, once partition returns, the number residing at index mid actually belongs there in the final sort.
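Stated as code, the contract might be checked like this (a hypothetical sanity check for illustration; nothing in the lab requires it):

#include <cassert>
#include <vector>
using namespace std;

// hypothetical: verifies partition's postcondition, assuming partition is declared as above
static void checkPartitionContract(vector<int>& numbers, ssize_t start, ssize_t finish) {
  ssize_t mid = partition(numbers, start, finish);
  for (ssize_t i = start; i < mid; i++) assert(numbers[i] <= numbers[mid]);       // front half
  for (ssize_t i = mid + 1; i <= finish; i++) assert(numbers[i] >= numbers[mid]); // back half
}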

What’s super neat is that the two recursive calls to quicksort can execute in parallel, since the sequences they operate on don’t overlap. In fact, to make sure you get some practice with C++ threads right away, you’re going to cannibalize the above implementation so that each call to quicksort spawns off threads to recursively sort the front and back portions at the same time.
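To make that concrete, here’s one possible shape for the threaded recursion; this is a sketch that assumes the partition routine above is in scope, and it spawns a new thread at every level of the recursion, which works but is more aggressive than a polished solution needs to be:

#include <sys/types.h>  // ssize_t
#include <thread>
#include <vector>
using namespace std;

static ssize_t partition(vector<int>& numbers, ssize_t start, ssize_t finish); // as above

static void quicksort(vector<int>& numbers, ssize_t start, ssize_t finish) {
  if (start >= finish) return;
  ssize_t mid = partition(numbers, start, finish);
  // the two ranges don't overlap, so they can be sorted at the same time
  thread front([&numbers, start, mid] { quicksort(numbers, start, mid - 1); });
  quicksort(numbers, mid + 1, finish);  // sort the back half in the calling thread
  front.join();  // wait until the front half is sorted, too
}

Notice that the calling thread does real work (the back half) rather than just blocking, so each call spawns one new thread instead of two.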

Mini problem 3: The /proc directory

Note: This problem isn’t important, but I think it’s fascinating. It demonstrates how the /proc virtual directory can be explored to view information about running processes on your machine.
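For example, you might poke around from the shell like this (the process ID 1234 is made up; substitute any live one reported by ps):

$ cat /proc/self/status   # 'self' names whichever process does the reading
$ ls /proc/1234           # per-process entries: status, cmdline, fd/, maps, ...
$ cat /proc/1234/maps     # that process's memory mappings, mmap'd segments included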