Week 8 Exercises: Sharing Data by Communicating

This assignment was written by Armin Namavari.

In this week’s exercises, you’ll get to appreciate the sleekness of channels, a concurrency abstraction you learned about this week.

You should have received an invite to join this week’s GitHub repository. If you didn’t get an email invite, try going to this link:

https://github.com/cs110l/week8-YOURSUNETID

You can download the code using git as usual:

git clone https://github.com/cs110l/week8-YOURSUNETID.git week8

Due date: Tuesday, May 25, 11:59pm (Pacific time)

Ping us on Slack if you are having difficulty with this assignment. We would love to help clarify any misunderstandings, and we want you to sleep!

Part 1: parallel_map

A map function takes a vector and applies a transformation to each element, returning a vector of transformed elements. For example:

fn double(x: i32) -> i32 { return x * 2; }
let numbers = vec![1, 2, 3];
let doubled_numbers = map(numbers, double);
// -> doubled_numbers = [2, 4, 6]

// Often used with closures:
let strings = vec!["  hello ", " world  "];
let trimmed = map(strings, |s| s.trim());
// -> trimmed = ["hello", "world"]
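
For reference, Rust’s standard library expresses the same (sequential) idea through iterators; the standalone map above is the flavor you’re about to parallelize:

let numbers = vec![1, 2, 3];
let doubled: Vec<i32> = numbers.into_iter().map(|x| x * 2).collect();
// -> doubled = [2, 4, 6]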

Map functions are very widely used across programming languages. You decide to share the joys of parallelism with your friends who haven’t learned about multithreading yet by implementing an extra-speedy map function that uses threads.

This function takes three arguments: a vector of type Vec<T>, the number of threads to use, and a function f which takes elements of type T as input and returns values of type U as output. It runs f on each element of the input vector and collects the results in an output vector. Even better, it does this in parallel! The function looks something like this:

fn parallel_map<T, U, F>(mut input_vec: Vec<T>, num_threads: usize, f: F) -> Vec<U>
    where F: FnOnce(T) -> U + Send + Copy + 'static,
          T: Send + 'static,
          U: Send + 'static + Default,
{
    let mut output_vec: Vec<U> = Vec::with_capacity(input_vec.len());
    // TODO: in parallel, run f on each input element and collect the outputs,
    // in order, in output_vec
    output_vec
}
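
To make the contract concrete, here’s a hypothetical call site (the input vector, thread count, and closure are made up for illustration – they’re not part of the starter code):

fn main() {
    let v = vec![6, 7, 2, 10, 1, 4, 12, 18, 6, 10];
    let squares = parallel_map(v, 10, |num| num * num);
    println!("squares: {:?}", squares);
    // a correct implementation prints: squares: [36, 49, 4, 100, 1, 16, 144, 324, 36, 100]
}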

Ok(reader), take a deep breath. There are a lot of trait shenanigans going on over here. Roughly: the Send and 'static bounds let the input values and the function be moved into spawned threads, Copy lets you hand each thread (and each call) its own copy of f (since an FnOnce can only be called once per copy), and Default lets you fill output_vec with placeholder values before the real results arrive.

In summary, parallel_map takes in input_vec (as an owned type, so it can be consumed), num_threads (the number of threads that can execute in parallel), and a function f that takes values of type T as input and returns values of type U. A vector of U values is returned.

Your objective is to complete this implementation by using channels as your only synchronization mechanism. This might sound like a limitation, but trust me, this will make your life easier. You can implement a second version using mutexes and condition variables if you want to fully appreciate how nice it is to use channels.

In Lecture 15, we showed how you can use channels to implement farm v3.0. Please make sure you understand that example before you embark on implementing parallel_map.
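
If you’d like a bare-bones refresher on the channel mechanics themselves, here’s a minimal sketch using std::sync::mpsc from the standard library (feel free to use whichever channel library the lecture example and your starter code use instead):

use std::sync::mpsc;
use std::thread;

fn main() {
    let (sender, receiver) = mpsc::channel();
    let mut handles = Vec::new();
    for i in 0..4 {
        let sender = sender.clone();
        handles.push(thread::spawn(move || {
            // tag the result with an index so the receiving end knows where it belongs
            sender.send((i, i * 10)).expect("receiver hung up");
        }));
    }
    // drop the original sender so the receive loop ends once every worker is done
    drop(sender);
    for (idx, val) in receiver {
        println!("worker {} produced {}", idx, val);
    }
    for handle in handles {
        handle.join().expect("a worker thread panicked");
    }
}

This sketch is not a solution – it doesn’t divide up any real work. You still have to decide how inputs reach the workers, how many workers to spawn, and how the tagged results land in the right slots of output_vec.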

As is often the case with concurrency, your solution doesn’t need to be very long (ours is 43 lines of code), but that doesn’t mean it’s trivial. You should carefully design your implementation before you code it up.

How you design parallel_map is completely up to you! You are free to use as many channels as you need and to design the messages you send across those channels (of course, you should strive for an implementation that is simple, correct, and efficient). As you’re planning out your implementation, keep two things in mind: the outputs must end up in output_vec in the same order as their inputs (output_vec[i] should be f applied to the ith input element), and you should never have more than num_threads threads running f at once.

(Optional) Feel free to do something fun with the parallel_map implementation – use it to revamp the link explorer lecture example. Use it to implement a parallelized Mandelbrot Set generator. It’s a very versatile function – the possibilities are endless!

Part 2: Weekly Survey

Please let us know how you’re doing using this survey.

When you have submitted the survey, you should see a password. Put this password in survey.txt before submitting.

Optional Extensions

The parallel_map function you implemented effectively spins up a thread pool, uses it to execute the maps, and then destroys the pool. Implement a proper ThreadPool that only destroys its worker threads when it is dropped, and give it a parallel_map method that accomplishes what you did in Part 1.
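
If you go this route, here’s a rough sketch of one possible interface – the names, fields, and signatures below are just suggestions, not a prescribed design:

struct ThreadPool {
    workers: Vec<std::thread::JoinHandle<()>>,
    // plus whatever channels you need for handing work to the workers
}

impl ThreadPool {
    fn new(num_threads: usize) -> Self {
        // spawn num_threads long-lived worker threads here
        unimplemented!()
    }

    fn parallel_map<T, U, F>(&self, input_vec: Vec<T>, f: F) -> Vec<U>
    where
        F: FnOnce(T) -> U + Send + Copy + 'static,
        T: Send + 'static,
        U: Send + 'static + Default,
    {
        // same contract as Part 1, but reusing the pool's long-lived workers
        unimplemented!()
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        // signal the workers to shut down and join them here
    }
}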

If you thought parallel_map was fun, wait till you hear about parallel_reduce. Suppose you have some associative, commutative aggregation function – say + for example. If you wanted to add up the numbers in a Vec of size 8, you could do it the boring way – by taking a linear pass through the vector and accumulating the sum. Or you could do something like this:

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8
\   /   \   /   \   /   \   /
  3       7       11      15

where the sums (1 + 2), (3 + 4), (5 + 6), (7 + 8) are all done in parallel, then you do another round of parallel sums – this time (3 + 7) and (11 + 15). And then you do one final sum to get your result. This is precisely what a parallel_reduce implementation should do. This presents some new synchronization challenges, although your parallel_map implementation should serve as a good starting point. Better yet, you can tack this parallel_reduce function onto your ThreadPool implementation, and now you’ve got yourself a really fancy ThreadPool.
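
If it helps to see that round structure in code, here’s a purely sequential sketch (using + on i32 for concreteness); a real parallel_reduce would hand each round’s combines to worker threads instead of looping over them:

fn rounds_reduce(mut values: Vec<i32>) -> Option<i32> {
    while values.len() > 1 {
        let mut next_round = Vec::with_capacity((values.len() + 1) / 2);
        let mut iter = values.into_iter();
        while let Some(left) = iter.next() {
            match iter.next() {
                // in parallel_reduce, each of these combines could run on its own worker
                Some(right) => next_round.push(left + right),
                // an odd element out simply carries over to the next round
                None => next_round.push(left),
            }
        }
        values = next_round;
    }
    values.pop()
}

// rounds_reduce(vec![1, 2, 3, 4, 5, 6, 7, 8]) -> Some(36)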

Some CS161 food for thought – what would the asymptotic runtime of parallel_reduce be if you had infinite parallelism and each binary operation was O(1)? What if you could run M threads at once? What if each binary operation took time proportional to the number of elements it aggregated over? What if each binary operation took time that varied according to a geometric distribution with success probability… jk haha.