Lab 6: ThreadPool and Networking

This handout was adapted from Jerry Cain’s Spring 2018 offering.

Problem 1: `ThreadPool` Thought Questions

Presented below are partial implementations of my own ThreadPool::wait and ThreadPool::worker methods.

void ThreadPool::wait() {
  lock_guard<mutex> lg(lock);
  done.wait(lock, [this]{ return count == 0; });
}

void ThreadPool::worker(size_t workerID) {
  while (true) {
    // code here waits for a thunk to be supplied and then executes it
    lock_guard<mutex> lg(lock);
    if (--count == 0) done.notify_all();
  }
}

Briefly describe a simple ThreadPool test program that would have deadlocked had ThreadPool::worker called done.notify_one instead of done.notify_all.
Assume ThreadPool::worker gets through the call to done.notify_all and gets swapped off the processor immediately after the lock_guard is destroyed. Briefly describe a situation where a thread that called ThreadPool::wait still won’t advance past the done.wait call.
Had done.notify_all been called unconditionally (i.e. not just because count was zero), the ThreadPool would have still worked correctly. Why is this true, and why is the if test still the right thing to include?
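To experiment with these questions outside of a full ThreadPool, the count/condition-variable bookkeeping can be isolated in a small class. The following is only a sketch under stated assumptions: OutstandingCounter and runDemo are hypothetical names that do not appear in the real ThreadPool, and done is assumed to be a condition_variable_any so that it can wait on the mutex directly, as the handout's code does.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the pool's bookkeeping; this is not the real ThreadPool.
class OutstandingCounter {
 public:
  explicit OutstandingCounter(std::size_t initial) : count(initial) {}

  // Mirrors ThreadPool::wait: block until no work is outstanding.
  void wait() {
    std::lock_guard<std::mutex> lg(lock);
    done.wait(lock, [this] { return count == 0; });
  }

  // Mirrors the tail of ThreadPool::worker: one unit of work has finished.
  void finishOne() {
    std::lock_guard<std::mutex> lg(lock);
    if (--count == 0) done.notify_all();
  }

 private:
  std::mutex lock;
  std::condition_variable_any done;  // assumed type; works with a plain mutex
  std::size_t count;
};

// Starts numWaiters threads that call wait, then finishes numThunks units of
// work on separate threads. Returns how many waiters got past wait.
std::size_t runDemo(std::size_t numWaiters, std::size_t numThunks) {
  OutstandingCounter counter(numThunks);
  std::atomic<std::size_t> released{0};
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < numWaiters; i++) {
    threads.emplace_back([&counter, &released] {
      counter.wait();
      released++;
    });
  }
  for (std::size_t i = 0; i < numThunks; i++) {
    threads.emplace_back([&counter] { counter.finishOne(); });
  }
  for (std::thread& t : threads) t.join();
  return released.load();
}
```

Calling runDemo(3, 5) returns 3 once every thread has been joined. Swapping notify_all for notify_one in finishOne is one way to explore the first question.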

Problem 2: Networking, Client/Server, Request/Response

Explain the differences between a pipe and a socket.
Describe how networking is just another form of function call and return. What “function” is being called? What are the parameters? And what’s the return value? Assume HTTP is the operative protocol.
Describe the network architecture needed for:
- Email servers and clients
- Peer-to-Peer Text Messaging via cell phone numbers
- Skype
As it turns out, each of the three network applications above makes use of a custom protocol. Which ones could have relied on HTTP and/or HTTPS instead of custom protocols?
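As a starting point for the function-call analogy, the sketch below serializes a "call" into the text of an HTTP GET request. It is illustrative only: buildGetRequest is a hypothetical helper, the host name is a placeholder, and no URL-encoding of names or values is attempted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: treats `function` as the routine being invoked and
// `params` as its named arguments, and returns the text of an HTTP/1.0 GET
// request. Assumes names and values need no URL-encoding.
std::string buildGetRequest(
    const std::string& host, const std::string& function,
    const std::vector<std::pair<std::string, std::string>>& params) {
  std::string target = "/" + function;
  for (std::size_t i = 0; i < params.size(); i++) {
    target += (i == 0 ? "?" : "&");  // "?" opens the query string, "&" separates pairs
    target += params[i].first + "=" + params[i].second;
  }
  // Request line, one Host header, then the blank line that ends the headers.
  return "GET " + target + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
}
```

For instance, buildGetRequest("example.com", "add.py", {{"one", "11"}, {"two", "50"}}) produces a request whose first line is GET /add.py?one=11&two=50 HTTP/1.0. Whatever the server sends back plays the role of the return value.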

Problem 3: Web Servers and Python Scripts

It’s high time you implement your own multithreaded HTTP web server, making use of a ThreadPool to manage the concurrency, and all of the multiprocessing functions (fork and all related functions) to launch another executable – in all cases, a Python script – to ingest the majority of the HTTP request from the client and publish the entire HTTP response back to it.

Here are the pertinent details:

The web server listens to port 80 like all normal web servers do.
Every single HTTP request is really a request to invoke a python script. For instance, if the first line of a request is GET /add.py?one=11&two=50 HTTP/1.0, the web server executes the {"python", "./scripts/add.py", "GET", "/add.py?one=11&two=50", "HTTP/1.0"} argument vector.
All python scripts are housed within the scripts directory. GET /one.py invokes ./scripts/one.py, GET /deep/down/below.py invokes ./scripts/deep/down/below.py, and so forth.
All python scripts expect to read the entire HTTP request (save for the first line) in through standard in, and the entire HTTP response is published via standard out.
All python scripts accept three arguments, which are the tokens that make up the first line of the HTTP request. The web server itself must read in the first line from the client, tokenize it so it knows what to pass as arguments to the relevant python script, and then assume the python script pulls everything else.
If the URL identifies a resource whose name doesn’t end in .py, then your server can simply ignore the request and close the connection.
Similarly, if the URL identifies a resource whose name ends in .py, but the python script doesn’t exist, your web server can ignore the request and close the connection.
If the URL identifies a resource whose name ends in .py but includes .., your web server can ignore the request and close the connection. (This is presumably an attempt to reach up above the scripts directory and gain access to an arbitrary executable.)
You may not use the system function. You must use fork, execvp, and other related functions.
You may assume that all python scripts successfully invoked always succeed, so you needn’t check any status codes via WIFEXITED, WEXITSTATUS, etc.
Your implementation must close all unused descriptors and cull all zombie processes.
Your implementation should not make use of any signal handlers.
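The path rules and the argument vector described in the details above might be captured by a couple of small helpers. This is a sketch, not part of the starter code: endsWith, isAllowedScriptPath, and buildArguments are hypothetical names, and the check for whether the script actually exists is deliberately left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true if s ends with suffix.
static bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: accepts only paths that name a .py resource and that
// contain no ".." anywhere. Does not check that the script exists on disk.
bool isAllowedScriptPath(const std::string& path) {
  return endsWith(path, ".py") && path.find("..") == std::string::npos;
}

// Hypothetical helper: builds the argument vector from the tokens of the
// request's first line, rooting the script under ./scripts.
std::vector<std::string> buildArguments(const std::string& method,
                                        const std::string& url,
                                        const std::string& path,
                                        const std::string& protocol) {
  return {"python", "./scripts" + path, method, url, protocol};
}
```

With these, /add.py and /deep/down/below.py are accepted, while /index.html and /../bin/evil.py are rejected.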

Your implementation can make use of the following routine, which reads the first line of an HTTP request coming over the provided client socket (up to and including the \r\n), and surfaces the method (e.g. GET), the full URL (/add.py?one=11&two=50), the path (/add.py), and the protocol (HTTP/1.0) through the four strings shared by reference.

static void parseRequest(int client, string& method, string& url, string& path, string& protocol);

You’re to complete the implementation of the web server by fleshing out the details of the handleRequest function.

int main() {
   int server = createServerSocket(/* port = */ 80);
   runServer(server);
   return 0; // never gets here, but compiler doesn't know that
}
 
static const size_t kNumThreads = 16;
static void runServer(int server) {
  ThreadPool pool(kNumThreads);
  while (true) {
    int client = accept(server, NULL, NULL);
    pool.schedule([client] { handleRequest(client); });
  }
}

static void handleRequest(int client) {

CS 110
