Lab 6 Solutions

This handout was adapted from Jerry Cain’s Spring 2018 offering.

Problem 1: `ThreadPool` Thought Questions

Presented below are partial implementations of my own ThreadPool::wait and ThreadPool::worker methods.

void ThreadPool::wait() {
  lock_guard<mutex> lg(lock);
  done.wait(lock, [this]{ return count == 0; });
}

void ThreadPool::worker(size_t workerID) {
  while (true) {
    // code here waits for a thunk to be supplied and then executes it
    lock_guard<mutex> lg(lock);
    if (--count == 0) done.notify_all();
  }
}

Briefly describe a simple ThreadPool test program that would have deadlocked had ThreadPool::worker called done.notify_one instead of done.notify_all.
- On the main thread, allocate a ThreadPool of size 1 and schedule []{ sleep(10); }. Then create two standalone threads that each call ThreadPool::wait(), and join both. Both standalone threads block inside done.wait(); when the thunk finishes, notify_one wakes only one of them, so the other sleeps forever and its join never returns.
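The two-waiter scenario can be sketched without the full ThreadPool. The Tracker struct and the releasedWaiters function below are hypothetical stand-ins (they are not part of the handout's ThreadPool); they only mirror the count, lock, and done members used above, and assume done is a condition_variable_any. With notify_all, both waiters are released; swapping in notify_one could strand one of them if both are already asleep.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the pool's bookkeeping (not the real ThreadPool).
struct Tracker {
  std::mutex lock;
  std::condition_variable_any done;
  size_t count = 1;  // one outstanding "thunk"

  void wait() {
    std::lock_guard<std::mutex> lg(lock);
    done.wait(lock, [this] { return count == 0; });
  }

  void finishOne() {
    std::lock_guard<std::mutex> lg(lock);
    if (--count == 0) done.notify_all();  // notify_one here could strand a waiter
  }
};

// Starts two waiters, completes the single outstanding task, and reports
// how many waiters made it past wait().
int releasedWaiters() {
  Tracker t;
  std::atomic<int> released{0};
  std::thread a([&] { t.wait(); released++; });
  std::thread b([&] { t.wait(); released++; });
  // Give both threads a chance to block inside done.wait().
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  t.finishOne();
  a.join();
  b.join();
  return released;
}
```

Calling releasedWaiters() returns once both joins complete, which is exactly what the notify_one variant cannot guarantee.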
Assume ThreadPool::worker gets through the call to done.notify_all and gets swapped off the processor immediately after the lock_guard is destroyed. Briefly describe a situation where a thread that called ThreadPool::wait still won’t advance past the done.wait call.
- ThreadPool::schedule is called before the thread blocked in done.wait() gets the processor, reacquires the mutex, and reevaluates the condition. ThreadPool::schedule increments count, so by the time the waiting thread checks, count may no longer be zero and the thread goes back to sleep.
Had done.notify_all been called unconditionally (i.e. not just because count was zero), the ThreadPool would have still worked correctly. Why is this true, and why is the if test still the right thing to include?
- Just because a thread wakes prematurely doesn’t mean it’ll rise from done.wait(). It needs to meet the supplied condition, and very often it won’t. The if test identifies situations where the chances the condition will be met are very good, thereby making better use of the CPU.

Problem 2: Networking, Client/Server, Request/Response

Explain the differences between a pipe and a socket.
- Fundamentally, a pipe is a unidirectional communication channel and a socket is a bidirectional one. Pipes also are only used to communicate within a given system while sockets are used to communicate over IP, almost always between different hosts. Finally, pipes must be created before a process is forked in order to facilitate interprocess communication, whereas sockets can be created at any time.
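The directionality difference can be seen in a small sketch. It uses socketpair with AF_UNIX purely so the example runs on one machine; that is an assumption made for illustration, since the sockets discussed here normally travel over IP. The function names pipeRoundTrip and socketPairBothWays are hypothetical.

```cpp
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// A pipe has a dedicated read end (fds[0]) and write end (fds[1]):
// data flows in one direction only.
std::string pipeRoundTrip(const std::string& msg) {
  int fds[2];
  if (pipe(fds) == -1) return "";
  write(fds[1], msg.data(), msg.size());
  close(fds[1]);
  char buf[128];
  ssize_t n = read(fds[0], buf, sizeof(buf));
  close(fds[0]);
  return n > 0 ? std::string(buf, n) : "";
}

// Each end of a connected socket pair can be both read from and written to.
bool socketPairBothWays() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return false;
  char a = 0, b = 0;
  bool ok = write(sv[0], "x", 1) == 1 && read(sv[1], &a, 1) == 1 &&
            write(sv[1], "y", 1) == 1 && read(sv[0], &b, 1) == 1;
  close(sv[0]);
  close(sv[1]);
  return ok && a == 'x' && b == 'y';
}
```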
Describe how networking is just another form of function call and return. What “function” is being called? What are the parameters? And what’s the return value? Assume HTTP is the operative protocol.
- The client requires some computation to be performed in another context, and in this case that context is provided on another machine as opposed to some other function on the same machine. The function being called is the URL (where the function lives, and which particular service is relevant, e.g. http://cs110.stanford.edu/cgi-bin/gradebook), the parameters are expressed via text passed from client to server, and the return value is expressed via text passed from server to client.
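One way to picture the analogy is a helper that turns a "function name" and its "arguments" into the text of an HTTP request. This is a minimal sketch: buildGetRequest and the student parameter are hypothetical names chosen for illustration, and no URL-encoding of the parameter values is attempted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Serializes a "call" into the text a client would send: the path names
// the function and the query string carries the parameters.
std::string buildGetRequest(
    const std::string& host, const std::string& path,
    const std::vector<std::pair<std::string, std::string>>& params) {
  std::string url = path;
  for (size_t i = 0; i < params.size(); i++) {
    url += (i == 0 ? "?" : "&");
    url += params[i].first + "=" + params[i].second;
  }
  return "GET " + url + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
}
```

The "return value" is whatever text the server writes back over the same connection, typically a status line, headers, and a payload.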
Describe the network architecture needed for:
- Email servers and clients
  - Most email clients and servers speak IMAP over a secure connection. IMAP is similar to HTTP, except that the request and response protocol is optimized for the selection of a mailbox, a digest of all emails in that mailbox, the ability to create and delete mailboxes, and the ability to mark an email as read. Sending an email is typically handled by a companion protocol, SMTP. (Curious how you can securely telnet to, say, imap.gmail.com? Read this.)
- Peer-to-Peer Text Messaging via cell phone numbers
  - By default, the cell service provider intercepts all messages via a centralized farm of servers and forwards messages (with images, emoji, etc) on to the intended recipient. In some cases (e.g. two iPhones in conversation over wifi), Apple mediates instead of, say, Verizon. For an accessible introduction to the actual protocol used by early SMS implementations, read this.
- Skype
  - Same principle as SMS/text messaging, except that persistent connections between clients need to be maintained. This Wikipedia segment does a nice job explaining what Skype does, without going into the weeds. If you like going into the weeds, then this is a really, really well written technical piece explaining how it all works. If you take CS144, this last article is a reading assignment.
As it turns out, each of the three network applications above makes use of a custom protocol. Which ones could have relied on HTTP and/or HTTPS instead of custom protocols?
- Even though it might have been cumbersome, all of them could have. HTTP/HTTPS is a fairly generic grammar that allows side effects, and everything needed for email, SMS, and video chat could, in principle, be codified via HTTP. However, custom protocols are generally constructed to optimize for common operations (as with email messages that need to be deleted) and/or the need for persistent connections (as with video conferencing).

Problem 3: Web Servers and Python Scripts

It’s high time you implement your own multithreaded HTTP web server, making use of a ThreadPool to manage the concurrency, and all of the multiprocessing functions (fork and all related functions) to launch another executable – in all cases, a Python script – to ingest the majority of the HTTP request from the client and publish the entire HTTP response back to it.

Here are the pertinent details:

The web server listens to port 80 like all normal web servers do.
Every single HTTP request is really a request to invoke a python script. For instance, if the first line of a request is GET /add.py?one=11&two=50 HTTP/1.0, the web server executes the {"python", "./scripts/add.py", "GET", "/add.py?one=11&two=50", "HTTP/1.0"} argument vector.
All python scripts are housed within the scripts directory. GET /one.py invokes ./scripts/one.py, GET /deep/down/below.py invokes ./scripts/deep/down/below.py, and so forth.
All python scripts expect to read the entire HTTP request (save for the first line) in through standard in, and the entire HTTP response is published via standard out.
All python scripts accept three arguments, which are the tokens that make up the first line of the HTTP request. The web server itself must read in the first line from the client, tokenize it so it knows what to pass as arguments to the relevant python script, and then assume the python script pulls everything else.
If the URL identifies a resource whose name doesn’t end in .py, then your server can simply ignore the request and close the connection.
Similarly, if the URL identifies a resource whose name ends in .py, but the python script doesn’t exist, your web server can ignore the request and close the connection.
If the URL identifies a resource whose name ends in .py but includes .., your web server can ignore the request and close the connection. (This is presumably an attempt to reach up above the scripts directory and gain access to an arbitrary executable.)
You may not use the system function. You must use fork, execvp, and other related functions.
You may assume that all python scripts successfully invoked always succeed, so you needn’t check any status codes via WIFEXITED, WEXITSTATUS, etc.
Your implementation must close all unused descriptors and cull all zombie processes.
Your implementation should not make use of any signal handlers.

Your implementation can make use of the following routine, which reads the first line of an HTTP request coming over the provided client socket (up to and including the \r\n), and surfaces the method (e.g. GET), the full URL (/add.py?one=11&two=50), the path (/add.py), and the protocol (HTTP/1.1) through the four strings shared by reference.

static void parseRequest(int client, string& method, string& url, string& path, string& protocol);
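The handout does not show how parseRequest works internally. As a rough sketch of the tokenizing step only, the hypothetical helpers below operate on a request line that has already been read into a string (rather than on a client socket) and then assemble the argument vector described in the details above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "METHOD URL PROTOCOL\r\n" into its tokens and
// derives the path by dropping any query string from the URL.
bool splitRequestLine(const std::string& line, std::string& method,
                      std::string& url, std::string& path,
                      std::string& protocol) {
  std::istringstream iss(line);
  if (!(iss >> method >> url >> protocol)) return false;
  path = url.substr(0, url.find('?'));  // whole URL when there is no '?'
  return true;
}

// Hypothetical helper: builds the arguments that would be handed to execvp.
std::vector<std::string> buildArgs(const std::string& line) {
  std::string method, url, path, protocol;
  if (!splitRequestLine(line, method, url, path, protocol)) return {};
  return {"python", "./scripts" + path, method, url, protocol};
}
```

For the request line GET /add.py?one=11&two=50 HTTP/1.0, buildArgs yields the same five strings listed in the example argument vector.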

You’re to complete the implementation of the web server by fleshing out the details of the handleRequest function.

int main() {
   int server = createServerSocket(/* port = */ 80);
   runServer(server);
   return 0; // never gets here, but compiler doesn't know that
}
 
static const size_t kNumThreads = 16;
static void runServer(int server) {
  ThreadPool pool(kNumThreads);
  while (true) {
    int client = accept(server, NULL, NULL);
    pool.schedule([client] { handleRequest(client); });
  }
}

static void handleRequest(int client) {
   // everything below represents a reasonable solution that meets all requirements
   string method, url, path, protocol;
   parseRequest(client, method, url, path, protocol);
   if (!endsWith(path, ".py") || path.find("..") != string::npos) { 
       close(client);
       return;
   } 
   pid_t pid = fork();
   if (pid == 0) {
      dup2(client, STDIN_FILENO);
      dup2(client, STDOUT_FILENO);
      close(client);
      string script = "./scripts" + path;
      const char *argv[] = {
         "python", script.c_str(), method.c_str(), url.c_str(), protocol.c_str(), NULL
      };
      execvp(argv[0], const_cast<char **>(argv));
      exit(0);
   }
   close(client);
   waitpid(pid, NULL, 0);
}
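The solution above calls endsWith, which the handout does not define, and it does not explicitly test whether the script exists before forking. A minimal sketch of both checks follows; endsWith is written here as one plausible implementation, and shouldServe is a hypothetical helper that uses access to probe for the file under ./scripts.

```cpp
#include <string>
#include <unistd.h>

// One plausible implementation of the endsWith helper assumed by handleRequest.
bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: true only if the path names a .py resource, contains
// no "..", and the corresponding file exists under ./scripts.
bool shouldServe(const std::string& path) {
  if (!endsWith(path, ".py") || path.find("..") != std::string::npos) {
    return false;
  }
  std::string script = "./scripts" + path;
  return access(script.c_str(), F_OK) == 0;
}
```

With a helper like this, handleRequest could close the client and return early whenever shouldServe(path) is false, covering the missing-script case alongside the other two.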
