Lab 6: ThreadPool and Networking
This handout was adapted from Jerry Cain’s Spring 2018 offering.
Problem 1: ThreadPool Thought Questions
Presented below are partial implementations of my own ThreadPool::wait and my
ThreadPool::worker method.
void ThreadPool::wait() {
  lock_guard<mutex> lg(lock);
  done.wait(lock, [this]{ return count == 0; });
}

void ThreadPool::worker(size_t workerID) {
  while (true) {
    // code here waits for a thunk to be supplied and then executes it
    lock_guard<mutex> lg(lock);
    if (--count == 0) done.notify_all();
  }
}
- Briefly describe a simple ThreadPool test program that would have deadlocked had ThreadPool::worker called done.notify_one instead of done.notify_all.
- Assume ThreadPool::worker gets through the call to done.notify_all and gets swapped off the processor immediately after the lock_guard is destroyed. Briefly describe a situation where a thread that called ThreadPool::wait still won’t advance past the done.wait call.
- Had done.notify_all been called unconditionally (i.e. not just because count was zero), the ThreadPool would have still worked correctly. Why is this true, and why is the if test still the right thing to include?
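For experimenting with the questions above, here is a minimal, self-contained sketch of the same bookkeeping pattern. The class OutstandingCounter and the function runCounterDemo are hypothetical names invented for this illustration; they are not part of the real ThreadPool, and the sketch assumes (as the excerpt suggests) that done is a condition_variable_any waiting directly on a mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the count/lock/done bookkeeping in the excerpt.
class OutstandingCounter {
 public:
  explicit OutstandingCounter(size_t initial) : count(initial) {}

  // Mirrors ThreadPool::wait: blocks until count reaches zero.
  void wait() {
    std::lock_guard<std::mutex> lg(lock);
    // condition_variable_any releases the mutex while blocked and
    // reacquires it before the predicate is evaluated again.
    done.wait(lock, [this] { return count == 0; });
  }

  // Mirrors the tail of ThreadPool::worker: one unit of work has finished.
  void finishOne() {
    std::lock_guard<std::mutex> lg(lock);
    assert(count > 0);
    if (--count == 0) done.notify_all();
  }

  // Reads the number of unfinished units under the lock.
  size_t remaining() {
    std::lock_guard<std::mutex> lg(lock);
    return count;
  }

 private:
  std::mutex lock;
  std::condition_variable_any done;
  size_t count;
};

// Spawns numWorkers threads that each finish one unit, waits for all of
// them the way ThreadPool::wait would, and reports what is left over.
size_t runCounterDemo(size_t numWorkers) {
  OutstandingCounter counter(numWorkers);
  std::vector<std::thread> workers;
  for (size_t i = 0; i < numWorkers; i++) {
    workers.emplace_back([&counter] { counter.finishOne(); });
  }
  counter.wait();
  for (std::thread& t : workers) t.join();
  return counter.remaining();
}
```

Calling runCounterDemo(8) from a test program returns once every worker has decremented the count. Changing notify_all to notify_one, or adding more threads that call wait, is one way to probe the scenarios the questions ask about.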
Problem 2: Networking, Client/Server, Request/Response
- Explain the differences between a pipe and a socket.
- Describe how networking is just another form of function call and return. What “function” is being called? What are the parameters? And what’s the return value? Assume HTTP is the operative protocol.
- Describe the network architecture needed for:
- Email servers and clients
- Peer-to-Peer Text Messaging via cell phone numbers
- Skype
- As it turns out, all three of the network applications above make use of custom protocols. Which ones could have relied on HTTP and/or HTTPS instead of custom protocols?
Problem 3: Web Servers and Python Scripts
It’s high time you implement your own multithreaded HTTP web server, making use
of a ThreadPool to manage the concurrency, and all of the multiprocessing
functions (fork and all related functions) to launch another executable – in
all cases, a Python script – to ingest the majority of the HTTP request from
the client and publish the entire HTTP response back to it.
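The fork, execvp, and waitpid mechanics that paragraph leans on can be sketched in isolation. The function below, runAndCapture, is a hypothetical helper written for this illustration and is not part of the lab's starter code. It redirects the child's standard output to a pipe so the sketch can be run without a network; in the web server the descriptor being duplicated would presumably be the client socket instead.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs args as a child process, captures everything
// the child writes to standard out, and reaps the child before returning.
std::string runAndCapture(const std::vector<std::string>& args) {
  assert(!args.empty());

  // Build the NULL-terminated argument vector before forking.
  std::vector<char *> argv;
  for (const std::string& arg : args) {
    argv.push_back(const_cast<char *>(arg.c_str()));
  }
  argv.push_back(nullptr);

  int fds[2];
  if (pipe(fds) == -1) return "";

  pid_t pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return "";
  }

  if (pid == 0) {
    // Child: standard out becomes the write end of the pipe.
    close(fds[0]);
    dup2(fds[1], STDOUT_FILENO);
    close(fds[1]);
    execvp(argv[0], argv.data());
    _exit(1); // only reached if execvp failed
  }

  // Parent: close the unused write end so read eventually sees end-of-file.
  close(fds[1]);
  std::string output;
  char buffer[256];
  ssize_t numRead;
  while ((numRead = read(fds[0], buffer, sizeof(buffer))) > 0) {
    output.append(buffer, numRead);
  }
  close(fds[0]);

  // Reap the child directly so no zombie lingers and no signal handler is needed.
  waitpid(pid, nullptr, 0);
  return output;
}
```

For example, runAndCapture({"echo", "hello"}) collects the text that echo prints. The parent closes its copy of the write end before reading; otherwise the read loop would never observe end-of-file.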
Here are the pertinent details:
- The web server listens to port 80 like all normal web servers do.
- Every single HTTP request is really a request to invoke a python script. For instance, if the first line of a request is GET /add.py?one=11&two=50 HTTP/1.0, the web server executes the {"python", "./scripts/add.py", "GET", "/add.py?one=11&two=50", "HTTP/1.0"} argument vector.
- All python scripts are housed within the scripts directory. GET /one.py invokes ./scripts/one.py, GET /deep/down/below.py invokes ./scripts/deep/down/below.py, and so forth.
- All python scripts expect to read the entire HTTP request (save for the first line) in through standard in, and the entire HTTP response is published via standard out.
- All python scripts accept three arguments, which are the tokens that make up the first line of the HTTP request. The web server itself must read in the first line from the client, tokenize it so it knows what to pass as arguments to the relevant python script, and then assume the python script pulls everything else.
- If the URL identifies a resource whose name doesn’t end in .py, then your server can simply ignore the request and close the connection.
- Similarly, if the URL identifies a resource whose name ends in .py, but the python script doesn’t exist, your web server can ignore the request and close the connection.
- If the URL identifies a resource whose name ends in .py but includes .., your web server can ignore the request and close the connection. (This is presumably an attempt to reach up above the scripts directory and gain access to an arbitrary executable.)
- You may not use the system function. You must use fork, execvp, and other related functions.
- You may assume that all python scripts successfully invoked always succeed, so you needn’t check any status codes via WIFEXITED, WEXITSTATUS, etc.
- Your implementation must close all unused descriptors and cull all zombie processes.
- Your implementation should not make use of any signal handlers.
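The path rules and the argument vector described in the list above can be sketched as two small functions. Both isAcceptablePath and buildScriptArguments are hypothetical names introduced for this illustration. The sketch checks only the .py suffix and the .. rule; confirming that the script actually exists on disk is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical check: true when the path names a .py resource and contains
// no ".." component that could climb out of the scripts directory.
bool isAcceptablePath(const std::string& path) {
  const std::string suffix = ".py";
  if (path.size() < suffix.size()) return false;
  if (path.compare(path.size() - suffix.size(), suffix.size(), suffix) != 0) {
    return false;
  }
  return path.find("..") == std::string::npos;
}

// Hypothetical helper: assembles the five-element argument vector from the
// tokens of the request's first line. The path is assumed to start with '/'.
std::vector<std::string> buildScriptArguments(const std::string& method,
                                              const std::string& url,
                                              const std::string& path,
                                              const std::string& protocol) {
  assert(!path.empty() && path[0] == '/');
  return {"python", "./scripts" + path, method, url, protocol};
}
```

With the request line from the example, buildScriptArguments("GET", "/add.py?one=11&two=50", "/add.py", "HTTP/1.0") yields "python", "./scripts/add.py", "GET", "/add.py?one=11&two=50", and "HTTP/1.0", which matches the argument vector quoted in the list.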
Your implementation can make use of the following routine, which reads the
first line of an HTTP request coming over the provided client socket (up to and
including the \r\n), and surfaces the method (e.g. GET), the full URL
(/add.py?one=11&two=50), the path (/add.py), and the protocol (HTTP/1.1)
through the four strings shared by reference.
static void parseRequest(int client, string& method, string& url, string& path, string& protocol);
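To make the four output strings concrete, here is a string-based sketch of the same tokenization. The function splitRequestLine is a hypothetical analogue written for illustration; it is not the provided parseRequest, which reads from the client socket, and the provided routine may handle edge cases differently.

```cpp
#include <sstream>
#include <string>

// Hypothetical analogue of parseRequest that works on a line already in
// memory. Returns false when the line has fewer than three tokens.
bool splitRequestLine(const std::string& line, std::string& method,
                      std::string& url, std::string& path,
                      std::string& protocol) {
  std::istringstream iss(line);
  // operator>> skips whitespace, so the trailing "\r\n" is discarded.
  if (!(iss >> method >> url >> protocol)) return false;
  // The path is the URL with any query string (from '?' onward) removed.
  path = url.substr(0, url.find('?'));
  return true;
}
```

Given "GET /add.py?one=11&two=50 HTTP/1.0\r\n", the sketch sets method to GET, url to /add.py?one=11&two=50, path to /add.py, and protocol to HTTP/1.0.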
You’re to complete the implementation of the web server by fleshing out the details of the handleRequest function.
int main() {
  int server = createServerSocket(/* port = */ 80);
  runServer(server);
  return 0; // never gets here, but compiler doesn't know that
}

static const size_t kNumThreads = 16;
static void runServer(int server) {
  ThreadPool pool(kNumThreads);
  while (true) {
    int client = accept(server, NULL, NULL);
    pool.schedule([client] { handleRequest(client); });
  }
}

static void handleRequest(int client) {