Lab 6: ThreadPool and Networking
This handout was adapted from Jerry Cain’s Spring 2018 offering.
Problem 1: ThreadPool
Thought Questions
Presented below are partial implementations of my own ThreadPool::wait and ThreadPool::worker methods.
void ThreadPool::wait() {
  lock_guard<mutex> lg(lock);
  // done is a condition_variable_any, so it can wait directly on the mutex lg holds
  done.wait(lock, [this]{ return count == 0; });
}

void ThreadPool::worker(size_t workerID) {
  while (true) {
    // code here waits for a thunk to be supplied and then executes it
    lock_guard<mutex> lg(lock);
    if (--count == 0) done.notify_all();
  }
}
- Briefly describe a simple ThreadPool test program that would have deadlocked had ThreadPool::worker called done.notify_one instead of done.notify_all.
- Assume ThreadPool::worker gets through the call to done.notify_all and gets swapped off the processor immediately after the lock_guard is destroyed. Briefly describe a situation where a thread that called ThreadPool::wait still won't advance past the done.wait call.
- Had done.notify_all been called unconditionally (i.e. not just because count was zero), the ThreadPool would have still worked correctly. Why is this true, and why is the if test still the right thing to include?
Problem 2: Networking, Client/Server, Request/Response
- Explain the differences between a pipe and a socket.
- Describe how networking is just another form of function call and return. What “function” is being called? What are the parameters? And what’s the return value? Assume HTTP is the operative protocol.
- Describe the network architecture needed for:
  - Email servers and clients
  - Peer-to-Peer Text Messaging via cell phone numbers
  - Skype
- As it turns out, each of the three network applications above makes use of a custom protocol. Which ones could have relied on HTTP and/or HTTPS instead?
Problem 3: Web Servers and Python Scripts
It’s high time you implement your own multithreaded HTTP web server, making use of a ThreadPool to manage the concurrency, and all of the multiprocessing functions (fork and all related functions) to launch another executable (in all cases, a Python script) to ingest the majority of the HTTP request from the client and publish the entire HTTP response back to it.
Here are the pertinent details:
- The web server listens on port 80, like all normal web servers do.
- Every single HTTP request is really a request to invoke a python script. For instance, if the first line of a request is GET /add.py?one=11&two=50 HTTP/1.0, the web server executes the {"python", "./scripts/add.py", "GET", "/add.py?one=11&two=50", "HTTP/1.0"} argument vector.
- All python scripts are housed within the scripts directory. GET /one.py invokes ./scripts/one.py, GET /deep/down/below.py invokes ./scripts/deep/down/below.py, and so forth.
- All python scripts expect to read the entire HTTP request (save for the first line) through standard input, and they publish the entire HTTP response via standard output.
- All python scripts accept three arguments, which are the tokens that make up the first line of the HTTP request. The web server itself must read in the first line from the client, tokenize it so it knows what to pass as arguments to the relevant python script, and then assume the python script pulls everything else.
- If the URL identifies a resource whose name doesn’t end in .py, then your server can simply ignore the request and close the connection.
- Similarly, if the URL identifies a resource whose name ends in .py, but the python script doesn’t exist, your web server can ignore the request and close the connection.
- If the URL identifies a resource whose name ends in .py but includes .., your web server can ignore the request and close the connection. (This is presumably an attempt to reach up above the scripts directory and gain access to an arbitrary executable.)
- You may not use the system function. You must use fork, execvp, and other related functions.
- You may assume that all python scripts successfully invoked always succeed, so you needn’t check any status codes via WIFEXITED, WEXITSTATUS, etc.
- Your implementation must close all unused descriptors and cull all zombie processes.
- Your implementation should not make use of any signal handlers.
Your implementation can make use of the following routine, which reads the first line of an HTTP request coming over the provided client socket (up to and including the \r\n) and surfaces the method (e.g. GET), the full URL (/add.py?one=11&two=50), the path (/add.py), and the protocol (HTTP/1.1) through the four strings shared by reference.
static void parseRequest(int client, string& method, string& url, string& path, string& protocol);
You’re to complete the implementation of the web server by fleshing out the details of the handleRequest function.
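// createServerSocket and ThreadPool are assumed to come from the lab's support code.
static void runServer(int server);
static void handleRequest(int client);

int main() {
  int server = createServerSocket(/* port = */ 80);
  runServer(server);
  return 0; // never gets here, but compiler doesn't know that
}

static const size_t kNumThreads = 16;
static void runServer(int server) {
  ThreadPool pool(kNumThreads);
  while (true) {
    int client = accept(server, NULL, NULL);
    // each accepted connection is handed to one of the pool's worker threads
    pool.schedule([client] { handleRequest(client); });
  }
}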
static void handleRequest(int client) {