Lab 6 Solutions
This handout was adapted from Jerry Cain’s Spring 2018 offering.
Problem 1: ThreadPool
Thought Questions
Presented below are partial implementations of my own ThreadPool::wait and my ThreadPool::worker method.
void ThreadPool::wait() {
  lock_guard<mutex> lg(lock);
  done.wait(lock, [this]{ return count == 0; });
}
void ThreadPool::worker(size_t workerID) {
  while (true) {
    // code here waits for a thunk to be supplied and then executes it
    lock_guard<mutex> lg(lock);
    if (--count == 0) done.notify_all();
  }
}
- Briefly describe a simple ThreadPool test program that would have deadlocked had ThreadPool::worker called done.notify_one instead of done.notify_all.
  - Allocate a ThreadPool of size 1 on the main thread, schedule []{ sleep(10); } on the main thread, then create two standalone threads that each call ThreadPool::wait() and call join on both. Both standalone threads descend into done.wait(), but only one is notified; the other sleeps forever and never joins.
- Assume ThreadPool::worker gets through the call to done.notify_all and gets swapped off the processor immediately after the lock_guard is destroyed. Briefly describe a situation where a thread that called ThreadPool::wait still won’t advance past the done.wait call.
  - ThreadPool::schedule is called before the thread in done.wait() gets the processor, acquires the mutex, and reevaluates the condition. ThreadPool::schedule increments count, so a pathway for the condition to fail exists.
- Had done.notify_all been called unconditionally (i.e. not just because count was zero), the ThreadPool would have still worked correctly. Why is this true, and why is the if test still the right thing to include?
  - Just because a thread wakes prematurely doesn’t mean it’ll rise from done.wait(). It needs to meet the supplied condition, and very often it won’t. The if test identifies situations where the chances the condition will be met are very good, thereby making better use of the CPU.
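The interplay between count, done.wait, and done.notify_all discussed above can be tried out in isolation. What follows is a minimal sketch, not the ThreadPool itself: OutstandingCount and countReleasedWaiters are hypothetical names invented for illustration, and the sketch uses std::condition_variable with a unique_lock rather than passing the mutex directly to wait as the handout's code does.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the bookkeeping inside a ThreadPool: it tracks
// only the number of outstanding thunks, not the thunks themselves.
class OutstandingCount {
 public:
  void add() {  // roughly what schedule does to count
    std::lock_guard<std::mutex> lg(lock);
    ++count;
  }
  void finish() {  // roughly what worker does after a thunk returns
    std::lock_guard<std::mutex> lg(lock);
    if (--count == 0) done.notify_all();
  }
  void wait() {  // blocks until count is zero
    std::unique_lock<std::mutex> ul(lock);
    done.wait(ul, [this] { return count == 0; });
  }

 private:
  std::mutex lock;
  std::condition_variable done;
  std::size_t count = 0;
};

// Starts numWaiters threads that all call wait(), finishes the single
// outstanding thunk, and returns how many waiters got past wait().
std::size_t countReleasedWaiters(std::size_t numWaiters) {
  OutstandingCount oc;
  oc.add();
  std::atomic<std::size_t> released(0);
  std::vector<std::thread> waiters;
  for (std::size_t i = 0; i < numWaiters; i++) {
    waiters.emplace_back([&oc, &released] {
      oc.wait();
      ++released;
    });
  }
  oc.finish();  // count drops to zero, so every waiter is notified
  for (std::thread& t : waiters) t.join();
  return released;
}
```

Calling countReleasedWaiters(2) mirrors the two standalone threads from the first thought question. A waiter that arrives after finish() sees count == 0 and returns immediately, while waiters already asleep rely on notify_all; with notify_one in its place, the join loop could block forever when both waiters are asleep at notification time.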
Problem 2: Networking, Client/Server, Request/Response
- Explain the differences between a pipe and a socket.
  - Fundamentally, a pipe is a unidirectional communication channel and a socket is a bidirectional one. Pipes also are only used to communicate within a given system while sockets are used to communicate over IP, almost always between different hosts. Finally, pipes must be created before a process is forked in order to facilitate interprocess communication, whereas sockets can be created at any time.
- Describe how networking is just another form of function call and return. What “function” is being called? What are the parameters? And what’s the return value? Assume HTTP is the operative protocol.
  - The client requires some computation to be performed in another context, and in this case that context is provided on another machine as opposed to some other function on the same machine. The function being called is the URL (where the function lives, and which particular service is relevant, e.g. http://cs110.stanford.edu/cgi-bin/gradebook), the parameters are expressed via text passed from client to server, and the return value is expressed via text passed from server to client.
- Describe the network architecture needed for:
  - Email servers and clients
    - Most email clients and servers speak IMAP over a secure connection. IMAP is similar to HTTP, except that the request and response protocol is optimized for the selection of a mailbox, a digest of all emails in that mailbox, the ability to create and delete mailboxes, and the ability to mark an email as read. (Sending an email is typically handled by a companion protocol, SMTP.) (Curious how you can securely telnet to, say, imap.gmail.com? Read this.)
  - Peer-to-Peer Text Messaging via cell phone numbers
    - By default, the cell service provider intercepts all messages via a centralized farm of servers and forwards messages (with images, emoji, etc) on to the intended recipient. In some cases (e.g. two iPhones in conversation over wifi), Apple mediates instead of, say, Verizon. For an accessible introduction to the actual protocol used by early SMS implementations, read this.
  - Skype
    - Same principle as SMS/text messaging, except that persistent connections between clients need to be maintained. This Wikipedia segment does a nice job explaining what Skype does, without going into the weeds. If you like going into weeds, then this is a really, really well written technical piece explaining how it all works. If you take CS144, this last article is a reading assignment.
- As it turns out, each of the three network applications above makes use of custom protocols. Which ones could have relied on HTTP and/or HTTPS instead of custom protocols?
  - Even though it might have been cumbersome, all of them could have. HTTP/HTTPS is a fairly generic grammar that allows side effects, and everything needed for email, SMS, and video chat could, in principle, be codified via HTTP. However, custom protocols are generally constructed to optimize for common operations (as with email messages that need to be deleted) and/or the need for persistent connections (as with video conferencing).
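The function-call analogy from earlier in this problem can be made concrete by looking at how a "call" is serialized into request text. The sketch below is illustrative only: buildGetRequest is a hypothetical helper, it assumes names and values need no URL encoding, and it shows just the client's half of the exchange (the "return value" would be whatever text the server sends back).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: serializes a "function call" into HTTP request text.
// The path names the function, and params are its arguments.
// Assumes names and values contain no characters that need URL encoding.
std::string buildGetRequest(const std::string& host, const std::string& path,
                            const std::map<std::string, std::string>& params) {
  std::string target = path;
  char separator = '?';  // first parameter follows '?', the rest follow '&'
  for (const auto& param : params) {
    target += separator + param.first + "=" + param.second;
    separator = '&';
  }
  return "GET " + target + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
}
```

For example, buildGetRequest("cs110.stanford.edu", "/add.py", {{"one", "11"}, {"two", "50"}}) produces a request whose first line is GET /add.py?one=11&two=50 HTTP/1.0, since std::map iterates its keys in sorted order.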
Problem 3: Web Servers and Python Scripts
It’s high time you implement your own multithreaded HTTP web server, making use of a ThreadPool to manage the concurrency, and all of the multiprocessing functions (fork and all related functions) to launch another executable (in all cases, a Python script) to ingest the majority of the HTTP request from the client and publish the entire HTTP response back to it.
Here are the pertinent details:
- The web server listens to port 80 like all normal web servers do.
- Every single HTTP request is really a request to invoke a python script. For instance, if the first line of a request is GET /add.py?one=11&two=50 HTTP/1.0, the web server executes the {"python", "./scripts/add.py", "GET", "/add.py?one=11&two=50", "HTTP/1.0"} argument vector.
- All python scripts are housed within the scripts directory. GET /one.py invokes ./scripts/one.py, GET /deep/down/below.py invokes ./scripts/deep/down/below.py, and so forth.
- All python scripts expect to read the entire HTTP request (save for the first line) in through standard in, and the entire HTTP response is published via standard out.
- All python scripts accept three arguments, which are the tokens that make up the first line of the HTTP request. The web server itself must read in the first line from the client, tokenize it so it knows what to pass as arguments to the relevant python script, and then assume the python script pulls everything else.
- If the URL identifies a resource whose name doesn’t end in .py, then your server can simply ignore the request and close the connection.
- Similarly, if the URL identifies a resource whose name ends in .py, but the python script doesn’t exist, your web server can ignore the request and close the connection.
- If the URL identifies a resource whose name ends in .py but includes .., your web server can ignore the request and close the connection. (This is presumably an attempt to reach up above the scripts directory and gain access to an arbitrary executable.)
- You may not use the system function. You must use fork, execvp, and other related functions.
- You may assume that all python scripts successfully invoked always succeed, so you needn’t check any status codes via WIFEXITED, WEXITSTATUS, etc.
- Your implementation must close all unused descriptors and cull all zombie processes.
- Your implementation should not make use of any signal handlers.
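The path rules in the list above (must end in .py, must not contain .., lives under scripts) can be sketched as a small pure function. This is one possible formulation rather than the official one: scriptFor is a hypothetical name, the endsWith shown here is an assumed implementation of a helper the handout does not define, and the sketch does not check whether the script actually exists on disk.

```cpp
#include <cassert>
#include <string>

// Assumed implementation of an endsWith helper: true if s ends with suffix.
bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: maps a request path to the script to run, or returns
// the empty string when the request should be ignored.
std::string scriptFor(const std::string& path) {
  if (!endsWith(path, ".py")) return "";                 // not a python script
  if (path.find("..") != std::string::npos) return "";   // escapes ./scripts
  return "./scripts" + path;
}
```

With this sketch, scriptFor("/deep/down/below.py") yields ./scripts/deep/down/below.py, while scriptFor("/index.html") and scriptFor("/../bin/evil.py") both yield the empty string.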
Your implementation can make use of the following routine, which reads the first line of an HTTP request coming over the provided client socket (up to and including the \r\n), and surfaces the method (e.g. GET), the full URL (/add.py?one=11&two=50), the path (/add.py), and the protocol (HTTP/1.1) through the four strings shared by reference.
static void parseRequest(int client, string& method, string& url, string& path, string& protocol);
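To make the four outputs concrete, here is a rough sketch of the tokenizing step alone. It is not the provided parseRequest: parseRequestLine is a hypothetical name, and it assumes the first line has already been read from the socket into a string.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: splits an already-read request line such as
// "GET /add.py?one=11&two=50 HTTP/1.0\r\n" into its components.
void parseRequestLine(const std::string& line, std::string& method,
                      std::string& url, std::string& path,
                      std::string& protocol) {
  std::istringstream iss(line);
  iss >> method >> url >> protocol;     // whitespace-separated tokens; the
                                        // trailing \r\n is skipped as whitespace
  path = url.substr(0, url.find('?'));  // everything before the query string
}
```

When the URL has no query string, find returns std::string::npos and substr returns the whole URL, so path and url end up identical.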
You’re to complete the implementation of the web server by fleshing out the details of the handleRequest function.
int main() {
  int server = createServerSocket(/* port = */ 80);
  runServer(server);
  return 0; // never gets here, but compiler doesn't know that
}
static const size_t kNumThreads = 16;
static void runServer(int server) {
  ThreadPool pool(kNumThreads);
  while (true) {
    int client = accept(server, NULL, NULL);
    pool.schedule([client] { handleRequest(client); });
  }
}
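static void handleRequest(int client) {
  // everything below represents a reasonable solution that meets all requirements
  string method, url, path, protocol;
  parseRequest(client, method, url, path, protocol);
  // ignore anything that isn't a python script or that tries to escape ./scripts
  if (!endsWith(path, ".py") || path.find("..") != string::npos) {
    close(client);
    return;
  }
  pid_t pid = fork();
  if (pid == 0) {
    // child: the script reads the rest of the request from stdin and
    // writes the response to stdout, both of which are the client socket
    dup2(client, STDIN_FILENO);
    dup2(client, STDOUT_FILENO);
    close(client);
    string script = "./scripts" + path;
    const char *argv[] = {
      "python", script.c_str(), method.c_str(), url.c_str(), protocol.c_str(), NULL
    };
    execvp(argv[0], const_cast<char **>(argv));
    exit(0); // only reached if execvp fails
  }
  close(client);         // parent no longer needs its copy of the descriptor
  waitpid(pid, NULL, 0); // reap the child so it doesn't linger as a zombie
}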
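The fork, dup2, execvp, and waitpid pattern used by handleRequest can be exercised without a socket. The sketch below swaps the client descriptor for a pipe so the parent can read what the child prints; runAndCapture is a hypothetical helper written for illustration, and it assumes a POSIX system.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs args[0] with the given arguments and returns
// whatever the child wrote to standard out. Returns "" if setup fails.
std::string runAndCapture(const std::vector<std::string>& args) {
  int fds[2];
  if (pipe(fds) == -1) return "";
  pid_t pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return "";
  }
  if (pid == 0) {
    close(fds[0]);                // child never reads from the pipe
    dup2(fds[1], STDOUT_FILENO);  // child's stdout now feeds the pipe
    close(fds[1]);                // original descriptor no longer needed
    std::vector<char *> argv;
    for (const std::string& arg : args) {
      argv.push_back(const_cast<char *>(arg.c_str()));
    }
    argv.push_back(nullptr);
    execvp(argv[0], argv.data());
    _exit(1);                     // only reached if execvp fails
  }
  close(fds[1]);                  // parent never writes to the pipe
  std::string output;
  char buffer[256];
  ssize_t count;
  while ((count = read(fds[0], buffer, sizeof(buffer))) > 0) {
    output.append(buffer, count);
  }
  close(fds[0]);
  waitpid(pid, NULL, 0);          // reap the child so no zombie is left behind
  return output;
}
```

As in handleRequest, each process closes the descriptors it does not use. Closing the write end in the parent matters here because read only reports end of file once every write end of the pipe has been closed.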