Lab 7 Solutions
These questions were written by Jerry Cain, Nick Troccoli, Chris Gregg, and Ryan Eberhardt.
Before the end of lab, be sure to fill out the lab checkoff sheet here!
Problem 1: Networking short-answer questions
- Explain the differences between a pipe and a socket.
  - Fundamentally, a pipe is a unidirectional communication channel and a socket is a bidirectional one. Pipes are also used only to communicate between processes on the same computer that descend from a common ancestor process, whereas sockets are used to communicate between any two processes on any computers anywhere. Finally, pipes must be created before a process is forked in order to facilitate interprocess communication, whereas sockets can be created at any time.
- Describe how in some sense, HTTP requests/responses are just another form of function call and return. What “function” is being called? What are the parameters?
  - The client requires something to be done in another context, and in this case that context is provided on another machine as opposed to some other function on the same machine. The function being called is the URL (where the function lives, and which particular service is relevant, e.g. http://cs110.stanford.edu/cgi-bin/gradebook), the parameters are expressed via text passed from client to server, and the return value is expressed via text passed from server to client.
- Consider the two server implementations below, where the sequential handleRequest function always takes exactly 1.500 seconds to execute. The two servers would respond very differently if 1000 clients were to connect – one per 1.000 seconds – over a 1000 second window. What would the 500th client experience when it tried to connect to the first server? What would the 500th client experience when it tried to connect to the second server? Which implementation do you think is better?

  // Server Implementation 1
  int main(int argc, char *argv[]) {
    int server = createServerSocket(12345); // sets the backlog to 128
    while (true) {
      int client = accept(server, NULL, NULL);
      handleRequest(client);
    }
  }

  // Server Implementation 2
  int main(int argc, char *argv[]) {
    ThreadPool pool(1);
    int server = createServerSocket(12346); // sets the backlog to 128
    while (true) {
      int client = accept(server, NULL, NULL);
      pool.schedule([client] { handleRequest(client); });
    }
  }
  - Recall that the implementation of createServerSocket calls listen, which sets up a waiting list with room for 128 clients. Clients arrive once per second, but the first server finishes only one request every 1.5 seconds, so its waiting list grows by roughly one client every three seconds and reaches 128 entries after roughly 128 × 3 = 384 seconds. By the time the 500th client attempts to connect to the first server, that waiting list will be either full or nearly full, so there’s a very good chance that the 500th client will be dropped. (The client would see an error such as “The connection was reset.”) The second server, however, immediately accepts all incoming connection requests and passes the buck on to the thread pool, where the client connection will wait its turn in the ThreadPool queue. With one worker thread and 499 clients ahead of it, the 500th client would wait roughly 250 seconds for its response.

    Which one is better? Well, it depends. You might argue that the second implementation is better, because all clients get serviced eventually. Imagine trying to sign up for classes on Axxess when enrollment opens at midnight; instead of having to try enrolling many times and getting a lot of errors, it might be nice to submit one enroll request, and even though the request might take a long time to process, it might be nice to know that it will be processed eventually.

    But on the flipside, the second implementation has major problems when the server is temporarily flooded with clients. Imagine the server temporarily getting a flood of 10,000 requests all at once, and then traffic going back to normal levels shortly afterwards. The first server might have to turn away many of those clients during the traffic spike, but it will return to normal once traffic dies down. By contrast, the second server will spend 1.5s * 10,000 = 250 minutes working through all the requests from the spike, and it won’t be able to service any new requests until it works through all of those clients. And what’s the point? By that time, those clients will have given up waiting for a response and will likely have timed out, so they won’t even be around anymore to receive the response. This is a very serious problem, as this traffic spike scenario is quite common (e.g. people trying to sign up on Axxess, people visiting a website because a link to the site went viral on social media, malicious clients directing massive traffic at a website attempting to take it down).
    Most servers strike a balance between these two approaches: they use a thread pool to process requests in parallel, but they bound the queue of pending requests so that they can quickly recover from high-load scenarios.
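The bounded-queue idea can be sketched as follows. This is a minimal illustration, not the course ThreadPool: the class name BoundedQueue and its tryPush/tryPop methods are hypothetical. When the queue is full, tryPush refuses the new item right away, which is the point at which a server could close the connection or send an error instead of letting the client wait indefinitely.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical bounded work queue (not part of the lab starter code).
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : maxSize(capacity) {
    assert(capacity > 0);
  }

  // Adds `item` unless the queue is already full. Returns false when full,
  // so the caller can reject the request immediately.
  bool tryPush(T item) {
    std::lock_guard<std::mutex> lg(m);
    if (items.size() >= maxSize) return false;
    items.push(std::move(item));
    return true;
  }

  // Moves the oldest item into `out`. Returns false if the queue is empty.
  bool tryPop(T& out) {
    std::lock_guard<std::mutex> lg(m);
    if (items.empty()) return false;
    out = std::move(items.front());
    items.pop();
    return true;
  }

 private:
  const std::size_t maxSize;
  std::mutex m;
  std::queue<T> items;
};
```

A server loop could call tryPush with each accepted descriptor and close the descriptor when tryPush returns false, while worker threads drain the queue with tryPop.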
Problem 2: Implementing a basic web server
In this problem, we’ll work through the process of implementing a fully-functional web server, just like the one running at web.stanford.edu that is hosting this website!
Before starting, go ahead and clone the lab7 folder:
$ git clone /usr/class/cs110/repos/lab7/shared lab7
$ cd lab7
$ make
To run the server, pick a random port number between 1025 and 65,535 and run ./web-server with the port number as the first argument:
./web-server 16382
The starter code parses the port number from argv. You’ll need to work through starting the server and handling incoming requests.
Part 1: Starting the server
Use createServerSocket to bind to the given port. If this function fails, print an error and return kServerStartFailure.
Here’s our solution:
int main(int argc, char *argv[]) {
  // ...
  int server = createServerSocket(port);
  if (server == -1) {
    cerr << "Failed to bind to port " << port << endl;
    return kServerStartFailure;
  }
  cout << "Server listening on port " << port << "." << endl;
}
Possible questions for discussion:
- What could commonly cause createServerSocket to fail?
  - The most common cause is that someone (maybe another instance of your program) is already bound to the port you’re trying to use. Another common cause is that you don’t have permission to bind to the port, e.g. you need special user privileges to bind to ports below 1024.
- createServerSocket returns an int. What is this returned number? What should we do with it?
  - This returned value is a file descriptor connected to the waiting list. We can use this file descriptor (by passing it to accept) to find out when a client is connecting to the server, and to create a new file descriptor linked to that client.
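For reference, here is a rough sketch of what a function like createServerSocket might do internally. The course's actual implementation is not shown in this handout, so the name createServerSocketSketch, the explicit backlog parameter, and the choice to listen on all interfaces are assumptions made for illustration. It shows where the two failure causes above would surface (bind failing with EADDRINUSE or EACCES) and that the returned descriptor refers to the listening socket rather than to any particular client.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical stand-in for createServerSocket; returns -1 on failure.
int createServerSocketSketch(unsigned short port, int backlog) {
  assert(backlog > 0);
  int s = socket(AF_INET, SOCK_STREAM, 0);
  if (s == -1) return -1;

  struct sockaddr_in address;
  memset(&address, 0, sizeof(address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = htonl(INADDR_ANY);  // listen on all interfaces
  address.sin_port = htons(port);               // port 0 lets the OS choose

  // bind fails if the port is already taken (EADDRINUSE) or if the process
  // lacks permission for a privileged port (EACCES).
  if (bind(s, (struct sockaddr *)&address, sizeof(address)) == -1 ||
      listen(s, backlog) == -1) {
    close(s);
    return -1;
  }
  return s;  // pass this descriptor to accept() to get per-client descriptors
}
```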
Part 2: Handling connections
Wait for a client to connect. Every time a client connects, print a message (e.g. “Client connected”).
Connect to the myth machine you’re running on using the Stanford VPN, just as you did in Assignment 5, or use an SSH proxy:
ssh -L 16382:localhost:16382 [email protected]
# ^ replace 16382 and mythXX with your myth machine and chosen port number
Then, start web-server, open your browser, and navigate to http://mythXX.stanford.edu:portNum/, replacing mythXX with the myth machine you’re running on, and replacing portNum with your chosen port number. Your browser may show an error or may not display anything, but you should see “Client connected” show up in your terminal.
Here’s our code. We have some extra code to get the IP address of the connecting client, but this isn’t necessary.
int main(int argc, char *argv[]) {
  if (argc > 2) {
    cerr << "Usage: " << argv[0] << " [<port>]" << endl;
    return kWrongArgumentCount;
  }
  unsigned short port = extractPort(argv[1]);
  if (port == USHRT_MAX) {
    cerr << "Invalid port number specified" << endl;
    return kIllegalPortArgument;
  }
  int server = createServerSocket(port);
  if (server == -1) {
    cerr << "Failed to bind to port " << port << endl;
    return kServerStartFailure;
  }
  cout << "Server listening on port " << port << "." << endl;
  while (true) {
    struct sockaddr_in address;
    // used to surface ip address from the client
    socklen_t size = sizeof(address);
    bzero(&address, size);
    int client = accept(server, (struct sockaddr *)&address, &size);
    char str[INET_ADDRSTRLEN];
    cout << "Received a connection request from "
         << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
         << "." << endl;
  }
  return 0;
}
Possible questions for discussion:
- What happens if two clients attempt to connect to your server at the same exact time?
  - The server will see one of the clients via accept and begin having a conversation with it; the second client will wait on the waiting list until the server calls accept again.
Part 3: Reading the request
Using an iosockstream, read the full request from the client (including headers), and determine the path being requested by the client. It’s good practice to read the full request (including headers) even though you only need the request line; failing to do so may cause problems in rare cases where the client sends a large amount of header data.
You can read a single token (where tokens are separated by whitespace) using an istream like so:
string token;
someIstream >> token;
To read a full line, you can use getline:
string line;
getline(someIstream, line);
Print out the requested path. If you navigate to http://mythXX.stanford.edu:portNum/samples/cs110/, you should receive the path /samples/cs110/.
Here’s our code:
int main(int argc, char *argv[]) {
  if (argc > 2) {
    cerr << "Usage: " << argv[0] << " [<port>]" << endl;
    return kWrongArgumentCount;
  }
  unsigned short port = extractPort(argv[1]);
  if (port == USHRT_MAX) {
    cerr << "Invalid port number specified" << endl;
    return kIllegalPortArgument;
  }
  int server = createServerSocket(port);
  if (server == -1) {
    cerr << "Failed to bind to port " << port << endl;
    return kServerStartFailure;
  }
  cout << "Server listening on port " << port << "." << endl;
  while (true) {
    struct sockaddr_in address;
    // used to surface ip address from the client
    socklen_t size = sizeof(address);
    bzero(&address, size);
    int client = accept(server, (struct sockaddr *)&address, &size);
    char str[INET_ADDRSTRLEN];
    cout << "Received a connection request from "
         << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
         << "." << endl;
    serveFile(client);
  }
  return 0;
}

static void serveFile(int client) {
  sockbuf sb(client);
  iosockstream ss(&sb);
  string fileName = getFilename(ss);
  skipHeaders(ss);
}

static string getFilename(iosockstream& ss) {
  string method, path, protocol;
  ss >> method >> path >> protocol;
  string rest;
  getline(ss, rest);
  cout << "\tPath requested: " << path << endl;
  return path;
}

static void skipHeaders(iosockstream& ss) {
  string line;
  do {
    getline(ss, line);
  } while (!line.empty() && line != "\r");
}
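Because getFilename and skipHeaders only use stream extraction and getline, the same parsing logic can be tried out on an in-memory stream without a network connection. The sketch below is an illustration under that assumption: parseRequestPath is a hypothetical helper that combines the two steps and takes a std::istream instead of an iosockstream.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the request line and all headers from `in`,
// then returns the requested path.
std::string parseRequestPath(std::istream& in) {
  std::string method, path, protocol;
  in >> method >> path >> protocol;

  std::string line;
  std::getline(in, line);  // consume the rest of the request line

  // Headers end at the first blank line. HTTP lines end in "\r\n", so after
  // getline strips the "\n", a blank line reads as "\r".
  do {
    std::getline(in, line);
  } while (!line.empty() && line != "\r");
  return path;
}
```

For example, feeding it a std::istringstream holding "GET /samples/cs110/ HTTP/1.1\r\nHost: example\r\n\r\n" yields /samples/cs110/ and leaves the stream positioned just past the blank line.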
Part 4: Loading the requested file
Use the loadPath function from the starter code to read the file requested by the client. Note that the path specified by the client will be in the form /samples/cs110, but you want to treat this as a relative path, with no leading / (e.g. loadPath("samples/cs110")). Here is some code that can handle this for you:
// given some variable `path`...
if (path == "/") {
  path = ".";
}
// strip off leading /
size_t slashPos = path.find("/");
path = slashPos == string::npos ? path : path.substr(slashPos + 1);
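To see what this snippet does to a few concrete inputs, here is a small sketch that wraps it in a function. The name toRelativePath is hypothetical; the body is the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper around the path-handling snippet above.
std::string toRelativePath(std::string path) {
  if (path == "/") {
    path = ".";  // a request for the root serves the current directory
  }
  // Drop everything up to and including the first slash. Request paths
  // begin with "/", so in practice this removes the leading slash.
  std::size_t slashPos = path.find("/");
  return slashPos == std::string::npos ? path : path.substr(slashPos + 1);
}
```

With this, "/" maps to ".", and "/samples/cs110" maps to "samples/cs110". Note that "//usr/class/cs110/" maps to "/usr/class/cs110/", which is still an absolute path; this matters for the security question in this part.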
loadPath returns a pair including the file contents and a boolean indicating whether the file contents are HTML. (HTML is a language that is used to represent web pages; browsers know how to render HTML into a visual page that users can interact with.) For now, you can just print the contents and the boolean. If you navigate to http://mythXX.stanford.edu:portNum/samples/subdir/file1, your program should print file1 contents, and the HTML boolean should be false.
Here’s our code:
int main(int argc, char *argv[]) {
  if (argc > 2) {
    cerr << "Usage: " << argv[0] << " [<port>]" << endl;
    return kWrongArgumentCount;
  }
  unsigned short port = extractPort(argv[1]);
  if (port == USHRT_MAX) {
    cerr << "Invalid port number specified" << endl;
    return kIllegalPortArgument;
  }
  int server = createServerSocket(port);
  if (server == -1) {
    cerr << "Failed to bind to port " << port << endl;
    return kServerStartFailure;
  }
  cout << "Server listening on port " << port << "." << endl;
  while (true) {
    struct sockaddr_in address;
    // used to surface ip address from the client
    socklen_t size = sizeof(address);
    bzero(&address, size);
    int client = accept(server, (struct sockaddr *)&address, &size);
    char str[INET_ADDRSTRLEN];
    cout << "Received a connection request from "
         << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
         << "." << endl;
    serveFile(client);
  }
  return 0;
}

static void serveFile(int client) {
  sockbuf sb(client);
  iosockstream ss(&sb);
  string fileName = getFilename(ss);
  skipHeaders(ss);
  pair<string, bool> contents = loadPath(fileName);
}

static string getFilename(iosockstream& ss) {
  string method, path, protocol;
  ss >> method >> path >> protocol;
  string rest;
  getline(ss, rest);
  cout << "\tPath requested: " << path << endl;
  if (path == "/") {
    // serve current directory
    return (".");
  }
  size_t pos = path.find("/");
  return pos == string::npos ? path : path.substr(pos + 1);
}

static void skipHeaders(iosockstream& ss) {
  string line;
  do {
    getline(ss, line);
  } while (!line.empty() && line != "\r");
}
Questions for discussion:
- Can you think of any security vulnerabilities that might arise with the way we are handling provided paths? Could an attacker specify a path in such a way that they could read arbitrary files from our computer? For example, if the course staff were running this program on myth, is there a way that you could send a request to this server to read the sample solutions for the assignments?
  - A malicious client could pass in an absolute file path that starts with two slashes, such as //usr/class/cs110/, or (worse), //afs/ir/users/y/h/yourHome. Our code strips only the first slash, so loadPath receives an absolute path, and the server does nothing to restrict which files can be read. Once your server is accessible to the network, anyone on the network can access it.
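One possible mitigation is sketched below, under stated assumptions. The function isSafeRelativePath is hypothetical and is not part of the starter code; the idea is to call it on the path after the leading slash has been stripped and before calling loadPath. It rejects absolute paths (the double-slash case above) and any path containing a ".." component, which could otherwise climb out of the served directory. It does not account for symbolic links or URL-encoded characters, so it should be read as an illustration of the idea rather than a complete defense.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical check: returns true only for relative paths that contain
// no ".." component.
bool isSafeRelativePath(const std::string& path) {
  if (path.empty() || path[0] == '/') {
    return false;  // empty, or still absolute after stripping one slash
  }
  std::istringstream components(path);
  std::string component;
  while (std::getline(components, component, '/')) {
    if (component == "..") {
      return false;  // would step up to a parent directory
    }
  }
  return true;
}
```

A server using this check could respond with an error instead of calling loadPath whenever the function returns false.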
Part 5: Serving the file to the client
Now that you have the file contents, let’s send this back to the client as an HTTP response. Your response should include a Content-Length header specifying the size of the file contents, as well as a Content-Type header that is text/html; charset=UTF-8 if the file is HTML or text/plain; charset=UTF-8 otherwise.
You should now have a working HTTP server! Navigate to http://mythXX.stanford.edu:portNum/samples/cs110/ to see your server in action.
If you encounter problems, try running curl -vv http://mythXX.stanford.edu:portNum/samples/cs110/ to see what is being sent to your server and what is being received. Alternatively, try nc mythXX portNum and manually type in an HTTP request, then see the response sent back by your server.
Here’s our code:
int main(int argc, char *argv[]) {
  if (argc > 2) {
    cerr << "Usage: " << argv[0] << " [<port>]" << endl;
    return kWrongArgumentCount;
  }
  unsigned short port = extractPort(argv[1]);
  if (port == USHRT_MAX) {
    cerr << "Invalid port number specified" << endl;
    return kIllegalPortArgument;
  }
  int server = createServerSocket(port);
  if (server == -1) {
    cerr << "Failed to bind to port " << port << endl;
    return kServerStartFailure;
  }
  cout << "Server listening on port " << port << "." << endl;
  while (true) {
    struct sockaddr_in address;
    // used to surface ip address from the client
    socklen_t size = sizeof(address);
    bzero(&address, size);
    int client = accept(server, (struct sockaddr *)&address, &size);
    char str[INET_ADDRSTRLEN];
    cout << "Received a connection request from "
         << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
         << "." << endl;
    serveFile(client);
  }
  return 0;
}

static void serveFile(int client) {
  sockbuf sb(client);
  iosockstream ss(&sb);
  string fileName = getFilename(ss);
  skipHeaders(ss);
  pair<string, bool> contents = loadPath(fileName);
  sendResponse(ss, contents.first, contents.second);
}

static void sendResponse(iosockstream& ss, const string& payload, bool isHTML) {
  ss << "HTTP/1.1 200 OK\r\n";
  if (isHTML) {
    ss << "Content-Type: text/html; charset=UTF-8\r\n";
  } else {
    ss << "Content-Type: text/plain; charset=UTF-8\r\n";
  }
  ss << "Content-Length: " << payload.size() << "\r\n";
  ss << "\r\n";
  ss << payload << flush;
}
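Since sendResponse only uses operator<<, the response format can be checked without a client by writing to a std::ostringstream. The sketch below assumes a hypothetical variant, writeResponse, that takes a std::ostream& instead of an iosockstream&; its output mirrors the solution above. The helper responseText is likewise hypothetical and exists only to make the bytes easy to inspect.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical variant of sendResponse that writes to any output stream.
void writeResponse(std::ostream& out, const std::string& payload, bool isHTML) {
  out << "HTTP/1.1 200 OK\r\n";
  out << "Content-Type: " << (isHTML ? "text/html" : "text/plain")
      << "; charset=UTF-8\r\n";
  out << "Content-Length: " << payload.size() << "\r\n";
  out << "\r\n";  // blank line separates the headers from the body
  out << payload << std::flush;
}

// Returns the exact text that writeResponse would send for this payload.
std::string responseText(const std::string& payload, bool isHTML) {
  std::ostringstream out;
  writeResponse(out, payload, isHTML);
  return out.str();
}
```

Printing responseText("hi", false) shows the status line, the two headers, a blank line, and then the two-byte body, which is also what curl -vv should report when talking to the real server.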
Questions for discussion:
- A well-intentioned person might write some code that is decomposed like so:

  int main() {
    ...
    int fd = accept(...);
    string path = getPath(fd);
    pair<string, bool> contents = loadPath(path);
    sendResponse(fd, contents);
    ...
  }

  string getPath(int fd) {
    sockbuf sb(fd);
    iosockstream ss(&sb);
    // use ss to read the path from the client...
    ...
  }

  void sendResponse(int fd, pair<string, bool> contents) {
    sockbuf sb(fd);
    iosockstream ss(&sb);
    // send `contents` to the client...
    ...
  }

  Why doesn’t this decomposition work? How might you decompose instead?
  - In getPath, the sockbuf takes ownership of the file descriptor (connected to the client) and closes the file descriptor when it goes out of scope. Then, in sendResponse, fd is already closed and can no longer be used to send anything to the client. Instead, construct the sockbuf and iosockstream once (as serveFile does in our solution) and pass the iosockstream by reference to the helper functions.
- It’s a good idea to ss << flush after writing the response to the iosockstream. Why is this important? Is it absolutely necessary in our case, or could we have gotten by without it?
  - Similar to writing to cout, writing to an iosockstream does not immediately send any data to the client. Instead, the data is accumulated in the sockbuf and periodically flushed to the client. This helps to improve performance, but it is good to force a flush when we’re done sending a response so that it is immediately sent to the client.

    This is not strictly necessary here, since the buffer will be flushed anyway once the sockbuf goes out of scope (i.e. at the end of serveFile in our solution, where the sockbuf is declared). However, this is still good practice, and it becomes much more important in more complicated situations (e.g. when there are multiple requests/responses in the same connection), and is crucial in Assignment 6.
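The ownership issue from the first question can be reproduced without sockbuf. The sketch below uses a hypothetical ScopedFd class as a stand-in for any object that closes its descriptor in its destructor (which is how the answer above describes sockbuf), and a pipe in place of a client connection. All names here are assumptions made for illustration.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>

// Hypothetical stand-in for an object that owns a descriptor and closes it
// when destroyed.
class ScopedFd {
 public:
  explicit ScopedFd(int descriptor) : fd(descriptor) { assert(descriptor >= 0); }
  ~ScopedFd() { close(fd); }
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;

 private:
  int fd;
};

// True if `fd` still refers to an open descriptor.
bool isOpen(int fd) { return fcntl(fd, F_GETFD) != -1; }

// Mirrors getPath: wraps the descriptor in a local owner, then returns.
void helperThatWrapsFd(int fd) {
  ScopedFd owner(fd);
  // ... read from fd here ...
}  // owner's destructor closes fd at this point

// Reports whether the descriptor is still usable after the helper returns.
bool fdUsableAfterHelper() {
  int fds[2];
  if (pipe(fds) == -1) return false;
  helperThatWrapsFd(fds[0]);
  bool stillOpen = isOpen(fds[0]);
  close(fds[1]);
  return stillOpen;
}
```

Here fdUsableAfterHelper reports that the descriptor is closed once the helper returns, which is why a second helper that wraps the same number cannot talk to the client.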
Part 6: Adding threading
Let’s speed up your server! Add a ThreadPool to your code to serve files to up to 16 clients at a time.
Here’s our code:
int main(int argc, char *argv[]) {
  if (argc > 2) {
    cerr << "Usage: " << argv[0] << " [<port>]" << endl;
    return kWrongArgumentCount;
  }
  unsigned short port = extractPort(argv[1]);
  if (port == USHRT_MAX) {
    cerr << "Invalid port number specified" << endl;
    return kIllegalPortArgument;
  }
  int server = createServerSocket(port);
  if (server == -1) {
    cerr << "Failed to bind to port " << port << endl;
    return kServerStartFailure;
  }
  cout << "Server listening on port " << port << "." << endl;
  ThreadPool pool(16);
  while (true) {
    struct sockaddr_in address;
    // used to surface ip address from the client
    socklen_t size = sizeof(address);
    bzero(&address, size);
    int client = accept(server, (struct sockaddr *)&address, &size);
    char str[INET_ADDRSTRLEN];
    cout << "Received a connection request from "
         << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
         << "." << endl;
    pool.schedule([client] {
      serveFile(client);
    });
  }
  return 0;
}
Questions for discussion:
- What synchronization primitives did you need to add to ensure thread safety
for your code?
- None! The thread must capture the client file descriptor by value, but once it does that, it isn’t really sharing any data structures with other threads, and each conversation with each client is independent of any other conversations, so there is nothing to synchronize here.
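The capture-by-value point can be illustrated with plain std::thread in place of the course ThreadPool. In this sketch, fakeServe and serveAll are hypothetical stand-ins: fakeServe plays the role of serveFile, and each thread writes only to its own result slot, so (as in the lab) no locks are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for serveFile: uses only the value it is given.
static int fakeServe(int client) { return client * 10; }

// Handles every "client" on its own thread and collects the results.
std::vector<int> serveAll(const std::vector<int>& clients) {
  std::vector<int> results(clients.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < clients.size(); i++) {
    int client = clients[i];
    int* slot = &results[i];  // each thread gets its own output slot
    // Capture by value: the thread keeps its own copy of `client`, so it is
    // unaffected when the loop moves on to the next client.
    workers.emplace_back([client, slot] { *slot = fakeServe(client); });
  }
  for (std::thread& worker : workers) {
    worker.join();
  }
  return results;
}
```

Capturing `client` by reference instead would let the loop overwrite the variable while a thread is still using it, which is the one sharing hazard the answer above rules out. Depending on the toolchain, building this may require the -pthread flag.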