Lab 7: Implementing a web server
These questions were written by Jerry Cain, Nick Troccoli, Chris Gregg, and Ryan Eberhardt.
Before the end of lab, be sure to fill out the lab checkoff sheet here!
Problem 1: Networking short-answer questions
- Explain the differences between a pipe and a socket.
- Describe how in some sense, HTTP requests/responses are just another form of function call and return. What “function” is being called? What are the parameters?
- Consider the two server implementations below, where the sequential handleRequest function always takes exactly 1.500 seconds to execute. The two servers would respond very differently if 1000 clients were to connect – one per 1.000 seconds – over a 1000 second window. What would the 500th client experience when it tried to connect to the first server? What would the 500th client experience when it tried to connect to the second server? Which implementation do you think is better?

// Server Implementation 1
int main(int argc, char *argv[]) {
  int server = createServerSocket(12345); // sets the backlog to 128
  while (true) {
    int client = accept(server, NULL, NULL);
    handleRequest(client);
  }
}

// Server Implementation 2
int main(int argc, char *argv[]) {
  ThreadPool pool(1);
  int server = createServerSocket(12346); // sets the backlog to 128
  while (true) {
    int client = accept(server, NULL, NULL);
    pool.schedule([client] { handleRequest(client); });
  }
}
Problem 2: Implementing a basic web server
In this problem, we’ll work through the process of implementing a
fully-functional web server, just like the one running at web.stanford.edu
that is hosting this website!
Before starting, go ahead and clone the lab7 folder:
$ git clone /usr/class/cs110/repos/lab7/shared lab7
$ cd lab7
$ make
To run the server, pick a random port number between 1025 and 65,535 and run ./web-server with the port number as the first argument:
./web-server 16382
The starter code parses the port number from argv. You’ll need to work
through starting the server and handling incoming requests.
Part 1: Starting the server
Use createServerSocket to bind to the given port. If this function fails,
print an error and return kServerStartFailure.
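To make the step above more concrete, here is a rough sketch of what a function like createServerSocket might do internally with the POSIX socket calls. The name createServerSocketSketch and its -1 failure value are assumptions made for illustration; the course's actual createServerSocket may differ in its details, and in the lab you should call the provided function rather than this one.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical stand-in for the course's createServerSocket (not the real
// implementation). Returns a descriptor for a listening socket bound to
// `port` on all interfaces, or -1 if any step fails.
int createServerSocketSketch(unsigned short port, int backlog = 128) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  // Let the server be restarted on the same port without a long wait.
  int yes = 1;
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

  sockaddr_in addr = {};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port = htons(port);

  // bind claims the port; listen marks the socket as accepting connections.
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      listen(fd, backlog) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

Passing port 0 to this sketch asks the operating system to choose any free port, which is handy for quick experiments.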
Possible questions for discussion:
- What could commonly cause createServerSocket to fail?
- createServerSocket returns an int. What is this returned number? What should we do with it?
Part 2: Handling connections
Wait for a client to connect. Every time a client connects, print a message (e.g. “Client connected”).
Connect to the myth machine you’re running on using the Stanford VPN, just as you did in Assignment 5, or use an SSH proxy:
ssh -L 16382:localhost:16382 yourSUNetID@mythXX.stanford.edu
# ^ replace 16382 with your chosen port number and mythXX with your myth machine
Then, start web-server, open your browser, and navigate to
http://mythXX.stanford.edu:portNum/, replacing mythXX with the myth machine
you’re running on, and replacing portNum with your chosen port number. Your
browser may show an error or may not display anything, but you should see
“Client connected” show up in your terminal.
Possible questions for discussion:
- What happens if two clients attempt to connect to your server at the same exact time?
Part 3: Reading the request
Using an iosockstream, read the full request from the client (including
headers), and determine the path being requested by the client. It’s good
practice to read the full request (including headers) even though you only need
the request line, and failing to do so may cause problems in rare cases where
the client sends a large amount of header data.
You can read a single token (where tokens are separated by whitespace) using an istream like so:
string token;
someIstream >> token;
To read a full line, you can use getline:
string line;
getline(someIstream, line);
Print out the requested path. If you navigate to
http://mythXX.stanford.edu:portNum/samples/cs110/, you should receive the
path /samples/cs110/.
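One possible way to combine the token and getline techniques above is sketched below. It takes a generic std::istream so it can be tried with a std::istringstream; the assumption is that an iosockstream can be passed the same way. The function name readRequestPath is made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads one HTTP request from `in` and returns the
// requested path. It also consumes the header lines, stopping at the blank
// line that terminates them.
std::string readRequestPath(std::istream& in) {
  std::string method, path, protocol;
  in >> method >> path >> protocol;  // e.g. GET /samples/cs110/ HTTP/1.1

  std::string line;
  std::getline(in, line);  // discard the remainder of the request line
  while (std::getline(in, line)) {
    // HTTP lines end in "\r\n" and getline strips only the '\n', so the
    // blank line that ends the headers shows up as "\r".
    if (line.empty() || line == "\r") break;
  }
  return path;
}
```

Only the path is returned here; the method, protocol, and headers are read and then ignored.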
Part 4: Loading the requested file
Use the loadPath function from the starter code to read the file requested by
the client. Note that the path specified by the client will be in the form
/samples/cs110, but you want to treat this as a relative path, with no
leading / (e.g. loadPath("samples/cs110")). Here is some code that can
handle this for you:
// given some variable `path`...
if (path == "/") {
  path = ".";
}
// strip off leading /
size_t slashPos = path.find("/");
path = slashPos == string::npos ? path : path.substr(slashPos + 1);
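The same idea can be wrapped in a small function so it is easy to try on a few inputs. This is a sketch, not part of the starter code: the name toRelativePath is hypothetical, and it differs slightly from the snippet above in that it removes only a slash at the very start of the path.

```cpp
#include <string>

// Hypothetical helper (not in the starter code): converts a request path
// such as "/samples/cs110" into a path relative to the server's working
// directory, suitable for passing to loadPath.
std::string toRelativePath(std::string path) {
  if (path == "/") return ".";  // the root maps to the current directory
  if (!path.empty() && path[0] == '/') path.erase(0, 1);  // drop leading '/'
  return path;
}
```

For example, toRelativePath("/samples/subdir/file1") yields "samples/subdir/file1", and toRelativePath("/") yields ".".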
loadPath returns a pair including the file contents and a boolean
indicating whether the file contents are HTML. (HTML is a language that is used
to represent web pages; browsers know how to render HTML into a visual page
that users can interact with.) For now, you can just print the contents and the
boolean. If you navigate to
http://mythXX.stanford.edu:portNum/samples/subdir/file1, your program should
print file1 contents, and the HTML boolean should be false.
Questions for discussion:
- Can you think of any security vulnerabilities that might arise with the way we are handling provided paths? Could an attacker specify a path in such a way that they could read arbitrary files from our computer? For example, if the course staff were running this program on myth, is there a way that you could send a request to this server to read the sample solutions for the assignments?
Part 5: Serving the file to the client
Now that you have the file contents, let’s send this back to the client as an
HTTP response. Your response should include a Content-Length header
specifying the size of the file contents, as well as a Content-Type header
that is text/html; charset=UTF-8 if the file is HTML or
text/plain; charset=UTF-8 otherwise.
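A minimal sketch of building such a response is shown below. It writes to a generic std::ostream so it can be tested with a std::ostringstream; the assumption is that an iosockstream can be used in its place. The function name writeResponse is made up, and the sketch always answers 200 OK, which assumes the file was found.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: writes a minimal HTTP/1.1 response to `out`.
// contents.first is the body; contents.second is true when the body is HTML.
// Always reports 200 OK (an assumption; missing files are not handled here).
void writeResponse(std::ostream& out,
                   const std::pair<std::string, bool>& contents) {
  const std::string& body = contents.first;
  out << "HTTP/1.1 200 OK\r\n";
  out << "Content-Type: " << (contents.second ? "text/html" : "text/plain")
      << "; charset=UTF-8\r\n";
  out << "Content-Length: " << body.size() << "\r\n";
  out << "\r\n";  // a blank line separates the headers from the body
  out << body << std::flush;
}
```

Each header line ends in "\r\n", and Content-Length counts the bytes of the body only, not the headers.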
You should now have a working HTTP server! Navigate to
http://mythXX.stanford.edu:portNum/samples/cs110/ to see your server in
action.
If you encounter problems, try running
curl -vv http://mythXX.stanford.edu:portNum/samples/cs110/ to see what is
being sent to your server and what is being received. Alternatively, try
nc mythXX portNum and manually type in an HTTP request, then see the response
sent back by your server.
Questions for discussion:
- A well-intentioned person might write some code that is decomposed like so:

int main() {
  ...
  int fd = accept(...);
  string path = getPath(fd);
  pair<string, bool> contents = loadPath(path);
  sendResponse(fd, contents);
  ...
}

string getPath(int fd) {
  sockbuf sb(fd);
  iosockstream ss(&sb);
  // use ss to read the path from the client...
  ...
}

void sendResponse(int fd, pair<string, bool> contents) {
  sockbuf sb(fd);
  iosockstream ss(&sb);
  // send `contents` to the client...
  ...
}

Why doesn’t this decomposition work? How might you decompose instead?
- It’s a good idea to ss << flush after writing the response to the iosockstream. Why is this important? Is it absolutely necessary in our case, or could we have gotten by without it?
Part 6: Adding threading
Let’s speed up your server! Add a ThreadPool to your code to serve files to up to 16 clients at a time.
Questions for discussion:
- What synchronization primitives did you need to add to ensure thread safety for your code?