Lab 7: Implementing a web server

These questions were written by Jerry Cain, Nick Troccoli, Chris Gregg, and Ryan Eberhardt.

Before the end of lab, be sure to fill out the lab checkoff sheet here!

Problem 1: Networking short-answer questions

Explain the differences between a pipe and a socket.
Describe how, in some sense, HTTP requests and responses are just another form of function call and return. What “function” is being called? What are the parameters?

Consider the two server implementations below, where the sequential handleRequest function always takes exactly 1.500 seconds to execute. The two servers would respond very differently if 1000 clients were to connect, one every 1.000 seconds, over a 1000-second window. What would the 500th client experience when it tried to connect to the first server? What would it experience when it tried to connect to the second server? Which implementation do you think is better?

// Server Implementation 1
int main(int argc, char *argv[]) {
   int server = createServerSocket(12345); // sets the backlog to 128
   while (true) {
      int client = accept(server, NULL, NULL);
      handleRequest(client);
   }
}
  
// Server Implementation 2
int main(int argc, char *argv[]) {
   ThreadPool pool(1);
   int server = createServerSocket(12346); // sets the backlog to 128
   while (true) {
      int client = accept(server, NULL, NULL);
      pool.schedule([client] { handleRequest(client); });
   }
}

Problem 2: Implementing a basic web server

In this problem, we’ll work through the process of implementing a fully-functional web server, just like the one running at web.stanford.edu that is hosting this website!

Before starting, go ahead and clone the lab7 folder:

$ git clone /usr/class/cs110/repos/lab7/shared lab7
$ cd lab7
$ make

To run the server, pick a random port number between 1025 and 65,535 and run ./web-server with the port number as the first argument:

$ ./web-server 16382

The starter code parses the port number from argv. You’ll need to work through starting the server and handling incoming requests.

Part 1: Starting the server

Use createServerSocket to bind to the given port. If this function fails, print an error and return kServerStartFailure.

Possible questions for discussion:

What could commonly cause createServerSocket to fail?
createServerSocket returns an int. What is this returned number? What should we do with it?

Part 2: Handling connections

Wait for a client to connect. Every time a client connects, print a message (e.g. “Client connected”).

To reach the myth machine you’re running on, connect to the Stanford VPN, just as you did in Assignment 5, or set up an SSH tunnel (port forwarding):

ssh -L 16382:localhost:16382 SUNetID@mythXX.stanford.edu
# ^ replace 16382 with your chosen port number, mythXX with your myth machine, and SUNetID with your username

Then, start web-server, open your browser, and navigate to http://mythXX.stanford.edu:portNum/, replacing mythXX with the myth machine you’re running on and portNum with your chosen port number. If you are using the SSH tunnel instead of the VPN, navigate to http://localhost:portNum/, since the tunnel forwards that local port to the myth machine. Your browser may show an error or may not display anything, but you should see “Client connected” show up in your terminal.

Possible questions for discussion:

What happens if two clients attempt to connect to your server at the same exact time?

Part 3: Reading the request

Using an iosockstream, read the full request from the client, including the headers, and determine the path the client is requesting. Although you only need the request line, it’s good practice to read the whole request: if the server closes the connection while unread request data is still waiting in the socket, the kernel can reset the connection, and the client may never receive the response. This is rare, but it can happen when a client sends a large amount of header data.

You can read a single token (where tokens are separated by whitespace) using an istream like so:

string token;
someIstream >> token;

To read a full line, you can use getline:

string line;
getline(someIstream, line);

Print out the requested path. If you navigate to http://mythXX.stanford.edu:portNum/samples/cs110/, you should receive the path /samples/cs110/.

Part 4: Loading the requested file

Use the loadPath function from the starter code to read the file requested by the client. Note that the path specified by the client will be in the form /samples/cs110, but you want to treat this as a relative path, with no leading / (e.g. loadPath("samples/cs110")). Here is some code that can handle this for you:

// given some variable `path` (e.g. "/samples/cs110")...
if (path == "/") {
    path = ".";             // the root maps to the current directory
} else if (!path.empty() && path[0] == '/') {
    path = path.substr(1);  // strip off the leading /
}

loadPath returns a pair including the file contents and a boolean indicating whether the file contents are HTML. (HTML is a language that is used to represent web pages; browsers know how to render HTML into a visual page that users can interact with.) For now, you can just print the contents and the boolean. If you navigate to http://mythXX.stanford.edu:portNum/samples/subdir/file1, your program should print file1 contents, and the HTML boolean should be false.

Questions for discussion:

Can you think of any security vulnerabilities that might arise with the way we are handling provided paths? Could an attacker specify a path in such a way that they could read arbitrary files from our computer? For example, if the course staff were running this program on myth, is there a way that you could send a request to this server to read the sample solutions for the assignments?

Part 5: Serving the file to the client

Now that you have the file contents, let’s send this back to the client as an HTTP response. Your response should include a Content-Length header specifying the size of the file contents, as well as a Content-Type header that is text/html; charset=UTF-8 if the file is HTML or text/plain; charset=UTF-8 otherwise.

You should now have a working HTTP server! Navigate to http://mythXX.stanford.edu:portNum/samples/cs110/ to see your server in action.

If you encounter problems, try running curl -vv http://mythXX.stanford.edu:portNum/samples/cs110/ to see what is being sent to your server and what is being received. Alternatively, try nc mythXX portNum and manually type in an HTTP request, then see the response sent back by your server.

Questions for discussion:

A well-intentioned person might write some code that is decomposed like so:

int main() {
    ...
    int fd = accept(...);
    string path = getPath(fd);
    pair<string, bool> contents = loadPath(path);
    sendResponse(fd, contents);
    ...
}
  
string getPath(int fd) {
    sockbuf sb(fd);
    iosockstream ss(&sb);
    // use ss to read the path from the client...
    ...
}
  
void sendResponse(int fd, pair<string, bool> contents) {
    sockbuf sb(fd);
    iosockstream ss(&sb);
    // send `contents` to the client...
    ...
}

Why doesn’t this decomposition work? How might you decompose instead?

It’s a good idea to call ss << flush after writing the response to the iosockstream. Why is this important? Is it absolutely necessary in our case, or could we have gotten by without it?

Part 6: Adding threading

Let’s speed up your server! Add a ThreadPool to your code to serve files to up to 16 clients at a time.

Questions for discussion:

What synchronization primitives did you need to add to ensure thread safety for your code?
