Lab 7 Solutions

These questions were written by Jerry Cain, Nick Troccoli, Chris Gregg, and Ryan Eberhardt.

Before the end of lab, be sure to fill out the lab checkoff sheet here!

Problem 1: Networking short-answer questions

Explain the differences between a pipe and a socket.
- Fundamentally, a pipe is a unidirectional communication channel and a socket is a bidirectional one. Pipes can also only connect processes on the same computer that descend from a common ancestor process, whereas sockets can connect any two processes on any computers anywhere. Finally, a pipe must be created before the processes that share it are forked, whereas sockets can be created at any time.
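
Here is a minimal sketch of the directionality difference, using only the POSIX calls pipe and socketpair. The helper names are made up for illustration, and a socketpair is only a local stand-in for a network socket, so this shows the one-way versus two-way distinction and nothing about cross-machine communication.

```cpp
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Writes msg to writeFd and reads it back from readFd.
// Returns what was read, or "" if either call fails.
static std::string sendAndReceive(int writeFd, int readFd, const std::string& msg) {
    ssize_t written = write(writeFd, msg.data(), msg.size());
    if (written != static_cast<ssize_t>(msg.size())) return "";
    char buf[64];
    ssize_t n = read(readFd, buf, sizeof(buf));
    return n > 0 ? std::string(buf, n) : "";
}

// A pipe has one write end (fds[1]) and one read end (fds[0]).
std::string roundTripThroughPipe(const std::string& msg) {
    int fds[2];
    if (pipe(fds) == -1) return "";
    std::string result = sendAndReceive(fds[1], fds[0], msg);
    close(fds[0]);
    close(fds[1]);
    return result;
}

// Both ends of a socket pair can read and write, so data flows either way.
std::string roundTripThroughSocket(const std::string& msg, bool reverse) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) return "";
    std::string result = reverse ? sendAndReceive(fds[1], fds[0], msg)
                                 : sendAndReceive(fds[0], fds[1], msg);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling roundTripThroughSocket with reverse set to either true or false delivers the message, whereas the pipe version only has one usable direction (write to fds[1], read from fds[0]).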
Describe how in some sense, HTTP requests/responses are just another form of function call and return. What “function” is being called? What are the parameters?
- The client requires something to be done in another context, and in this case that context is provided on another machine as opposed to some other function on the same machine. The function being called is the URL (where the function lives, and which particular service is relevant, e.g. http://cs110.stanford.edu/cgi-bin/gradebook), the parameters are expressed via text passed from client to server, and the return value is expressed via text passed from server to client.
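
To make the analogy concrete, here is a small sketch of the "call" as text. buildGetRequest is a hypothetical helper (not part of any course library) that produces a minimal HTTP/1.1 GET request; the path plays the role of the function name, and the headers (plus any query string or body) carry the arguments.

```cpp
#include <string>

// Builds the text of a minimal HTTP/1.1 GET request.
// The path names the "function"; headers carry extra "arguments".
std::string buildGetRequest(const std::string& host, const std::string& path) {
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "\r\n";
}
```

For example, buildGetRequest("cs110.stanford.edu", "/cgi-bin/gradebook") yields the request text a client would send for the URL mentioned above.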
Consider the two server implementations below, where the sequential handleRequest function always takes exactly 1.500 seconds to execute. The two servers would respond very differently if 1000 clients were to connect – one per 1.000 seconds – over a 1000 second window. What would the 500th client experience when it tried to connect to the first server? What would the 500th client experience when it tried to connect to the second server? Which implementation do you think is better?
```
// Server Implementation 1
int main(int argc, char *argv[]) {
   int server = createServerSocket(12345); // sets the backlog to 128
   while (true) {
      int client = accept(server, NULL, NULL);
      handleRequest(client);
   }
}
  
// Server Implementation 2
int main(int argc, char *argv[]) {
   ThreadPool pool(1);
   int server = createServerSocket(12346); // sets the backlog to 128
   while (true) {
      int client = accept(server, NULL, NULL);
      pool.schedule([client] { handleRequest(client); });
   }
}
```
Recall that the implementation of createServerSocket calls listen, which sets up a waiting list with room for 128 clients. By the time the 500th client attempts to connect to the first server, that waiting list will be either full or nearly full, so there’s a very good chance that the 500th client will be dropped. (The client would see an error such as “The connection was reset.”) The second server, however, immediately accepts all incoming connection requests and passes the buck on to the thread pool, where the client connection will wait its turn in the ThreadPool queue.
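
The arithmetic behind "full or nearly full" can be checked with a small simulation. This is an idealized model, not a description of how the kernel actually accounts for its backlog: it assumes client i arrives at exactly i * arrivalMs, the single server thread spends exactly serviceMs per client, and at most backlog clients may wait. The function name is made up for illustration.

```cpp
// Idealized model of Server Implementation 1.
// Returns the 1-based index of the first client turned away,
// or -1 if none of the numClients is turned away.
int firstDroppedClient(long long arrivalMs, long long serviceMs, int backlog, int numClients) {
    long long serverFreeAt = 0;  // time at which the server next calls accept
    int waiting = 0;             // clients sitting on the waiting list
    for (int i = 1; i <= numClients; i++) {
        long long now = i * arrivalMs;
        // Let the server accept waiting clients for every request it finished by now.
        while (waiting > 0 && serverFreeAt <= now) {
            waiting--;
            serverFreeAt += serviceMs;
        }
        if (waiting == 0 && serverFreeAt <= now) {
            serverFreeAt = now + serviceMs;  // server was idle: accepted immediately
        } else if (waiting == backlog) {
            return i;                        // waiting list is full
        } else {
            waiting++;
        }
    }
    return -1;
}
```

Under these assumptions, firstDroppedClient(1000, 1500, 128, 1000) returns 386: the waiting list grows by one client every three seconds and fills up well before the 500th client arrives.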

Which one is better? Well, it depends. You might argue that the second implementation is better, because all clients get serviced eventually. Imagine trying to sign up for classes on Axxess when enrollment opens at midnight; instead of having to try enrolling many times and getting a lot of errors, it might be nice to submit one enroll request, and even though the request might take a long time to process, it might be nice to know that it will be processed eventually. But on the flipside, the second implementation has major problems when the server is temporarily flooded with clients. Imagine the server temporarily getting a flood of 10,000 requests all at once, and then traffic going back to normal levels shortly afterwards. The first server might have to turn away many of those clients during the traffic spike, but it will return to normal once traffic dies down. By contrast, the second server will spend 1.5s * 10,000 = 250 minutes working through all the requests from the spike, and it won’t be able to service any requests until it works through all of those clients. And what’s the point? By that time, those clients will have given up waiting for a response and will likely have timed out, so they won’t even be around anymore to receive the response. This is a very serious problem, as this traffic spike scenario is quite common (e.g. people trying to sign up on Axxess, people visiting a website because a link to the site went viral on social media, malicious clients directing massive traffic at a website attempting to take it down).

Most servers strike a balance between these two approaches: they use a thread pool to process requests in parallel, but they bound the queue of pending requests so that they can quickly recover from high-load scenarios.
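
A bounded queue can be sketched in a few lines. BoundedQueue below is a hypothetical class written for illustration (it is not the course ThreadPool's internal queue); the key idea is that tryPush refuses new work once the queue is full, so a traffic spike is rejected quickly rather than piling up.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// A thread-safe queue that refuses new items once it holds `capacity` of them.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Returns false (and discards the item) if the queue is already full.
    bool tryPush(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) return false;
        items_.push(std::move(item));
        return true;
    }

    // Moves the oldest item into `out`; returns false if the queue is empty.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::queue<T> items_;
};
```

A server built this way could close the client connection (or send an error response) whenever tryPush returns false, which keeps the backlog of work small enough to clear soon after a spike ends.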

Problem 2: Implementing a basic web server

In this problem, we’ll work through the process of implementing a fully-functional web server, just like the one running at web.stanford.edu that is hosting this website!

Before starting, go ahead and clone the lab7 folder:

$ git clone /usr/class/cs110/repos/lab7/shared lab7
$ cd lab7
$ make

To run the server, pick a random port number between 1025 and 65,535 and run ./web-server with the port number as the first argument:

./web-server 16382

The starter code parses the port number from argv. You’ll need to work through starting the server and handling incoming requests.

Part 1: Starting the server

Use createServerSocket to bind to the given port. If this function fails, print an error and return kServerStartFailure.

Here’s our solution:

int main(int argc, char *argv[]) {
    // ...
    int server = createServerSocket(port);
    if (server == -1) {
        cerr << "Failed to bind to port " << port << endl;
        return kServerStartFailure;
    }
    cout << "Server listening on port " << port << "." << endl;
}

Possible questions for discussion:

What could commonly cause createServerSocket to fail?
- The most common cause is that someone (maybe another instance of your program) is already bound to the port you’re trying to use. Another common cause is that you don’t have permission to bind to the port, e.g. you need special user privileges to bind to ports below 1024.
createServerSocket returns an int. What is this returned number? What should we do with it?
- This returned value is a file descriptor connected to the waiting list. We can use this file descriptor to find out when a client is connecting to the server, and to create a new file descriptor linked to that client.
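
For intuition, here is a rough sketch of what a function like createServerSocket might do under the hood. This is an assumption for illustration, not the course's actual implementation: createServerSocketSketch and boundPort are made-up names, and details such as socket options are omitted.

```cpp
#include <arpa/inet.h>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical stand-in for createServerSocket: returns a listening
// file descriptor bound to `port`, or -1 on failure.
int createServerSocketSketch(unsigned short port, int backlog) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    struct sockaddr_in address;
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_ANY);
    address.sin_port = htons(port);  // port 0 asks the OS to pick a free port

    if (bind(fd, (struct sockaddr *)&address, sizeof(address)) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

// Reports which port a bound socket ended up on (useful after binding to port 0).
unsigned short boundPort(int fd) {
    struct sockaddr_in address;
    socklen_t size = sizeof(address);
    if (getsockname(fd, (struct sockaddr *)&address, &size) == -1) return 0;
    return ntohs(address.sin_port);
}
```

With this sketch, a second call for a port that another listening socket already holds fails at bind, which is the "already bound" failure described above.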

Part 2: Handling connections

Wait for a client to connect. Every time a client connects, print a message (e.g. “Client connected”).

Connect to the myth machine you’re running on using the Stanford VPN, just as you did in Assignment 5, or use an SSH proxy:

ssh -L 16382:localhost:16382 <username>@mythXX.stanford.edu
# ^ replace 16382 and mythXX with your myth machine and chosen port number

Then, start web-server, open your browser, and navigate to http://mythXX.stanford.edu:portNum/, replacing mythXX with the myth machine you’re running on, and replacing portNum with your chosen port number. Your browser may show an error or may not display anything, but you should see “Client connected” show up in your terminal.

Here’s our code. We have some extra code to get the IP address of the connecting client, but this isn’t necessary.

int main(int argc, char *argv[]) {
    if (argc > 2) {
        cerr << "Usage: " << argv[0] << " [<port>]" << endl;
        return kWrongArgumentCount;
    }
    unsigned short port = extractPort(argv[1]);
    if (port == USHRT_MAX) {
        cerr << "Invalid port number specified" << endl;
        return kIllegalPortArgument;
    }
    int server = createServerSocket(port);
    if (server == -1) {
        cerr << "Failed to bind to port " << port << endl;
        return kServerStartFailure;
    }
    cout << "Server listening on port " << port << "." << endl;

    while (true) {
        struct sockaddr_in address;
        // used to surface ip address from the client
        socklen_t size = sizeof(address);
        bzero(&address, size);
        int client = accept(server, (struct sockaddr *)&address, &size);
        char str[INET_ADDRSTRLEN];
        cout << "Received a connection request from "
            << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
            << "." << endl;
    }
    return 0;
}

Possible questions for discussion:

What happens if two clients attempt to connect to your server at the same exact time?
- The server will see one of the clients via accept and begin having a conversation with it; the second client will wait on the waiting list until the server calls accept again.

Part 3: Reading the request

Using an iosockstream, read the full request from the client (including headers), and determine the path being requested by the client. It’s good practice to read the full request (including headers) even though you only need the request line, and failing to do so may cause problems in rare cases where the client sends a large amount of header data.

You can read a single token (where tokens are separated by whitespace) using an istream like so:

string token;
someIstream >> token;

To read a full line, you can use getline:

string line;
getline(someIstream, line);

Print out the requested path. If you navigate to http://mythXX.stanford.edu:portNum/samples/cs110/, you should receive the path /samples/cs110/.

Here’s our code:

int main(int argc, char *argv[]) {
    if (argc > 2) {
        cerr << "Usage: " << argv[0] << " [<port>]" << endl;
        return kWrongArgumentCount;
    }
    unsigned short port = extractPort(argv[1]);
    if (port == USHRT_MAX) {
        cerr << "Invalid port number specified" << endl;
        return kIllegalPortArgument;
    }
    int server = createServerSocket(port);
    if (server == -1) {
        cerr << "Failed to bind to port " << port << endl;
        return kServerStartFailure;
    }
    cout << "Server listening on port " << port << "." << endl;

    while (true) {
        struct sockaddr_in address;
        // used to surface ip address from the client
        socklen_t size = sizeof(address);
        bzero(&address, size);
        int client = accept(server, (struct sockaddr *)&address, &size);
        char str[INET_ADDRSTRLEN];
        cout << "Received a connection request from "
            << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
            << "." << endl;
        serveFile(client);
    }
    return 0;
}

static void serveFile(int client) {
    sockbuf sb(client);
    iosockstream ss(&sb);

    string fileName = getFilename(ss);
    skipHeaders(ss);
}

static string getFilename(iosockstream& ss) {
    string method, path, protocol;
    ss >> method >> path >> protocol;
    string rest;
    getline(ss, rest);
    cout << "\tPath requested: " << path << endl;
    return path;
}

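static void skipHeaders(iosockstream& ss) {
  string line;
  // Stop at the blank line that ends the headers, or when a read fails
  // (e.g. the client disconnected), so we never loop on a dead stream.
  while (getline(ss, line) && !line.empty() && line != "\r") {}
}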
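
Since an iosockstream is an iostream, the parsing logic can be tried out without a network connection by feeding it a std::istringstream. The sketch below assumes that substitution; readRequestPath and skipHeaderLines are illustrative names, not functions from the starter code.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads the request line ("METHOD PATH PROTOCOL") and returns the path.
std::string readRequestPath(std::istream& in) {
    std::string method, path, protocol, rest;
    in >> method >> path >> protocol;
    std::getline(in, rest);  // consume the remainder of the request line
    return path;
}

// Consumes header lines up to and including the blank line that ends them.
// Returns the number of header lines skipped.
int skipHeaderLines(std::istream& in) {
    int count = 0;
    std::string line;
    while (std::getline(in, line) && !line.empty() && line != "\r") {
        count++;
    }
    return count;
}
```

For instance, given the text "GET /samples/cs110/ HTTP/1.1\r\nHost: mythXX.stanford.edu\r\nAccept: */*\r\n\r\n", readRequestPath returns /samples/cs110/ and skipHeaderLines then skips the two header lines.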

Part 4: Loading the requested file

Use the loadPath function from the starter code to read the file requested by the client. Note that the path specified by the client will be in the form /samples/cs110, but you want to treat this as a relative path, with no leading / (e.g. loadPath("samples/cs110")). Here is some code that can handle this for you:

// given some variable `path`...
if (path == "/") {
    path = ".";
}
// strip off leading /
size_t slashPos = path.find("/");
path = slashPos == string::npos ? path : path.substr(slashPos + 1);

loadPath returns a pair including the file contents and a boolean indicating whether the file contents are HTML. (HTML is a language that is used to represent web pages; browsers know how to render HTML into a visual page that users can interact with.) For now, you can just print the contents and the boolean. If you navigate to http://mythXX.stanford.edu:portNum/samples/subdir/file1, your program should print file1 contents, and the HTML boolean should be false.

Here’s our code:

int main(int argc, char *argv[]) {
    if (argc > 2) {
        cerr << "Usage: " << argv[0] << " [<port>]" << endl;
        return kWrongArgumentCount;
    }
    unsigned short port = extractPort(argv[1]);
    if (port == USHRT_MAX) {
        cerr << "Invalid port number specified" << endl;
        return kIllegalPortArgument;
    }
    int server = createServerSocket(port);
    if (server == -1) {
        cerr << "Failed to bind to port " << port << endl;
        return kServerStartFailure;
    }
    cout << "Server listening on port " << port << "." << endl;

    while (true) {
        struct sockaddr_in address;
        // used to surface ip address from the client
        socklen_t size = sizeof(address);
        bzero(&address, size);
        int client = accept(server, (struct sockaddr *)&address, &size);
        char str[INET_ADDRSTRLEN];
        cout << "Received a connection request from "
            << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
            << "." << endl;
        serveFile(client);
    }
    return 0;
}

static void serveFile(int client) {
    sockbuf sb(client);
    iosockstream ss(&sb);

    string fileName = getFilename(ss);
    skipHeaders(ss);

    pair<string, bool> contents = loadPath(fileName);
}

static string getFilename(iosockstream& ss) {
    string method, path, protocol;
    ss >> method >> path >> protocol;
    string rest;
    getline(ss, rest);
    cout << "\tPath requested: " << path << endl;
    if (path == "/") {
        // serve current directory
        return (".");
    }
    size_t pos = path.find("/");
    return pos == string::npos ? path : path.substr(pos + 1);
}

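static void skipHeaders(iosockstream& ss) {
  string line;
  // Stop at the blank line that ends the headers, or when a read fails
  // (e.g. the client disconnected), so we never loop on a dead stream.
  while (getline(ss, line) && !line.empty() && line != "\r") {}
}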

Questions for discussion:

Can you think of any security vulnerabilities that might arise with the way we are handling provided paths? Could an attacker specify a path in such a way that they could read arbitrary files from our computer? For example, if the course staff were running this program on myth, is there a way that you could send a request to this server to read the sample solutions for the assignments?
- A malicious client could pass in an absolute file path that starts with two slashes, such as //usr/class/cs110/, or (worse), //afs/ir/users/y/h/yourHome, and this server does not protect against any non-desired access. Once your server is accessible to the network, anyone on the network can access it.
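
One possible mitigation, sketched under the assumption that the server should only ever serve files beneath its working directory, is to reject any path that is still absolute after the leading slash is stripped or that contains a ".." component. isSafeRelativePath is a hypothetical helper, and this check alone is not a complete defense (for example, it does nothing about symbolic links).

```cpp
#include <sstream>
#include <string>

// Returns true only for relative paths with no ".." component.
bool isSafeRelativePath(const std::string& path) {
    if (path.empty() || path[0] == '/') return false;  // still absolute
    std::istringstream components(path);
    std::string component;
    while (std::getline(components, component, '/')) {
        if (component == "..") return false;           // would climb out of the directory
    }
    return true;
}
```

A request for //usr/class/cs110/ becomes /usr/class/cs110/ after one slash is stripped, so this check would reject it, while samples/cs110/ passes.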

Part 5: Serving the file to the client

Now that you have the file contents, let’s send this back to the client as an HTTP response. Your response should include a Content-Length header specifying the size of the file contents, as well as a Content-Type header that is text/html; charset=UTF-8 if the file is HTML or text/plain; charset=UTF-8 otherwise.

You should now have a working HTTP server! Navigate to http://mythXX.stanford.edu:portNum/samples/cs110/ to see your server in action.

If you encounter problems, try running curl -vv http://mythXX.stanford.edu:portNum/samples/cs110/ to see what is being sent to your server and what is being received. Alternatively, try nc mythXX portNum and manually type in an HTTP request, then see the response sent back by your server.

Here’s our code:

int main(int argc, char *argv[]) {
    if (argc > 2) {
        cerr << "Usage: " << argv[0] << " [<port>]" << endl;
        return kWrongArgumentCount;
    }
    unsigned short port = extractPort(argv[1]);
    if (port == USHRT_MAX) {
        cerr << "Invalid port number specified" << endl;
        return kIllegalPortArgument;
    }
    int server = createServerSocket(port);
    if (server == -1) {
        cerr << "Failed to bind to port " << port << endl;
        return kServerStartFailure;
    }
    cout << "Server listening on port " << port << "." << endl;

    while (true) {
        struct sockaddr_in address;
        // used to surface ip address from the client
        socklen_t size = sizeof(address);
        bzero(&address, size);
        int client = accept(server, (struct sockaddr *)&address, &size);
        char str[INET_ADDRSTRLEN];
        cout << "Received a connection request from "
            << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
            << "." << endl;
        serveFile(client);
    }
    return 0;
}

static void serveFile(int client) {
    sockbuf sb(client);
    iosockstream ss(&sb);

    string fileName = getFilename(ss);
    skipHeaders(ss);

    pair<string, bool> contents = loadPath(fileName);
    sendResponse(ss, contents.first, contents.second);
}

static void sendResponse(iosockstream& ss, const string& payload, bool isHTML) {
    ss << "HTTP/1.1 200 OK\r\n";
    if (isHTML) {
        ss << "Content-Type: text/html; charset=UTF-8\r\n";
    } else {
        ss << "Content-Type: text/plain; charset=UTF-8\r\n";
    }
    ss << "Content-Length: " << payload.size() << "\r\n";
    ss << "\r\n";
    ss << payload << flush;
}
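
If you want to check the exact bytes of a response without a browser, one option is to format the same response into a string. The sketch below mirrors the solution's header logic using a std::ostringstream; formatResponse is an illustrative name rather than part of the starter code.

```cpp
#include <sstream>
#include <string>

// Formats a 200 OK response with Content-Type and Content-Length headers.
std::string formatResponse(const std::string& payload, bool isHTML) {
    std::ostringstream out;
    out << "HTTP/1.1 200 OK\r\n";
    out << "Content-Type: " << (isHTML ? "text/html" : "text/plain") << "; charset=UTF-8\r\n";
    out << "Content-Length: " << payload.size() << "\r\n";
    out << "\r\n";  // blank line separates headers from the payload
    out << payload;
    return out.str();
}
```

Note that Content-Length counts the bytes of the payload only, not the status line or headers.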

Questions for discussion:

A well-intentioned person might write some code that is decomposed like so:

int main() {
    ...
    int fd = accept(...);
    string path = getPath(fd);
    pair<string, bool> contents = loadPath(path);
    sendResponse(fd, contents);
    ...
}
  
string getPath(int fd) {
    sockbuf sb(fd);
    iosockstream ss(&sb);
    // use ss to read the path from the client...
    ...
}
  
void sendResponse(int fd, pair<string, bool> contents) {
    sockbuf sb(fd);
    iosockstream ss(&sb);
    // send `contents` to the client...
    ...
}

Why doesn’t this decomposition work? How might you decompose instead?

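- In getPath, the sockbuf takes ownership of the file descriptor (connected to the client) and closes the file descriptor when it goes out of scope. Then, in sendResponse, fd is already closed and can no longer be used to send anything to the client. Instead, construct the sockbuf and iosockstream once (as serveFile does in our solution) and pass the iosockstream by reference to the helper functions.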
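
The effect can be reproduced with a tiny wrapper that closes its descriptor on destruction. ClosingWrapper below is a hypothetical class that only imitates the ownership behavior described above (it is not the real sockbuf), and a pipe descriptor stands in for the client socket.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical wrapper that closes its descriptor when destroyed,
// imitating the ownership behavior described for sockbuf.
class ClosingWrapper {
public:
    explicit ClosingWrapper(int fd) : fd_(fd) {}
    ~ClosingWrapper() { close(fd_); }
    ClosingWrapper(const ClosingWrapper&) = delete;
    ClosingWrapper& operator=(const ClosingWrapper&) = delete;
private:
    int fd_;
};

// True if fd currently refers to an open descriptor.
bool isOpen(int fd) {
    return fcntl(fd, F_GETFD) != -1;
}

// Mirrors the broken decomposition: the first helper's wrapper closes fd on return.
bool fdSurvivesHelper() {
    int fds[2];
    if (pipe(fds) == -1) return false;
    {
        ClosingWrapper wrapper(fds[0]);  // like the sockbuf inside getPath
    }                                    // destructor runs here and closes fds[0]
    bool stillOpen = isOpen(fds[0]);     // what sendResponse would find
    close(fds[1]);
    return stillOpen;
}
```

Here fdSurvivesHelper returns false: by the time the second helper would run, the descriptor is gone.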

It’s a good idea to ss << flush after writing the response to the iosockstream. Why is this important? Is it absolutely necessary in our case, or could we have gotten by without it?
- Similar to writing to cout, writing to an iosockstream does not immediately send any data to the client. Instead, the data is accumulated in the sockbuf and periodically flushed to the client. This helps to improve performance, but it is good to force a flush when we’re done sending a response so that it is immediately sent to the client.
  
  This is not strictly necessary here, since the buffer will be flushed anyway once the sockbuf goes out of scope (i.e. at the end of serveFile in our solution). However, this is still good practice, and it becomes much more important in more complicated situations (e.g. when there are multiple requests/responses in the same connection), and is crucial in Assignment 6.

Part 6: Adding threading

Let’s speed up your server! Add a ThreadPool to your code to serve files to up to 16 clients at a time.

Here’s our code:

int main(int argc, char *argv[]) {
    if (argc > 2) {
        cerr << "Usage: " << argv[0] << " [<port>]" << endl;
        return kWrongArgumentCount;
    }
    unsigned short port = extractPort(argv[1]);
    if (port == USHRT_MAX) {
        cerr << "Invalid port number specified" << endl;
        return kIllegalPortArgument;
    }
    int server = createServerSocket(port);
    if (server == -1) {
        cerr << "Failed to bind to port " << port << endl;
        return kServerStartFailure;
    }
    cout << "Server listening on port " << port << "." << endl;

    ThreadPool pool(16);
    while (true) {
        struct sockaddr_in address;
        // used to surface ip address from the client
        socklen_t size = sizeof(address);
        bzero(&address, size);
        int client = accept(server, (struct sockaddr *)&address, &size);
        char str[INET_ADDRSTRLEN];
        cout << "Received a connection request from "
            << inet_ntop(AF_INET, &address.sin_addr, str, INET_ADDRSTRLEN)
            << "." << endl;
        pool.schedule([client] {
            serveFile(client);
        });
    }
    return 0;
}

Questions for discussion:

What synchronization primitives did you need to add to ensure thread safety for your code?
- None! The thread must capture the client file descriptor by value, but once it does that, it isn’t really sharing any data structures with other threads, and each conversation with each client is independent of any other conversations, so there is nothing to synchronize here.
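
The by-value capture can be illustrated with plain std::thread objects standing in for the course ThreadPool (an assumption made so the sketch is self-contained). Each lambda gets its own copy of client and writes to its own slot, so nothing is shared between threads.

```cpp
#include <thread>
#include <vector>

// Spawns one thread per simulated client descriptor. Each lambda captures its
// own copy of `client` and writes to a distinct slot, so no locks are needed.
std::vector<int> handleAll(int numClients) {
    std::vector<int> handledBy(numClients, -1);
    std::vector<std::thread> threads;
    for (int client = 0; client < numClients; client++) {
        int* slot = &handledBy[client];
        threads.emplace_back([client, slot] { *slot = client; });
    }
    for (std::thread& t : threads) t.join();
    return handledBy;
}
```

Had client been captured by reference instead, each thread would read a variable that the loop keeps modifying, which is a data race.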
