Extra Credit Assignment: HTTPS proxy

Your Assignment 5 proxy is only capable of proxying websites starting with http://. These websites use the HTTP protocol, with the requests and responses sent in “cleartext”: there is no encryption used at the HTTP level, and you can type HTTP by hand into netcat and it works.

Websites starting with https:// also speak the HTTP protocol, but they do so on top of an encrypted layer called TLS that provides security and confidentiality. (HTTPS stands for “HTTP secure”.) All of the exchanged HTTP communication is encrypted during transmission. This makes life a little bit harder for your proxy, since it cannot see any of the requests or responses being made.

Many people would still like to be able to proxy HTTPS traffic somehow, so there are some features built into the protocol that allow us to do this.

Logistics

You can earn up to 20 points for completing this extension. This is completely optional, and I will exclude the extra credit from any curve calculations I do at the end of the quarter, so that you don’t feel any pressure at all to complete it.

This assignment is due Wednesday, August 15th at 11:59pm, and we will not accept late submissions at all. Your code will be graded solely on functionality; we won’t do any style grading here.

Since this is an extra credit assignment, we’ll be deprioritizing questions about this on Piazza and in office hours. We’ll still try to help you when we’re free!

Getting started

The starter code is identical to the starter code from assign5. You can copy over all your files, but we ask that you submit this as a separate assignment (assign7). Clone the starter code and copy over your assign5 code like this:

```
git clone /usr/class/cs110/repos/assign7/$USER assign7
cp assign5/*.cc assign5/*.h assign7/
```

Important!! In request.cc, inside ingestRequestLine, this line removes the protocol prefix from a request:

```
server.erase(0, kProtocolPrefix.size());
```

Because CONNECT requests (described below) don’t include a prefix, you need to update the line as follows:

```
if (pos != string::npos) server.erase(0, kProtocolPrefix.size());
```
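To see why the guard matters, here is the fixed logic pulled out into a small standalone function. It assumes that kProtocolPrefix is "http://" and that pos is the result of server.find(kProtocolPrefix); the name stripProtocolPrefix is hypothetical and not part of the starter code.

```cpp
#include <string>

// Assumption: the starter code's prefix constant is "http://".
static const std::string kProtocolPrefix = "http://";

// Hypothetical helper mirroring the fixed line in ingestRequestLine.
// Only erase the prefix if it was actually found in the URL.
std::string stripProtocolPrefix(std::string server) {
  size_t pos = server.find(kProtocolPrefix);
  if (pos != std::string::npos) server.erase(0, kProtocolPrefix.size());
  return server;
}
```

With the check, stripProtocolPrefix("www.nytimes.com:443") returns its argument unchanged. Without it, the unconditional erase would chop off the first seven characters and leave "imes.com:443", so the proxy would try to connect to the wrong host.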

Submitting works the same as usual. I have a working sample solution in the samples directory.

Proxying HTTPS traffic

When you visit https://www.nytimes.com in your browser with an HTTP proxy configured, your browser will send a CONNECT request to your proxy server:

```
CONNECT www.nytimes.com:443 HTTP/1.1
Host: www.nytimes.com
<zero or more additional key-value pairs>
<blank line>
```

(Note that 443 is the standard port number for HTTPS traffic. You’ll want to connect to the destination server on the port specified in the CONNECT request.)
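If your assign5 request parsing does not already split the port out of the server field, a minimal sketch of doing it for a CONNECT target might look like this. The name parseConnectTarget is hypothetical, and falling back to port 443 when no port is present is our assumption; browsers normally include the port explicitly.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: splits a CONNECT target such as "www.nytimes.com:443"
// into a host name and a port number.
std::pair<std::string, unsigned short> parseConnectTarget(const std::string& target) {
  size_t colon = target.rfind(':');
  if (colon == std::string::npos) return {target, 443};  // assumed default
  std::string host = target.substr(0, colon);
  // std::stoi throws std::invalid_argument if no digits follow the colon.
  int port = std::stoi(target.substr(colon + 1));
  if (port <= 0 || port > 65535) throw std::out_of_range("bad port in CONNECT target");
  return {host, static_cast<unsigned short>(port)};
}
```

The host and port can then be handed to whatever you already use to open the upstream connection (createClientSocket in the starter code).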

Your proxy server should open a network connection to the destination server, then respond to the client:

```
HTTP/1.1 200 OK
<blank line>
```

Note that nothing is actually sent to the destination server yet. You have a connection open to the destination server (i.e. createClientSocket returns a socket successfully), but you are not forwarding the CONNECT request or sending it anything else just yet.
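The reply is a complete HTTP response: a status line followed by an empty line, each terminated by CRLF ("\r\n"). Here is a minimal sketch of sending it, written against std::ostream so it works with an iosockstream (a child class of iostream); sendConnectEstablished is a hypothetical name.

```cpp
#include <ostream>
#include <sstream>  // only needed to try the helper out with std::ostringstream

// Hypothetical helper: tells the client that the tunnel is ready.
void sendConnectEstablished(std::ostream& client) {
  // Status line, then the empty line that ends the (header-only) response.
  client << "HTTP/1.1 200 OK\r\n\r\n";
  // Flush explicitly: the client will not start talking TLS until it sees
  // this reply, so leaving it in the stream buffer while the proxy sleeps
  // in epoll_wait would leave both sides waiting on each other.
  client.flush();
}
```

Passing a std::ostringstream instead of a socket stream is an easy way to check that exactly these bytes are produced.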

If this were vanilla HTTP, your proxy server would close the connection to the client at this point, having sent a response. However, because this is proxying an HTTPS connection, you should instead keep both sockets open (to the client and to the destination server), and create a bidirectional bridge between them: any bytes that the client sends should be forwarded to the server, and any bytes that the server sends should be forwarded back to the client. Note that there will be a lot of back-and-forth communication over this bridge. In your simple HTTP scheme, the client sends a request, the server sends a response, and then you’re finished, but speaking HTTPS involves several negotiations between client and server, so you can’t assume that the client is done sending data once the server starts sending a response.

Implementing this assignment

When you accept the CONNECT request and successfully open a network connection to the server, you will have two sockets that you need to bridge. Whenever data comes in on one socket, write it out to the other. You can use the epoll functions to sleep until data is available on one of the sockets; then read from that socket into a buffer and write the buffer out to the other socket. You only need to wake up when data is available to read on one of the sockets, so EPOLLIN is sufficient. Level triggering (epoll’s default) will be the easiest to code up.

You do not need to configure either socket as non-blocking, and I recommend that you don’t do so (though you’re welcome to if you want). Making the sockets non-blocking makes things significantly more complicated (you need several buffers to keep track of what you have read but not yet successfully written, and there is a lot of state management to do), and it doesn’t help very much: since only the client or the server will be writing at any point in time, it’s okay to block until you’ve proxied the entire client’s message or the entire server’s message. (The reason for using epoll is that at any point in time, you don’t know if the client or server is about to speak next, but you can be pretty sure they won’t both speak at the same time.)

You can stop proxying the request once one of the sockets gets closed (by the client or destination server), or once a 30 second timeout has elapsed without seeing any data on either socket. When one of the sockets is closed, make sure to finish proxying data before closing the other socket (e.g. if the server closes the connection, make sure to finish sending the client any leftover bytes the server sent you before you close the socket to the client).
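Putting the last few paragraphs together, here is a minimal sketch of the bridge using raw epoll, read, and send calls rather than iostream methods. It assumes both descriptors are connected, blocking sockets (for example, obtained from an iosockstream with ss.rdbuf()->sd()) and that nothing is left unread in a stream’s buffer when the bridge starts. The names bridgeSockets and sendAll are hypothetical, and MSG_NOSIGNAL is a Linux-specific flag.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <initializer_list>

static const int kIdleTimeoutMs = 30 * 1000;  // give up after 30 s of silence

// send() may write only part of the buffer, so loop until everything is out.
// MSG_NOSIGNAL reports a closed peer as EPIPE instead of raising SIGPIPE.
static bool sendAll(int fd, const char* buf, size_t len) {
  while (len > 0) {
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    if (n < 0) {
      if (errno == EINTR) continue;
      return false;
    }
    buf += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}

// Hypothetical bridge: returns when either side closes, an error occurs,
// or neither socket produces data for kIdleTimeoutMs. The caller still
// owns (and must close) clientFd and serverFd.
void bridgeSockets(int clientFd, int serverFd) {
  int epfd = epoll_create1(0);
  if (epfd < 0) return;
  for (int fd : {clientFd, serverFd}) {
    struct epoll_event ev = {};
    ev.events = EPOLLIN;  // level-triggered, since EPOLLET is not set
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
      close(epfd);
      return;
    }
  }

  char buf[8192];
  bool running = true;
  while (running) {
    struct epoll_event events[2];
    int ready = epoll_wait(epfd, events, 2, kIdleTimeoutMs);
    if (ready < 0 && errno == EINTR) continue;
    if (ready <= 0) break;  // 0 means the idle timeout expired
    for (int i = 0; i < ready && running; i++) {
      int from = events[i].data.fd;
      int to = (from == clientFd) ? serverFd : clientFd;
      ssize_t n = read(from, buf, sizeof(buf));
      if (n < 0 && errno == EINTR) continue;  // still readable; retry next round
      // n == 0 means the peer closed its end. Every byte it sent before
      // closing was already read and forwarded earlier, so stop here.
      if (n <= 0 || !sendAll(to, buf, static_cast<size_t>(n))) running = false;
    }
  }
  close(epfd);  // don't leak the watch set
}
```

Because read only reports end-of-file after all earlier bytes from that peer have been consumed, this loop finishes forwarding a closing side’s data before it stops. Blocking in sendAll is acceptable here for the reason given above: the client and server take turns speaking.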

Tips:

- In your Firefox proxy settings, be sure to update “SSL Proxy” to match your “HTTP Proxy” settings. Otherwise, Firefox won’t even attempt to use your proxy for HTTPS websites.
- If you have been testing using curl, you can do this (update the myth machine and port number):
```
$ export HTTPS_PROXY=http://myth54.stanford.edu:9979/
$ curl https://www.google.com/
```
- You’ll be creating an epoll watch set for every HTTPS connection. Don’t forget to close the watch set file descriptor returned by epoll_create!
- You can get the file descriptor that an iosockstream is connected to with int fd = ss.rdbuf()->sd();
- You’ll need to use several iostream methods (note that iosockstream is a child class of iostream); readsome, eof, fail, gcount, and flush may be helpful. If you’d rather use raw read and write syscalls, we won’t stop you, but remember that write may not write the whole buffer to a socket in one call, so you may need to call it repeatedly.
- This Piazza post might be helpful.
- Make sure you aren’t leaking any file descriptors over the course of operation. You can list the file descriptors that a process has open with ls /proc/<pid>/fd.
- You don’t need to implement caching (you really can’t, given the encryption). Blacklisting and parallel handling of requests should still work.
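Besides running ls /proc/<pid>/fd by hand, a process can inspect its own descriptors through /proc/self/fd. The Linux-only sketch below counts them and adds a small RAII wrapper so a descriptor such as the epoll_create result is closed on every return path. Both countOpenFds and FdGuard are hypothetical names, not part of the starter code.

```cpp
#include <dirent.h>
#include <unistd.h>

// Counts entries in /proc/self/fd (Linux only). The directory stream itself
// holds one descriptor while we iterate, so compare counts taken the same
// way rather than trusting the absolute number.
int countOpenFds() {
  DIR* dir = opendir("/proc/self/fd");
  if (dir == nullptr) return -1;
  int count = 0;
  while (dirent* entry = readdir(dir)) {
    if (entry->d_name[0] != '.') count++;  // skip "." and ".."
  }
  closedir(dir);
  return count;
}

// Hypothetical RAII wrapper: closes the descriptor when it goes out of scope,
// even if the function returns early.
struct FdGuard {
  int fd;
  explicit FdGuard(int fd) : fd(fd) {}
  ~FdGuard() { if (fd >= 0) close(fd); }
  FdGuard(const FdGuard&) = delete;
  FdGuard& operator=(const FdGuard&) = delete;
};
```

One way to use the counter is to log it while the proxy is idle, before and after loading a few HTTPS sites; a count that keeps growing once all connections have finished points to a leak.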

CS 110
