Extra Credit Assignment: HTTPS proxy
Your Assignment 5 proxy is only capable of proxying websites starting with
http://. These websites use the HTTP protocol, with the requests and
responses sent in “cleartext”: there is no encryption used at the HTTP level,
and you can type HTTP by hand into netcat
and it works.
Websites starting with https://
also speak the HTTP protocol, but they do so
on top of an encrypted layer called TLS that provides security and
confidentiality. (HTTPS stands for “HTTP secure”.) All of the exchanged HTTP
communication is encrypted during transmission. This makes life a little bit
harder for your proxy, since it cannot see any of the requests or responses
being made.
Many people would still like to be able to proxy HTTPS traffic somehow, so there are some features built into the protocol that allow us to do this.
Logistics
You can earn up to 20 points for completing this extension. This is completely optional, and I will exclude the extra credit from any curve calculations I do at the end of the quarter, so that you don’t feel any pressure at all to complete it.
This assignment is due Wednesday, August 15th at 11:59pm, and we will not accept late submissions at all. Your code will be graded solely on functionality; we won’t do any style grading here.
Since this is an extra credit assignment, we’ll be deprioritizing questions about this on Piazza and in office hours. We’ll still try to help you when we’re free!
Getting started
The starter code is identical to the starter code from assign5. You can copy
over all your files, but we ask that you submit this as a separate assign7.
You should clone the starter code and copy over your assign5 code by doing this:
git clone /usr/class/cs110/repos/assign7/$USER assign7
cp assign5/*.cc assign5/*.h assign7/
Important!! In ingestRequestLine in request.cc, this line removes the
protocol prefix from a request:
server.erase(0, kProtocolPrefix.size());
Because CONNECT
requests (described below) don’t include a prefix, you need
to update the line as follows:
if (pos != string::npos) server.erase(0, kProtocolPrefix.size());
Submitting works the same as usual. I have a working sample solution in the
samples directory.
Proxying HTTPS traffic
When you visit https://www.nytimes.com
in your browser with an HTTP proxy
configured, your browser will send a CONNECT
request to your proxy server:
CONNECT www.nytimes.com:443 HTTP/1.1
Host: www.nytimes.com
<zero or more additional key-value pairs>
<blank line>
(Note that 443 is the standard port number for HTTPS traffic. You’ll want to
connect to the destination server on the port specified in the CONNECT
request.)
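To make the host and port split concrete, here is a minimal sketch of pulling them out of a CONNECT target like the one above. The name parseConnectTarget is hypothetical and not part of the starter code; your existing request parsing may already do this for you, in which case you don't need it.

```cpp
#include <string>

// Hypothetical helper (not in the starter code): splits a CONNECT target
// such as "www.nytimes.com:443" into a host and a port number.
// Returns false if the port is missing, non-numeric, or out of range.
bool parseConnectTarget(const std::string& target,
                        std::string& host, unsigned short& port) {
  std::string::size_type colon = target.rfind(':');
  if (colon == std::string::npos || colon == 0 || colon + 1 == target.size())
    return false;
  const std::string portText = target.substr(colon + 1);
  if (portText.size() > 5) return false;  // longer than "65535"
  unsigned long value = 0;
  for (char ch : portText) {
    if (ch < '0' || ch > '9') return false;
    value = value * 10 + static_cast<unsigned long>(ch - '0');
  }
  if (value == 0 || value > 65535) return false;
  host = target.substr(0, colon);
  port = static_cast<unsigned short>(value);
  return true;
}
```

The sketch rejects a target with no port instead of guessing 443, since a CONNECT request names its port explicitly.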
Your proxy server should open a network connection to the destination server, then respond to the client:
HTTP/1.1 200 OK
<blank line>
Note that nothing is actually sent to the destination server yet. You
have a connection open to the destination server (i.e. createClientSocket
returns a socket successfully), but you are not forwarding the CONNECT
request or sending it anything else just yet.
If this were vanilla HTTP, your proxy server would close the connection to the client at this point, having sent a response. However, because this is proxying an HTTPS connection, you should instead keep both sockets open (to the client and to the destination server), and create a bidirectional bridge between them: any bytes that the client sends should be forwarded to the server, and any bytes that the server sends should be forwarded back to the client. Note that there will be a lot of back-and-forth communication over this bridge. In your simple HTTP scheme, the client sends a request, the server sends a response, and then you’re finished, but speaking HTTPS involves several negotiations between client and server, so you can’t assume that the client is done sending data once the server starts sending a response.
Implementing this assignment
When you accept the CONNECT
request and successfully open a network
connection to the server, you will have two sockets that you need to bridge.
Whenever data comes in on one socket, write it out to the other. You can use
the epoll
functions to sleep until data is available on one of the sockets;
then, read from the socket into a buffer and write it back out to the other
socket. You only need to wake up when data is available to read on one of the
sockets, so EPOLLIN
is sufficient. Level triggering will be easiest to code
up.
You do not need to configure either socket as non-blocking, and I
recommend that you don’t do so (though you’re welcome to if you want). Making
the sockets non-blocking makes things significantly more complicated (you need
several buffers to keep track of what you have read but not yet successfully
written, and there is a lot of state management to do), and it doesn’t help
very much: since only the client or the server will be writing at any point
in time, it’s okay to block until you’ve proxied the entire client’s message or
the entire server’s message. (The reason for using epoll
is that at any point
in time, you don’t know if the client or server is about to speak next, but you
can be pretty sure they won’t both speak at the same time.)
You can stop proxying the request once one of the sockets gets closed (by the client or destination server), or once a 30 second timeout has elapsed without seeing any data on either socket. When one of the sockets is closed, make sure to finish proxying data before closing the other socket (e.g. if the server closes the connection, make sure to finish sending the client any leftover bytes the server sent you before you close the socket to the client).
Tips:
In your Firefox proxy settings, be sure to update “SSL Proxy” to match your “HTTP Proxy” settings. Otherwise, Firefox won’t even attempt to use your proxy for HTTPS websites.
If you have been testing using curl, you can do this (update the myth and port numbers):
$ export HTTPS_PROXY=http://myth54.stanford.edu:9979/
$ curl https://www.google.com/
You’ll be creating an epoll watch set for every HTTPS connection. Don’t forget to close the watch set file descriptor returned by epoll_create!
You can get the file descriptor that an iosockstream is connected to as int fd = ss.rdbuf()->sd().
You’ll need to use several iostream methods (note that iosockstream is a child class of iostream). readsome, eof, fail, gcount, and flush may be helpful. If you’d rather do this with raw read or write syscalls, we won’t stop you, but make sure you remember that write may not finish writing a buffer to a socket in one go, and you might need to call it multiple times.
This Piazza post might be helpful.
Make sure you aren’t leaking any file descriptors over the course of operation. You can list the file descriptors that a process has open by doing ls /proc/<pid>/fd.
You don’t need to implement caching (you really can’t, given the encryption). Blacklisting and parallel handling of requests should still work.