Lecture 15: Networking with threading, protocols
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
Implementing an echo server
Let’s implement a slightly more sophisticated server. This one will read from the client, then echo back to the client whatever was received:
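#include <iostream>
#include <string>
#include <sys/socket.h>          // for accept
#include "socket++/sockstream.h" // for sockbuf, iosockstream
#include "server-socket.h"       // for createServerSocket
using namespace std;
static void echo(int clientSocket, size_t connId) {
    cout << "Handling incoming connection " << connId << endl;
    // The sockbuf closes clientSocket when it is destroyed
    sockbuf sb(clientSocket);
    iosockstream ss(&sb);
    ss << "Hello, client " << connId << "!" << endl;
    while (true) {
        string line;
        getline(ss, line);
        // Stop once the client disconnects (or the read otherwise fails)
        if (ss.eof() || ss.fail()) {
            break;
        }
        // Echo the line back, prefixed with a tab
        ss << "\t" << line << endl;
    }
    cout << "Connection " << connId << " closed" << endl;
}
int main(int argc, char *argv[]) {
    int serverSocket = createServerSocket(12345);
    if (serverSocket < 0) {
        cout << "Error: could not start server" << endl;
        return 1;
    }
    size_t connCount = 0;
    while (true) {
        // Block until the next client connects
        int clientSocket = accept(serverSocket, NULL, NULL);
        if (clientSocket < 0) {
            continue;  // accept failed; keep waiting for the next client
        }
        size_t connId = connCount++;
        echo(clientSocket, connId);
    }
    return 0;
}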
Adding threads
This server works, but it’s only able to talk to one client at a time. Since one client might take several seconds (or maybe even minutes) to have a conversation, this is terrible for performance: every other client has to wait its turn. The server isn’t even using many system resources, since it spends almost all of its time waiting to hear from the client, so we might as well talk to multiple clients at a time.
It’s very common to do this using a ThreadPool: we create some fixed number of threads, then distribute our conversations among those threads:
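#include <iostream>
#include <string>
#include <sys/socket.h>          // for accept
#include "socket++/sockstream.h" // for sockbuf, iosockstream
#include "server-socket.h"       // for createServerSocket
// ThreadPool and the oslock/osunlock manipulators also come from the course's support code
using namespace std;
static void echo(int clientFd, size_t connId) {
    // oslock/osunlock keep lines printed by different threads from interleaving
    cout << oslock << "Handling incoming connection " << connId << endl << osunlock;
    sockbuf sb(clientFd);
    iosockstream ss(&sb);
    ss << "Hello, client " << connId << "!" << endl;
    while (true) {
        string line;
        getline(ss, line);
        if (ss.eof() || ss.fail()) {
            break;
        }
        ss << "\t" << line << endl;
    }
    cout << oslock << "Connection " << connId << " closed" << endl << osunlock;
}
int main(int argc, char *argv[]) {
    int waitingListFd = createServerSocket(12345);
    if (waitingListFd == -1) {
        cerr << "Failed to bind to port 12345" << endl;
        return 1;
    }
    size_t connCount = 0;
    ThreadPool pool(16);  // up to 16 conversations at once; later clients wait in the queue
    while (true) {
        int clientFd = accept(waitingListFd, NULL, NULL);
        if (clientFd < 0) {
            continue;  // accept failed; keep waiting for the next client
        }
        size_t connId = connCount++;
        // Capture by value: clientFd and connId change on the next loop iteration
        pool.schedule([clientFd, connId]{
            echo(clientFd, connId);
        });
    }
    return 0;
}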
Thread safety
Whenever adding threads to any application, you must take care to ensure that the functions you call are safe to call from multiple threads at the same time. Several C library functions use global variables in their implementations; for example, gethostbyname() (introduced last lecture, it does a DNS lookup to get the IP address for a domain name) returns a pointer into global memory. This is convenient, since we don’t need to worry about freeing that memory, but it also means that gethostbyname() is not thread safe: thread 1 might be in the middle of using that global memory when thread 2 calls gethostbyname() and overwrites it. These functions usually have “reentrant”, thread-safe versions that are more complicated to use, but do not use any global memory and are safe to call from multiple threads at the same time. For example, gethostbyname_r allows you to pass a buffer that is used to store the DNS lookup results, so that no global memory is used and data races can be avoided.
Implementing a client
We can implement a really simple client that connects to a server, receives a line of data, and prints that out:
#include <iostream>
#include <string>
#include "socket++/sockstream.h" // for sockbuf, iosockstream
#include "client-socket.h"       // for createClientSocket
using namespace std;
int main(int argc, char *argv[]) {
    // Connect to the server
    int sock = createClientSocket("myth55.stanford.edu", 12345);
    if (sock < 0) {
        cout << "Error establishing connection!" << endl;
        return 1;
    }
    // Create a sockstream to make it easier to work with the fd
    sockbuf sb(sock);
    iosockstream ss(&sb);
    // Read a line from the server
    string line;
    getline(ss, line);
    // Print
    cout << line << endl;
    // sockbuf destructor will close the sock file descriptor
    return 0;
}
Protocols
We have been talking about how to open connections between two computers. When we do this, we get a socket on each end, where if we write to the socket, that data is sent to the other side, and if we read from the socket, we receive any data from the other side. With these sockets, we could really send/receive anything (since we can send/receive arbitrary text/binary)… but how should we format this text? How should we apply this tool?
A protocol is a formal specification of how two computers should talk to each other, intended to ensure that computers can understand each other when talking over the internet, even if they are running different software written by different people, running in different environments with different hardware.
Protocols are often codified in RFCs (“Requests for Comments”); HTTP 1.1 (the protocol we’ll focus on in this class), for example, is specified in one. It’s quite long and you don’t need to read it, but if you were to implement software that follows this specification, you would be able to talk to any other computer anywhere that also speaks HTTP 1.1.
HTTP
The HTTP protocol is a somewhat universal language in the world of networking. Since so many programs can speak it, it’s a very common language/protocol for exchanging information and executing commands over network connections.
Once a connection is open between the client and server, the client sends a request, and then the server sends a response. The client and server can go back-and-forth several times, sending several requests and responses over the same connection, but to keep things simple for this class, we will only look at cases where the client sends one request, and the server replies with one response.
HTTP requests
An HTTP request (sent by a client to a server) looks something like the following:
GET /search?q=cats&tbm=isch HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:61.0) Gecko/20100101 Firefox/61.0
Accept-Language: en-US,en;q=0.5
The first line is called the start line or request line:
- The first part of the request line is a verb. HTTP has many verbs as part of the language, but you should know the following:
  - GET: asks a server for some information, but doesn’t modify any state on the server.
  - POST: sends some info to the server, modifying state. Logging into a website sends a POST request; you’re sending your username and password and creating a login session on that server. Uploading an image to Google Drive sends a POST request; you’re asking the server to store your image.
  - HEAD: the exact same as a GET request, but tells the server to only send metadata and omit the actual contents of the response. Wtf, why would anyone do that? This is super helpful for caching information. When many servers send you files, they also send some metadata specifying when the file last changed. You can save that Last-Modified timestamp, and then next time you consider downloading that file, you can send a HEAD request instead of a GET request. The server responds with only the metadata headers, and omits the contents of the file. If Last-Modified is unchanged, then you know the file you downloaded before is up to date, and you don’t need to download it again. If it is newer, then you can make a followup GET request to download the updated file.
- The second part of the request line is the request path. The format of this path is dictated by the web server. This particular path tells Google that I want to search, that I am looking for cats (q=cats, where q is short for query), and that I want to search Google Images (tbm=isch; I don’t know what tbm stands for, but I assume isch stands for image search).
- Finally, the last part of the request line specifies what version of the HTTP protocol we want to speak. HTTP/2 is becoming widespread as of just this year, and it has many exciting features. In this class, we’ll only speak HTTP/1.0. By default, an HTTP/1.0 server closes the connection after sending its response, which lets a simple client read until the connection closes.
Following the request line are several lines containing request headers, which specify metadata about the request. Headers are key/value pairs, written as Key: Value, with each pair on a separate line. There are many standard headers, although a program can add any extra, non-standard headers if it likes. In the sample request above (which was actually sent by my browser to Google Images), my browser is telling Google that it is Firefox (through the User-Agent header) and that I would like my content in American English (through the Accept-Language header). The Host header names the site being requested; HTTP/1.1 requires it, since a single server may host many websites.
Finally, a blank line is used to designate the end of the headers section. If we want to send any payload with our request (e.g. to upload an image to Google Drive), then we can send the payload after the blank line. This typically only happens with POST requests. For example, a request to log into a website might look something like this, where the Content-Length header tells the server how many bytes of payload follow the blank line:
POST /login HTTP/1.1
Host: mysecretwebsite.com
Content-Length: 41

username=ryan&password=verysecretpassword
HTTP responses
An HTTP response looks like this:
HTTP/1.1 200 OK
Date: Mon, 09 Aug 2021 22:48:07 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 21483
Content-Type: text/html

<html>
<head>
<meta name="generator" content="Hugo 0.67.1" />
<meta charset="UTF-8" />
<link href="https://fonts.googleapis.com/css?family=Merriweather|Open+Sans|Anonymous+Pro" rel="stylesheet">
<link rel="stylesheet" type="text/css" href="//web.stanford.edu/class/cs110/summer-2021//review.css" />
<link rel="stylesheet" type="text/css" href="//web.stanford.edu/class/cs110/summer-2021//codemirror.css" />
<link rel="stylesheet" type="text/css" href="//web.stanford.edu/class/cs110/summer-2021//style.css" />
<title>CS 110: Principles of Computer Systems</title>
</head>
....
The start line specifies the HTTP version that the server can speak, as well as an HTTP status code; here it is 200 OK, meaning the request succeeded. (I’m sure you’ve seen status code 404 Not Found.)
Response headers follow (e.g. Content-Type indicates that we’ve been sent an HTML page, and Content-Length gives the size of the payload in bytes; other common headers include Set-Cookie, used to send browser cookies, and Cache-Control, which tells the browser whether to cache the page). A blank line is included to indicate the end of the headers, and then the server sends the payload.
Newlines in HTTP
You’re used to seeing the \n (LF – line feed) character used to terminate lines. This is the standard way of representing newlines in Unix, but Windows-derived systems typically terminate lines using \r\n (CRLF – carriage return, line feed). We want all of these systems to interoperate happily on the internet, so we need to decide on a standard way to do things. The designers of HTTP decided to use \r\n (CRLF) to terminate lines, so the request line, headers, and terminating blank line should all end with \r\n.
In practice, most servers and clients will tolerate lines ending with \n, but it’s a good idea to send \r\n to be fully compliant with the HTTP specification.
Implementing an HTTP client
curl is a program included on many systems that makes an HTTP request and prints the payload that the server responds with. We’ll write a super basic version that sends an HTTP request in the format given above, then reads the response, ignores the response line and headers, and prints the payload to the terminal.
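#include <cstdlib>               // for exit
#include <iostream>
#include <string>
#include <utility>               // for pair, make_pair
#include "socket++/sockstream.h" // for sockbuf, iosockstream
#include "client-socket.h"       // for createClientSocket
using namespace std;
// The helpers are defined below main, so declare them first
static pair<string, string> parseURL(string url);
static void sendRequest(iosockstream& ss, string hostName, string path);
static void skipToPayload(iosockstream& ss);
static void printPayload(iosockstream& ss);
int main(int argc, char *argv[]) {
    if (argc != 2) {
        cerr << "Usage: " << argv[0] << " urlToDownload" << endl;
        exit(1);
    }
    string url = argv[1];
    // The inputted `url` will be something like
    // "http://web.stanford.edu/class/cs110/". We need to connect to the server
    // "web.stanford.edu", and we'll need to send a request for the path
    // "/class/cs110/".
    pair<string, string> parsed = parseURL(url);
    string host = parsed.first;   // e.g. "web.stanford.edu"
    string path = parsed.second;  // e.g. "/class/cs110/"
    // Open a connection to the server (port 80 is the default HTTP port)
    int fd = createClientSocket(host, 80);
    if (fd < 0) {
        cerr << "Could not connect to " << host << endl;
        return 1;
    }
    // Wrap the file descriptor in an iosockstream to make it easier to
    // send/receive stuff using C++-isms
    sockbuf sb(fd);
    iosockstream serverSS(&sb);
    sendRequest(serverSS, host, path);
    skipToPayload(serverSS);
    printPayload(serverSS);
    return 0;
}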
/**
* Given a url like "http://web.stanford.edu/class/cs110", return the pair
* ("web.stanford.edu", "/class/cs110").
*
* This is just a bunch of annoying string parsing work. Use a library function
* to do this whenever possible.
*/
static pair<string, string> parseURL(string url) {
    if (url.substr(0, 7) == "http://") {
        // Chop off the http:// from the beginning of the string
        url = url.substr(7);
        size_t found = url.find("/");
        if (found == string::npos) {
            // There is no slash in the url, so the url is just the host (e.g.
            // www.google.com)
            return make_pair(url, "/");
        }
        string hostName = url.substr(0, found);
        string path = url.substr(found);
        return make_pair(hostName, path);
    } else {
        // Only plain http:// URLs are supported (no https://)
        throw "invalid proto!";
    }
}
/**
* Send an HTTP GET request to the server, requesting the specified path and
* hostname.
*/
static void sendRequest(iosockstream& ss, string hostName, string path) {
    ss << "GET " << path << " HTTP/1.0\r\n";
    ss << "Host: " << hostName << "\r\n";
    ss << "\r\n";
    // Important: make sure that the request is fully flushed to the network
    // and sent to the server. (You may know that cout is buffered, and print
    // statements aren't guaranteed to immediately show up on the terminal;
    // similarly, the sockstream is buffered, so we need to flush to send to
    // the network.)
    ss.flush();
}
/**
* Ignore the response line and the headers that the server sends to us. This
* is usually not a good thing to do (e.g. the server might have replied with
* an error, and we want to show that), but it's okay for this very simple
* demo.
*/
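static void skipToPayload(iosockstream& ss) {
    // Keep reading lines until we reach the blank line that ends the headers.
    // getline strips the trailing \n, so a CRLF blank line reads as "\r" (and
    // an LF-only one as ""). Checking getline's result also stops the loop if
    // the connection closes before the headers end.
    string line;
    while (getline(ss, line) && !line.empty() && line != "\r") {
        // Discard the status line and each header
    }
}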
/**
* Read from the network connection until the server closes the connection,
* printing any received data to the terminal
*/
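static void printPayload(iosockstream& ss) {
    size_t totalBytes = 0;
    // With HTTP/1.0 the server closes the connection after the payload, so
    // reading until end-of-stream gets everything
    while (ss.good()) {
        char buffer[1024] = {'\0'};
        // read() may stop short at end of stream; gcount() says how many bytes arrived
        ss.read(buffer, sizeof(buffer));
        totalBytes += ss.gcount();
        cout << string(buffer, ss.gcount());
    }
    cout << endl << "Total number of bytes fetched: " << totalBytes << endl;
}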
Sample output:
🍉 ./curl http://web.stanford.edu/class/cs110/summer-2021/ | head -n10
<html>
<head>
<meta name="generator" content="Hugo 0.67.1" />
<meta charset="UTF-8" />
<link href="https://fonts.googleapis.com/css?family=Merriweather|Open+Sans|Anonymous+Pro" rel="stylesheet">
<link rel="stylesheet" type="text/css" href="//web.stanford.edu/class/cs110/summer-2021//review.css" />
<link rel="stylesheet" type="text/css" href="//web.stanford.edu/class/cs110/summer-2021//codemirror.css" />
<link rel="stylesheet" type="text/css" href="//web.stanford.edu/class/cs110/summer-2021//style.css" />
<title>CS 110: Principles of Computer Systems</title>
</head>