Lecture 11: Networking continued
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
Recap of writing servers: echo server
We can write this program that reads a line from a client and repeats it right
back to the client. Unlike the similar “hello world” example from last class,
this uses the C++ sockbuf and iosockstream classes.
static void echo(int clientSocket, size_t connId) {
  cout << oslock << "Accepted incoming connection " << connId << endl << osunlock;
  sockbuf sb(clientSocket);
  iosockstream ss(&sb);
  ss << "Hello, client!" << endl;
  string line;
  // Stop as soon as a read fails, so we never echo a line we didn't receive
  while (getline(ss, line)) {
    ss << "\t" << line << endl;
  }
  cout << oslock << "Connection " << connId << " closed" << endl << osunlock;
}
int main(int argc, char *argv[]) {
  int serverSocket = createServerSocket(12345);
  if (serverSocket < 0) {
    cout << "Error: could not start server" << endl;
    return 1;
  }
  size_t connCount = 0;
  while (true) {
    int clientSocket = accept(serverSocket, NULL, NULL);
    size_t connId = connCount++;
    echo(clientSocket, connId);
  }
  return 0;
}
Thread pools
This doesn’t perform very well, however; it can only maintain a conversation with a single client at a time. If two people connect to the server, the second person has to wait until the first person closes their connection before the server shifts its attention to the second person.
That’s easy to address with threading. In main, we can replace the direct echo
call with a call to create a thread:
thread(echo, clientSocket, connId).detach();
We’ve seen this idea many times. However, if we do this, our server will waste a lot of effort creating a brand new thread every time a connection comes in. Even if we use a semaphore to limit the total number of running threads to 16 threads at a time (as we did with the time server last class), if this server runs for a day, we might have launched 1M threads over the course of that day.
We can do better. A thread pool is a team of threads that work together to
process work from a queue, very similar to the setup in farm. In farm, you
didn’t spawn a new worker for every single number you needed to factor – that
would have been a waste. Instead, you handed each number to an available worker
when it finished the last number it was factoring. We want to do something
similar here.
You’ll implement a ThreadPool in Assignment 4. For now, we’ll use an
implementation that we wrote for you.
The ThreadPool constructor takes an integer, which is the number of workers
to launch as part of the pool. Once we have a pool, we can call
ThreadPool::schedule(func) to add a function func to the queue; when a
worker is available, it will take the function from the queue and execute it.
(Note: the schedule function accepts functions with no arguments and no return
value – these are called “thunks.” I don’t know why.) At a later point, we can
call ThreadPool::wait() to wait for everything in the queue to be executed.
Using a ThreadPool:
int main(int argc, char *argv[]) {
  int serverSocket = createServerSocket(12345);
  if (serverSocket < 0) {
    cout << "Error: could not start server" << endl;
    return 1;
  }
  size_t connCount = 0;
  ThreadPool tp(16);
  while (true) {
    int clientSocket = accept(serverSocket, NULL, NULL);
    size_t connId = connCount++;
    tp.schedule([clientSocket, connId](){
      echo(clientSocket, connId);
    });
  }
  return 0;
}
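To make the queue-and-workers idea concrete, here is a minimal sketch of how such a pool could be built with std::thread, a mutex, and two condition variables. TinyPool is a made-up name, and this is not the implementation we wrote for you; it only assumes the schedule/wait behavior described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// TinyPool is a hypothetical name; it is NOT the course's ThreadPool.
class TinyPool {
 public:
  explicit TinyPool(size_t numWorkers) {
    assert(numWorkers > 0);
    for (size_t i = 0; i < numWorkers; i++) {
      workers.emplace_back([this] { workerLoop(); });
    }
  }

  // Add a thunk to the queue and wake one sleeping worker.
  void schedule(std::function<void()> thunk) {
    {
      std::lock_guard<std::mutex> lg(m);
      tasks.push(std::move(thunk));
      pending++;
    }
    workAvailable.notify_one();
  }

  // Block until every scheduled thunk has finished running.
  void wait() {
    std::unique_lock<std::mutex> ul(m);
    allDone.wait(ul, [this] { return pending == 0; });
  }

  // Tell the workers to exit once the queue drains, then join them.
  ~TinyPool() {
    {
      std::lock_guard<std::mutex> lg(m);
      shuttingDown = true;
    }
    workAvailable.notify_all();
    for (std::thread& t : workers) t.join();
  }

 private:
  void workerLoop() {
    while (true) {
      std::function<void()> thunk;
      {
        std::unique_lock<std::mutex> ul(m);
        workAvailable.wait(ul, [this] { return shuttingDown || !tasks.empty(); });
        if (tasks.empty()) return;  // only reachable when shutting down
        thunk = std::move(tasks.front());
        tasks.pop();
      }
      thunk();  // run outside the lock so workers execute in parallel
      {
        std::lock_guard<std::mutex> lg(m);
        pending--;
        if (pending == 0) allDone.notify_all();
      }
    }
  }

  std::mutex m;
  std::condition_variable workAvailable;  // signaled when a thunk is queued
  std::condition_variable allDone;        // signaled when pending hits zero
  std::queue<std::function<void()>> tasks;
  std::vector<std::thread> workers;
  size_t pending = 0;  // thunks that are queued or currently running
  bool shuttingDown = false;
};
```

The point of the design is that the threads are created once, in the constructor, and then reused for every connection; schedule only pushes onto a queue, which is far cheaper than launching a thread.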
Network clients
Writing clients is similar to writing servers. Calling createClientSocket
(which we’ll write on Monday) returns a file descriptor that can be used to
talk to a server. Anything written to the file descriptor goes to the server,
and anything read comes from the server.
This program reads one line from a server and prints the response. We can use it to get the time from our time server.
int main(int argc, char *argv[]) {
  int sock = createClientSocket("myth55.stanford.edu", 12345);
  if (sock < 0) {
    cout << "Error establishing connection!" << endl;
    return 1;
  }
  sockbuf sb(sock);
  iosockstream ss(&sb);
  string line;
  getline(ss, line);
  cout << line << endl;
}
HTTP protocol
We have this communication scheme that, much like pipes, can be used to enable freeform communication between two computers on the internet. If we want to set a system up so that one computer can ask another computer to do something for it, we could have a computer send the server requests in the following format:
EXECUTE getImages category:cats numToReturn:10
The server can parse that request format and send back the images over the network connection.
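As a rough illustration of what “parse that request format” involves, here is a sketch of a parser for the made-up EXECUTE format above. The Command struct and parseCommand function are hypothetical names invented for this example, and the rules (space-separated tokens, key:value arguments) are assumptions read off the single sample line.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical types for the made-up EXECUTE protocol.
struct Command {
  std::string name;                         // e.g. "getImages"
  std::map<std::string, std::string> args;  // e.g. category -> cats
};

// Parse a line like "EXECUTE getImages category:cats numToReturn:10".
// Returns false if the line does not follow the assumed format.
bool parseCommand(const std::string& line, Command& out) {
  std::istringstream in(line);
  std::string keyword;
  if (!(in >> keyword >> out.name) || keyword != "EXECUTE") return false;

  std::string token;
  while (in >> token) {
    size_t colon = token.find(':');
    if (colon == std::string::npos) return false;  // argument without a key
    out.args[token.substr(0, colon)] = token.substr(colon + 1);
  }
  return true;
}
```

Every custom protocol needs a parser like this on both ends, plus decisions about quoting, errors, and binary data, which is part of the motivation for reusing a standard protocol.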
However, if we want to use the network to do something else, we might have to define a new communication protocol and a new parsing format for that use. Why keep defining custom protocols when we can use a standard one?
Learning English is useful because so many people speak it. Similarly, HTTP is a somewhat universal language in the world of network protocols. Since so many programs know how to speak it, it’s a very common language/protocol for exchanging information and executing commands over network connections.
An HTTP request (sent by a client to a server) looks something like the following:
GET /search?q=cats&tbm=isch HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:61.0) Gecko/20100101 Firefox/61.0
Accept-Language: en-US,en;q=0.5
The first line is called the start line or request line. The first part of
the request line is a verb. HTTP has many verbs as part of the language, but
you should know the following:
* GET: asks a server for some information, but doesn’t modify any state on
  the server.
* POST: sends some info to the server, modifying state. Logging into a
  website sends a POST request; you’re sending your username and password and
  creating a login session on that server. Uploading an image to Google Drive
  sends a POST request; you’re asking the server to store your image.
* HEAD: the exact same as a GET request, but tells the server to only send
  metadata and omit the actual contents of the response. Wtf, why would anyone
  do that? This is super helpful for caching information. When many servers
  send you files, they also send some metadata specifying when the file last
  changed. You can save that Last-Modified timestamp, and then next time you
  consider downloading that file, you can send a HEAD request instead of a
  GET request. The server responds with only the metadata headers, and
  omits the contents of the file. If Last-Modified is unchanged, then you
  know the file you downloaded before is up to date, and you don’t need to
  download it again. If it is newer, then you can make a followup GET request
  to download the updated file.
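The HEAD-then-GET caching decision can be sketched as a small bookkeeping class. LastModifiedCache and its method names are hypothetical, and as a simplification it treats any change in the timestamp string as “newer” rather than parsing the dates.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper, not part of the lecture code.
class LastModifiedCache {
 public:
  // Call with the Last-Modified value from a HEAD response. Returns true if a
  // follow-up GET is needed: we have never fetched this URL, the server sent
  // no timestamp, or the timestamp differs from the one we saved.
  bool needsDownload(const std::string& url,
                     const std::string& headLastModified) const {
    auto it = saved.find(url);
    if (it == saved.end() || headLastModified.empty()) return true;
    return it->second != headLastModified;
  }

  // Call after a successful GET to remember the timestamp the server sent.
  void recordDownload(const std::string& url, const std::string& lastModified) {
    saved[url] = lastModified;
  }

 private:
  std::map<std::string, std::string> saved;  // URL -> Last-Modified string
};
```

A client would call needsDownload after each HEAD request, and only when it returns true send the GET and then call recordDownload with the new timestamp.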
The second part of the request line is the request path. The format of this
path can be customized by the web server. This particular path tells Google
that I want to search, that I am looking for cats (q=cats, where q is
short for query), and that I want to search Google images (tbm=isch – I
don’t know what tbm stands for, but I assume isch stands for image search).
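Here is a sketch of how a server might pull a path like this apart. ParsedPath and parseRequestPath are hypothetical names, and the key=value pairs joined by & are just the common convention seen in this example; as noted above, a server is free to use some other format. The sketch does no percent-decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical result type: "/search?q=cats&tbm=isch" becomes
// path "/search" and params {q: cats, tbm: isch}.
struct ParsedPath {
  std::string path;
  std::map<std::string, std::string> params;
};

ParsedPath parseRequestPath(const std::string& requestPath) {
  ParsedPath result;
  size_t question = requestPath.find('?');
  // substr(0, npos) yields the whole string when there is no '?'
  result.path = requestPath.substr(0, question);
  if (question == std::string::npos) return result;

  size_t start = question + 1;
  while (start < requestPath.size()) {
    size_t amp = requestPath.find('&', start);
    if (amp == std::string::npos) amp = requestPath.size();
    std::string pair = requestPath.substr(start, amp - start);
    size_t eq = pair.find('=');
    if (eq == std::string::npos) {
      if (!pair.empty()) result.params[pair] = "";  // key with no value
    } else {
      result.params[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    start = amp + 1;
  }
  return result;
}
```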
Finally, the last part of the request line specifies what version of the HTTP protocol we want to speak. HTTP/2.0 is becoming widespread as of just this year, and it has many exciting features. In this class, we’ll only speak HTTP/1.0.
Following the request line are several lines containing request headers. These
specify metadata about the request. In the sample headers above (which were
actually sent by my browser to Google Images), my browser is telling Google
that it is Firefox (through the User-Agent header) and that I would like my
content in American English (through the Accept-Language header).
If we want to send any payload with our request (e.g. to upload an image to Google Drive), the payload goes after the headers. The headers are always terminated by a blank line, even when there is no payload; any payload follows that blank line.
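Putting the pieces together, a request can be assembled as a plain string: request line, headers, a blank line, then the optional payload. buildRequest is a hypothetical helper written for this sketch. It also adds the standard Content-Length header, which is not discussed above, so the server knows how many payload bytes to expect.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: assemble an HTTP/1.0 request as a single string.
std::string buildRequest(const std::string& verb, const std::string& path,
                         const std::string& host, const std::string& payload) {
  // Request line: verb, request path, protocol version
  std::string request = verb + " " + path + " HTTP/1.0\r\n";
  // Headers, one per line
  request += "Host: " + host + "\r\n";
  if (!payload.empty()) {
    request += "Content-Length: " + std::to_string(payload.size()) + "\r\n";
  }
  // Blank line marks the end of the headers
  request += "\r\n";
  // Payload (possibly empty) comes last
  request += payload;
  return request;
}
```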
An HTTP response looks like this:
HTTP/2.0 200 OK
content-type: text/html; charset=UTF-8
date: Fri, 27 Jul 2018 21:21:52 GMT
expires: -1
cache-control: private, max-age=0

<!doctype html>
<html itemscope="" itemtype="http://schema.org/SearchResultsPage" lang="en">
<head>
<meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image">
<link href="/images/branding/product/ico/googleg_lodp.ico" rel="shortcut icon">
<meta content="origin" name="referrer">
<title>cats - Google Search</title>
<script nonce="DaArjK0SZHtb0mDYFCfFWA==">
...
The start line specifies the HTTP version that the server can speak, as well as
an HTTP status code. (I’m sure you’ve seen status code 404 Not Found.)
Response headers follow (Content-Type indicates that Google Images has sent
us an HTML page, Cache-Control tells the browser not to cache this response,
and there are many more headers I omitted). A blank line is included to
indicate the end of the headers, and then the server sends the payload.
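The response layout just described can be read with ordinary stream operations. This is a sketch only: ResponseHead and readResponseHead are hypothetical names, and the function works on any std::istream so it can be tried on a string without a network connection. It stores header names exactly as the server sent them.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for everything before the payload.
struct ResponseHead {
  std::string version;  // e.g. "HTTP/1.0"
  int statusCode = 0;   // e.g. 404
  std::string reason;   // e.g. "Not Found"
  std::map<std::string, std::string> headers;
};

// Remove a trailing '\r' left behind when getline splits "\r\n" on '\n'.
static void stripCarriageReturn(std::string& line) {
  if (!line.empty() && line.back() == '\r') line.pop_back();
}

// Reads the start line and headers, leaving the stream at the payload.
ResponseHead readResponseHead(std::istream& in) {
  ResponseHead head;
  std::string line;

  // Start line: version, status code, then the rest is the reason phrase
  std::getline(in, line);
  stripCarriageReturn(line);
  std::istringstream statusLine(line);
  statusLine >> head.version >> head.statusCode;
  std::getline(statusLine >> std::ws, head.reason);

  // Header lines of the form "Name: value" until the blank line
  while (std::getline(in, line)) {
    stripCarriageReturn(line);
    if (line.empty()) break;  // blank line: end of headers
    size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // skip malformed lines
    size_t valueStart = line.find_first_not_of(" \t", colon + 1);
    head.headers[line.substr(0, colon)] =
        valueStart == std::string::npos ? "" : line.substr(valueStart);
  }
  return head;
}
```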
Network client
wget is a program included on many systems that makes an HTTP request and
saves the payload that the server responds with. We’ll write a super basic
version that sends an HTTP request in the format given above, then takes the
response, ignores the status line and headers, and saves the payload to a
file.
int main(int argc, char *argv[]) {
  assert(argc >= 3);
  string url = argv[1];
  string filename = argv[2];
  auto parsed = parseURL(url);
  string host = parsed.first;
  string path = parsed.second;
  int s = createClientSocket(host, 80);
  if (s < 0) {
    cout << "Error establishing connection!" << endl;
    return 1;
  }
  sockbuf sb(s);
  iosockstream ss(&sb);
  sendRequest(ss, host, path);
  skipHeader(ss);
  savePayload(ss, filename);
}
static pair<string, string> parseURL(string url) {
  if (url.substr(0, 7) == "http://") {
    // Chop off the http:// from the beginning of the string
    url = url.substr(7);
    size_t found = url.find("/");
    if (found == string::npos) {
      // There is no slash in the url, so the url is just the host (e.g.
      // www.google.com)
      return make_pair(url, "/");
    }
    string hostName = url.substr(0, found);
    string path = url.substr(found);
    return make_pair(hostName, path);
  } else {
    throw "invalid proto!";
  }
}
static void sendRequest(iosockstream& ss, string hostName, string path) {
  ss << "GET " << path << " HTTP/1.0\r\n";
  ss << "Host: " << hostName << "\r\n";
  // Indicate we're done with the headers by sending a blank line
  ss << "\r\n";
  // Make sure this all gets pushed out to the network
  ss.flush();
}
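static void skipHeader(iosockstream& ss) {
  string line;
  // getline strips the trailing \n, so the blank line "\r\n" that ends the
  // headers arrives as "\r". The empty() check covers servers that send a
  // bare "\n" instead. Checking getline's result stops us at end of stream.
  while (getline(ss, line) && !line.empty() && line != "\r") {
  }
}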
static void savePayload(iosockstream& ss, string filename) {
  ofstream output(filename, ios::binary);
  size_t totalBytes = 0;
  while (!ss.fail()) {
    char buffer[1024] = {'\0'}; // 1024 is arbitrary
    ss.read(buffer, sizeof(buffer));
    totalBytes += ss.gcount();
    output.write(buffer, ss.gcount());
  }
  cout << "Total number of bytes fetched: " << totalBytes << endl;
}