Lecture 10: Intro to Networking
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
Announcements
- I’ve emailed each person individually to confirm your exam arrangements. Make sure you’ve verified that I have the correct information. If you are an SCPD student, be sure to respond so that I know when you are taking the exam.
- I know this is a very stressful part of the quarter, but I want to remind you to uphold your academic integrity. If this is your first class at Stanford, please review the Stanford Honor Code (it’s very short). In short, we have a lot of trust in you and we don’t like looking for dishonesty, but we expect you to uphold that trust.
- On a related note, I want you to know how proud I am of each one of you for coming this far. This is possibly the hardest class you will ever take, and I know it can feel demoralizing when it feels like everyone knows what’s going on but you’re completely lost. I promise you: everyone is in the same boat as you, and all of us course staff were there at one point as well. I have been very impressed with you, and as a class, I think you’re doing above average – many of you are noticing things and asking questions that students don’t generally ask in other quarters. Hang in there, knowing that you’ll be able to look back and be amazed at how much you’ve learned! If you’re feeling exceptionally stressed, please talk to me; we’re here to help, even if you’re stressing more about life than any particular concept from class.
- We are having extra office hours this weekend to help with STSH and the midterm. Please see the calendar for times. We don’t have lecture on this coming Monday.
- The weekly survey is up here, due Sunday at 11:59pm as usual. When you complete this, add another line to your NOTE_TO_GRADER file indicating you did it. (I just added a note to the Assignment 3 handout.)
- I’ve read all your Week 3 survey responses, and there was some great feedback. I am planning on posting some responses to feedback this weekend, but haven’t had time yet. I don’t want you to feel like you are shouting into a void!
How the Internet is Structured
IP addresses and DNS
Every computer on a network has a unique IP address that identifies it on that network. When you want to connect to a server, you need to know its IP address. An IPv4 address is stored as four bytes and is conventionally written as four decimal numbers separated by periods; an example IP address is “192.168.1.1”. (“500.304.259.1” is not a valid IP address, since the numbers 500, 304, and 259 are greater than 255 and thus can’t be stored in single bytes.)
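To make the "four bytes" point concrete, here is a minimal sketch that converts dotted-decimal text into its four bytes using the POSIX inet_pton call. The helper name parseIPv4 is made up for this example; it is not part of any course library.

```cpp
#include <arpa/inet.h>   // inet_pton, in_addr
#include <array>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from the course code): converts dotted-decimal
// text into the four bytes of an IPv4 address. Returns false if the text is
// not a valid IPv4 address, e.g. because a component is larger than 255.
bool parseIPv4(const std::string& text, std::array<uint8_t, 4>& bytes) {
  in_addr addr;
  if (inet_pton(AF_INET, text.c_str(), &addr) != 1) return false;
  // s_addr is stored in network byte order, so the first byte in memory is
  // the first number in the dotted-decimal text.
  std::memcpy(bytes.data(), &addr.s_addr, bytes.size());
  return true;
}
```

Calling parseIPv4("192.168.1.1", bytes) should fill bytes with 192, 168, 1, 1, while parseIPv4("500.304.259.1", bytes) should return false.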
Humans aren’t generally good at remembering strings of numbers like IP addresses. The Domain Name System (DNS) translates human-friendly hostnames (also called domain names) into IP addresses. You can make a DNS query to ask for the IP address of “web.stanford.edu,” and you’ll hear back that it has an IP address of “171.67.215.200.”
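From inside a program, the usual way to make this kind of lookup is the POSIX getaddrinfo function, which asks the system's resolver on your behalf. The sketch below is one way to wrap it; the helper name lookupIPv4 is an assumption for illustration, and the addresses you get back depend on your network and on when you run it.

```cpp
#include <arpa/inet.h>    // inet_ntop
#include <netdb.h>        // getaddrinfo, freeaddrinfo, addrinfo
#include <netinet/in.h>   // sockaddr_in, INET_ADDRSTRLEN
#include <sys/socket.h>   // AF_INET, SOCK_STREAM
#include <string>
#include <vector>

// Hypothetical helper: returns the IPv4 addresses the system resolver
// reports for `hostname`, or an empty vector if the lookup fails.
std::vector<std::string> lookupIPv4(const std::string& hostname) {
  addrinfo hints = {};
  hints.ai_family = AF_INET;        // IPv4 only
  hints.ai_socktype = SOCK_STREAM;  // avoid one duplicate entry per socket type
  addrinfo *results = nullptr;
  if (getaddrinfo(hostname.c_str(), nullptr, &hints, &results) != 0) return {};

  std::vector<std::string> addresses;
  for (addrinfo *p = results; p != nullptr; p = p->ai_next) {
    const sockaddr_in *sin = reinterpret_cast<const sockaddr_in *>(p->ai_addr);
    char text[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &sin->sin_addr, text, sizeof(text)) != nullptr) {
      addresses.push_back(text);
    }
  }
  freeaddrinfo(results);
  return addresses;
}
```

For example, lookupIPv4("web.stanford.edu") returns whatever addresses DNS currently reports for that name. A string that is already a numeric address, such as "127.0.0.1", is returned unchanged without any DNS query.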
Recall: in filesystems, if you want to find the inode for the file /usr/class/cs110/hello.txt, you first ask the root directory for usr’s inumber, then ask /usr for class’s inumber, then ask /usr/class for cs110’s inumber, then ask /usr/class/cs110 for hello.txt’s inumber. A very similar process exists in DNS. A set of name servers is defined to answer DNS lookups for each top-level domain name suffix (e.g. .com has a set of name servers, .org has a set of name servers, .edu has a set of name servers, etc.), and the root name servers know where to find each of those sets. When we want to look up web.stanford.edu, we ask an edu name server where the nameserver for stanford.edu is, and then we ask the nameserver for stanford.edu where web.stanford.edu is. If we wanted to support names like cs.web.stanford.edu (which Stanford doesn’t do, but it’s entirely possible), then we could ask the nameserver for web.stanford.edu where cs.web.stanford.edu is.
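The right-to-left walk described above can be sketched as a small string exercise. This is only a toy: it makes no network requests, the function name delegationChain is invented for this example, and it assumes the hostname has no leading or trailing dot.

```cpp
#include <string>
#include <vector>

// Toy illustration (no real DNS traffic): lists the names that get looked up
// for `hostname`, starting at the top-level domain and adding one label at a
// time, much like walking a file path one directory at a time.
std::vector<std::string> delegationChain(const std::string& hostname) {
  std::vector<std::string> chain;
  if (hostname.empty()) return chain;
  std::string::size_type pos = hostname.size();
  while (true) {
    // Find the next dot to the left of the suffix we have handled so far.
    std::string::size_type dot =
        (pos == 0) ? std::string::npos : hostname.rfind('.', pos - 1);
    if (dot == std::string::npos) {
      chain.push_back(hostname);  // no more dots: the full name is the last step
      break;
    }
    chain.push_back(hostname.substr(dot + 1));
    pos = dot;
  }
  return chain;
}
```

For "web.stanford.edu" this yields "edu", then "stanford.edu", then "web.stanford.edu", which is the order in which the name servers are consulted.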
You can play around with DNS lookups by running dig. You don’t have to be familiar with the specifics of DNS and you don’t have to know what all the different kinds of records mean, but you should have a general familiarity with what it does and how it works.
$ dig -t NS +noall +answer edu # Where are the .edu nameservers?
edu. 91576 IN NS d.edu-servers.net.
edu. 91576 IN NS l.edu-servers.net.
edu. 91576 IN NS a.edu-servers.net.
edu. 91576 IN NS c.edu-servers.net.
edu. 91576 IN NS f.edu-servers.net.
edu. 91576 IN NS g.edu-servers.net.
$ dig -t NS +noall +answer stanford.edu # Where are the stanford.edu nameservers?
stanford.edu. 172800 IN NS argus.stanford.edu.
stanford.edu. 172800 IN NS ns7.dnsmadeeasy.com.
stanford.edu. 172800 IN NS ns6.dnsmadeeasy.com.
stanford.edu. 172800 IN NS atalante.stanford.edu.
stanford.edu. 172800 IN NS avallone.stanford.edu.
stanford.edu. 172800 IN NS ns5.dnsmadeeasy.com.
$ dig -t A +noall +answer web.stanford.edu # Where is web.stanford.edu?
web.stanford.edu. 1800 IN A 171.67.215.200
Port numbers
We want to run a server on a computer. People should be able to connect to this computer over a network and interact with this server.
When you start a server, it binds to a particular port number (valid port numbers are 0-65535). You might have several servers running on a machine for running different services (e.g. a web server, an SSH server, a file server), and each server would be bound to a unique port number. Port numbers for well-known services are generally fixed; SSH servers usually listen for connections on port 22, web servers usually listen for connections on port 80, and mail servers often listen on port 25.
You can think of IP addresses like apartment building addresses, and think of port numbers like apartment numbers within the building; when someone tries to connect to your server, they go to your apartment building, then walk to your specific apartment number in order to reach you. If someone is trying to log into SSH on a server, they will connect to that server’s IP address on port 22, knowing that the SSH server is in that particular apartment.
Network connections
A lot goes into establishing and maintaining a network connection, but most of these details are handled for us by the operating system. If you’re interested, take CS 144 (Networking). From our perspective, once a network connection is opened, we have something like a bidirectional pipe between two computers.
Log into myth, and run the following (if you get an error about “Address already in use,” just choose a number other than 12345 that is at least 1024, since ports below 1024 are reserved for privileged services):
echo $HOST # note this, you need it for the telnet command below
nc -l 12345
In a different terminal window, log into myth (it can be a different myth machine) and run the following:
telnet myth65.stanford.edu 12345 # replace myth65 with the host from above
Anything you type into telnet shows up in your first window, and anything you type into nc shows up in your second window.
Networking as a form of function call/return
Usually, if you want to run some functionality, you call a function. In this
class, we started to explore another possibility where if you want some
function, you might invoke an executable that performs that functionality (e.g.
if you want to compress a file, but you don’t want to write the compression
code and you don’t want to bundle a C library implementing the compression, you
can invoke the tar executable present on almost all Linux systems). In
networking, we see another form of function call/return: we can request
functionality to be executed on some remote server, and get the response from
that server. This might be helpful, for example, if we wanted to look up images
of cats (there is no way the Google Images database is going to fit on your
laptop, but you could make a function call to the Google Images servers asking
them for images of cats), or if we want to distribute a computation amongst
more cores than we could fit in a single machine (we can use many servers, and
make “function calls” to those servers, asking them to participate in the
collective computation).
Using the basic network connections demonstrated with nc and telnet above, we can develop various communication protocols. Someone can send a command over the network connection to request something, and the server can send back a response. For example, someone could remotely call a function (sometimes called a remote procedure call, or RPC) by sending this over the network connection:
EXECUTE functionName: argument1, argument2
The server can respond over the network connection with the return value of whatever function was requested.
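On the server side, handling such a request starts with picking the line apart. The sketch below parses the made-up EXECUTE format from above; the RpcRequest struct and the parseRpcRequest and trim helpers are hypothetical names for this example, and a real protocol would need to decide how to handle things like commas inside arguments.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one "EXECUTE name: arg1, arg2" request.
struct RpcRequest {
  std::string functionName;
  std::vector<std::string> arguments;
};

// Strips leading and trailing whitespace.
static std::string trim(const std::string& s) {
  const char *whitespace = " \t\r\n";
  std::string::size_type begin = s.find_first_not_of(whitespace);
  if (begin == std::string::npos) return "";
  std::string::size_type end = s.find_last_not_of(whitespace);
  return s.substr(begin, end - begin + 1);
}

// Parses one request line. Returns false if the line does not start with
// "EXECUTE ", has no colon, or has an empty function name.
bool parseRpcRequest(const std::string& line, RpcRequest& request) {
  const std::string keyword = "EXECUTE ";
  if (line.compare(0, keyword.size(), keyword) != 0) return false;
  std::string::size_type colon = line.find(':', keyword.size());
  if (colon == std::string::npos) return false;
  request.functionName =
      trim(line.substr(keyword.size(), colon - keyword.size()));
  if (request.functionName.empty()) return false;

  // Everything after the colon is a comma-separated argument list.
  request.arguments.clear();
  std::istringstream rest(line.substr(colon + 1));
  std::string piece;
  while (std::getline(rest, piece, ',')) {
    std::string argument = trim(piece);
    if (!argument.empty()) request.arguments.push_back(argument);
  }
  return true;
}
```

Given "EXECUTE add: 2, 3", this fills in the function name "add" and the arguments "2" and "3"; the server would then look up the named function, call it, and write the return value back over the connection.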
It turns out that this format is very similar to the HTTP protocol, which we’ll talk about on Wednesday. There are many simple HTTP APIs:
- http://icanhazip.com (tells you your own IP address)
- http://api.open-notify.org/astros.json (lists astronauts currently in space)
- https://www.placecage.com/200/400 (generates placeholder images of a desired size featuring Nick Cage)
- https://placekitten.com/ (same as above, but with kittens)
There are many more complicated APIs letting you do more useful things:
- https://apilist.fun/
- https://www.reddit.com/r/webdev/comments/3wrswc/what_are_some_fun_apis_to_play_with/
For today, we’ll use very basic request/response schemes.
Implementing a server
Hello world, for the twentieth time
int main(int argc, char *argv[]) {
  int serverSocket = createServerSocket(12345);
  if (serverSocket < 0) {
    cout << "Error: Could not start server" << endl;
    return 1;
  }
  while (true) {
    int clientSocket = accept(serverSocket, NULL, NULL);
    cout << "Sending response to new client" << endl;
    write(clientSocket, "Hello world!\n", 13);
    close(clientSocket);
  }
  return 0;
}
createServerSocket is a function we have written for you, which we will implement together in class on Wednesday. It returns a file descriptor that is connected to a virtual file storing a list of people trying to connect to our server on port 12345. (It is technically more complicated than that, but you can think of it this way for now.) The accept syscall waits until someone tries to connect to our server, and then once someone tries to connect, it takes their info off the list of incoming connections that serverSocket stores, and returns the clientSocket file descriptor.
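If you are curious what a function like this might do under the hood, here is a rough sketch built from the standard socket, bind, and listen syscalls. This is not the course's createServerSocket (we will write that together later); the name createServerSocketSketch, the default backlog of 128, and the choice to set SO_REUSEADDR are all assumptions made for illustration.

```cpp
#include <arpa/inet.h>    // htons, htonl
#include <netinet/in.h>   // sockaddr_in, INADDR_ANY
#include <sys/socket.h>   // socket, setsockopt, bind, listen
#include <unistd.h>       // close

// Hypothetical stand-in for createServerSocket: returns a descriptor that is
// listening for connections on `port`, or -1 on failure.
int createServerSocketSketch(unsigned short port, int backlog = 128) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);  // IPv4, stream (TCP) socket
  if (fd < 0) return -1;

  // Let the server restart quickly without "Address already in use" errors
  // caused by recently closed connections. Best effort; failure is ignored.
  int yes = 1;
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

  sockaddr_in address = {};
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = htonl(INADDR_ANY);  // any of this machine's addresses
  address.sin_port = htons(port);               // ports travel in network byte order

  if (::bind(fd, reinterpret_cast<sockaddr *>(&address), sizeof(address)) < 0 ||
      listen(fd, backlog) < 0) {
    close(fd);
    return -1;
  }
  return fd;  // ready to be passed to accept
}
```

The backlog argument is a hint for how many not-yet-accepted connections the operating system will queue, which matches the "list of people trying to connect" picture above. Passing port 0 asks the operating system to pick any free port.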
The clientSocket file descriptor is a network socket connected to the person contacting our server. Sockets are bidirectional, so this single file descriptor is both readable and writable, unlike the pipes you have seen, where each end is either readable or writable. If you read from this file descriptor, you’ll read bytes that the client has sent you, and if you write to this file descriptor, the bytes you write will be sent to the client.
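You can see the bidirectional behavior without any network at all by using socketpair, which creates two connected sockets inside one process. The sketch below sends a message in each direction over the same pair; the function name pingPong is made up, and it assumes both messages are short (under 256 bytes) and non-empty.

```cpp
#include <sys/socket.h>   // socketpair
#include <sys/types.h>    // ssize_t
#include <unistd.h>       // read, write, close
#include <string>

// Sends `request` from endpoint 0 to endpoint 1, then `reply` from endpoint 1
// back to endpoint 0, and returns what endpoint 0 read. Returns "" on error.
std::string pingPong(const std::string& request, const std::string& reply) {
  if (request.empty() || reply.empty()) return "";
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";

  char buf[256];
  std::string heardByClient;
  // "Client" (fds[0]) writes; "server" (fds[1]) reads.
  if (write(fds[0], request.data(), request.size()) > 0) {
    ssize_t n = read(fds[1], buf, sizeof(buf));
    // Now the other direction, using the very same two descriptors.
    if (n > 0 && write(fds[1], reply.data(), reply.size()) > 0) {
      n = read(fds[0], buf, sizeof(buf));
      if (n > 0) heardByClient.assign(buf, static_cast<size_t>(n));
    }
  }
  close(fds[0]);
  close(fds[1]);
  return heardByClient;
}
```

With a pipe, getting traffic in both directions would take two pipes and four descriptors; here each endpoint uses one descriptor for both reading and writing.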
Since clientSocket is just a file descriptor, we can write a message to it.
If you run this server and run nc myth55.stanford.edu 12345 (or telnet myth55.stanford.edu 12345, although telnet prints some extra stuff that we don’t want for this minimal example), you’ll see “Hello world!” printed.
We could extend this program to also print out anything the client sends us by adding a while loop between the write and close calls:
int main(int argc, char *argv[]) {
  int serverSocket = createServerSocket(12345);
  if (serverSocket < 0) {
    cout << "Error: Could not start server" << endl;
    return 1;
  }
  while (true) {
    int clientSocket = accept(serverSocket, NULL, NULL);
    cout << "Sending response to new client" << endl;
    write(clientSocket, "Hello world!\n", 13);
    while (true) {
      char buf[1024];
      ssize_t bytesRead = read(clientSocket, buf, 1024);
      if (bytesRead <= 0) break; // 0 means the client hung up, -1 means error
      write(STDOUT_FILENO, buf, bytesRead);
    }
    cout << "Client hung up" << endl;
    close(clientSocket);
  }
  return 0;
}
Implementing a basic time server
This server tells you what time it is:
static void publishTime(int clientSocket) {
  time_t rawtime;
  time(&rawtime);
  struct tm *ptm = gmtime(&rawtime);
  char timeString[128]; // more than big enough
  strftime(timeString, sizeof(timeString), "%c\n", ptm);
  write(clientSocket, timeString, strlen(timeString));
  close(clientSocket);
}
int main(int argc, char *argv[]) {
  int serverSocket = createServerSocket(12345);
  if (serverSocket < 0) {
    cout << "Error: Could not start server" << endl;
    return 1;
  }
  while (true) {
    int clientSocket = accept(serverSocket, NULL, NULL);
    cout << "Sending response to new client" << endl;
    publishTime(clientSocket);
  }
  return 0;
}
However, it has a potential problem. (The hello world example above also has this problem.) We ask write to send strlen(timeString) bytes, but if the network is congested, it’s possible that write may only send a few of the bytes, and ask you to try transmitting the rest later. The return value of write tells us how many bytes were actually sent, so we need to use a while loop to handle this case:
static void publishTime(int clientSocket) {
  time_t rawtime;
  time(&rawtime);
  struct tm *ptm = gmtime(&rawtime);
  char timeString[128]; // more than big enough
  strftime(timeString, sizeof(timeString), "%c\n", ptm);
  size_t numBytesWritten = 0, numBytesToWrite = strlen(timeString);
  while (numBytesWritten < numBytesToWrite) {
    ssize_t count = write(clientSocket,
                          timeString + numBytesWritten,
                          numBytesToWrite - numBytesWritten);
    if (count < 0) break; // write failed (e.g. the client hung up); give up
    numBytesWritten += count;
  }
  close(clientSocket);
}
This works well enough.
Introducing sockbuf and iosockstream
It would be nice to use the C++ stream abstractions instead of needing to work with raw file descriptors and read and write calls. The sockbuf and iosockstream classes let us do this.
static void publishTime(int clientSocket) {
  time_t rawtime;
  time(&rawtime);
  struct tm *ptm = gmtime(&rawtime);
  char timeString[128]; // more than big enough
  strftime(timeString, sizeof(timeString), "%c", ptm);
  sockbuf sb(clientSocket); // destructor closes socket
  iosockstream ss(&sb);
  ss << timeString << endl;
}
Note that there is no close call. The sockbuf destructor takes care of that.
Adding threads
This server can only respond to one request at a time, which leads to pretty poor performance. We can improve the situation by using multithreading:
int main(int argc, char *argv[]) {
  int serverSocket = createServerSocket(12345);
  if (serverSocket < 0) {
    cout << "Error: Could not start server" << endl;
    return 1;
  }
  vector<thread> threads;
  while (true) {
    int clientSocket = accept(serverSocket, NULL, NULL);
    cout << "Sending response to new client" << endl;
    threads.push_back(thread(publishTime, clientSocket));
  }
  return 0;
}
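This introduces a race condition, because as it turns out, the gmtime function isn’t safe against concurrent calls: it may return a pointer to a statically allocated struct tm that every caller shares, so one thread can overwrite the result while another thread is still reading it. We can use a “reentrant” version of gmtime, called gmtime_r, which writes into a struct tm that the caller supplies instead of using any global variables, and is thus safe from race conditions: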
static void publishTime(int clientSocket) {
  time_t rawtime;
  time(&rawtime);
  struct tm tm;
  gmtime_r(&rawtime, &tm);
  char timeString[128]; // more than big enough
  strftime(timeString, sizeof(timeString), "%c", &tm);
  sockbuf sb(clientSocket); // destructor closes socket
  iosockstream ss(&sb);
  ss << timeString << endl;
}
This is solid, except for the fact that it might overwhelm the scheduler with threads if we get a lot of incoming connections at the same time. A semaphore is our friend.
int main(int argc, char *argv[]) {
  int serverSocket = createServerSocket(12345);
  if (serverSocket < 0) {
    cout << "Error: Could not start server" << endl;
    return 1;
  }
  vector<thread> threads;
  semaphore threadPermits(16);
  while (true) {
    int clientSocket = accept(serverSocket, NULL, NULL);
    threadPermits.wait();
    cout << "Sending response to new client" << endl;
    threads.push_back(thread([clientSocket, &threadPermits](){
      threadPermits.signal(on_thread_exit);
      publishTime(clientSocket);
    }));
  }
  return 0;
}
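As a reminder of what the permits are doing here, below is a minimal sketch of a counting semaphore built from a mutex and a condition variable. It is not the course's semaphore class: the name CountingSemaphore and the tryWait method are assumptions for this example, and it leaves out the on_thread_exit variant of signal used above.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical counting semaphore: wait() blocks until a permit is available
// and takes it; signal() returns a permit and wakes one waiting thread.
class CountingSemaphore {
 public:
  explicit CountingSemaphore(int permits) : permits_(permits) {}

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    available_.wait(lock, [this] { return permits_ > 0; });
    --permits_;
  }

  // Non-blocking variant: takes a permit only if one is available right now.
  bool tryWait() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (permits_ <= 0) return false;
    --permits_;
    return true;
  }

  void signal() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++permits_;
    }
    available_.notify_one();
  }

 private:
  std::mutex mutex_;
  std::condition_variable available_;
  int permits_;
};
```

With 16 permits, the seventeenth call to wait blocks until some worker calls signal, which is what caps the server at 16 simultaneous publishTime threads.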