Lecture 17
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
Systems classes following CS 110
- CS 140: Operating systems
- You’ve been using processes, threads, virtual memory, and more in this class. CS 140 goes a level deeper and implements these things within the operating system.
- CS 143: Compilers
- There are 4 assignments, in which you implement the 4 stages of a compiler. By the end of the class, you have a working compiler that can translate programs written in COOL (a language somewhat similar to C++ and Java) into assembly instructions.
- CS 144: Computer networking
- This class is about understanding (and sometimes implementing) the various network layers that allow us to establish functional connections between computers.
- CS 155: Computer security
- This is one of the coolest classes I’ve ever taken. It’s a very practical class that demonstrates common vulnerabilities in systems. Take it even if you don’t think you’re going to be a systems person :)
Principles of Computer Systems
You may already feel familiar with many of these concepts, but I want to put them into concrete terms and talk about how they have been relevant to our discussions throughout the quarter. We often take these ideas for granted, but it’s worth pausing to think about them explicitly.
Abstraction
Abstraction is about defining interfaces and focusing on the ideas behind a function instead of on the implementation details. We can define interfaces and use them without needing to know how everything works under the hood, and we can support multiple implementations that follow the same interface.
For example: Do you actually know how writing to `stdout` causes characters to appear on your terminal window? Probably not, but we have defined an interface where you can write to file descriptor 1 and those bytes will be printed to your terminal. You can use this abstraction without even understanding how it works; furthermore, different operating systems implement things differently, but we can use the same interface regardless of what operating system we’re using.
Other abstractions we’ve used in this class:
- Filesystems: In previous classes, you’ve probably worked with C `FILE *`s or C++ `fstream`s without knowing how they worked
- Processes: You know how to do multiprocessing, even though you don’t really know what’s happening at the assembly-instruction level in order to support that
- Signals: You understand how to send and receive signals, but you probably don’t know what the operating system is doing on your behalf in order to make it happen
- Threads: You know how to create threads, but you don’t really know how they’re implemented
- Network sockets: You know how to use network connections as pipes that connect two computers, but you don’t know what’s happening under the hood for the OS to provide this illusion
Modularity and Layering
Modularity: as soon as code starts getting complicated, we break it down into manageable pieces.
Layering is a special form of modularity in which we stack pieces on top of each other.
You’ve seen layering since your CS 106 days. For example, a stack is a data structure layered on top of a vector, and a vector is a data structure layered on top of an array.
Some layering we have seen in this class:
- Filesystems involve many layers, as you saw in Assignment 1:
- Block
- Inode
- File
- Directory
- Pathname
- Symbolic links
- In the past 2 weeks, we have layered `sockbuf`s on top of raw sockets and `sockstream`s on top of `sockbuf`s
- MapReduce allows us to build a distributed processing infrastructure, then layer simple mappers/reducers on top
Naming and name resolution
We need names to refer to system resources. (How else would you address a process? How else would you address an open file?) We also need name resolution systems to convert from human-friendly names to machine-friendly ones.
Caching
A cache is a component – sometimes implemented in hardware, sometimes in software – that stores data so that future requests can be handled more quickly.
We see caching all over in the storage hierarchy:
- Network-based storage is really slow
- We can use local disk space to cache data from network storage
- We can use RAM to cache data from disk
- The L3, L2, and L1 processor caches cache data from RAM
- Finally, information is stored in registers.
There are also TLB caches, DNS caches, and web caches (like what you did in `proxy`).
Virtualization
Virtualization is about making many hardware resources look like one, or making one hardware resource look like many.
Making many hardware resources look like one:
- RAID allows you to connect many disks to a machine and have them appear as one disk
- AFS does a similar thing with networked filesystems
- A web load balancer distributes load to many servers
Making one hardware resource look like many:
- Virtual memory makes every process think it owns all of memory
- Threads/processes provide the illusion that everything is running in parallel, even if there is only one CPU
Concurrency
This is about multiple threads or processes running at the same time. We’ve seen concurrency even across clusters of machines in MapReduce. Even signal and interrupt handlers are a form of concurrency. Some programming languages (e.g. Erlang) are designed so heavily around message-passing concurrency that data races on shared memory are impossible.
Client-server request-response
Request/response is a good way to organize functionality into modules with a clear set of responsibilities. You were already familiar with this pattern from functions and libraries of functions. In this class, we’ve seen the same pattern extended to system calls, to multiprocessing, and to network requests.