Lecture 6: Intro to Multiprocessing

Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.

Processes

So far, we have been talking about “unprivileged boxes” that programs run inside of:

These “boxes” are called processes. You can think of processes like a self-contained train of thought: a process runs a sequence of assembly instructions, and has all the resources needed to support the execution of those instructions.

Note that a process is not the same thing as a program. A program might be composed of one or more processes; we’ll talk about how to use multiple processes over the next few days.

In this lecture, we’re going to look at how things are architected in order to support running multiple processes concurrently.

Running multiple processes with only one CPU

A process runs a sequence of assembly instructions, which requires using registers like rax, rbx, etc… but on a single-core CPU, there is only one set of registers! How can we run multiple things at the same time?

Well, with only one set of registers, there is no way we could run multiple processes at the same exact time, but we can fake it! We can run a process for a short period of time, then pause it and quickly switch to a different process, doing this so often that it looks like all the processes are running at the same time.

This requires making some space inside of a process struct to store register values. When a process is paused, its registers are copied into the process struct. A different process is selected and its registers are restored from the saved values in the struct back to the CPU. The process resumes executing without ever even realizing it was paused.

Sharing memory

In the beginning, when computers only did one thing at a time, processes had direct access to (almost) all of the physical RAM on a machine. This created major problems when multiple processes arrived on the scene: how were we supposed to subdivide physical memory so that each process could have its own chunk without interfering with other processes?

You could do things like splitting physical memory in half, with Process 1 getting the upper half and Process 2 getting the lower half, but programs often have hardcoded memory addresses inside them, which makes this design brittle. (What if you want to run two instances of the same program? How do you make them use different halves of memory?) What if Process 1 starts running as the only process on the machine, and then later, three other processes start? How do you redistribute memory then? What if you have one process that requires a lot of memory and ten other processes that use almost no memory? How do you draw those division boundaries?

This need for flexibility led to the introduction of virtual memory. Instead of subdividing physical memory and somehow figuring out how to get each program to only use its chunk, we pretend that every process has the entire address space available for its use. Then, we observe what parts of that “virtual address space” are being used by each process, and we establish mappings from those virtual addresses to actual physical memory addresses.

It would be incredibly inefficient to map individual bytes (you would have to store an 8B + 8B = 16B mapping for every single byte of memory). We could establish mappings for entire segments of memory (e.g. “Process 10’s stack is located here in physical memory”), but that creates fragmentation problems if we have a lot of memory available but it doesn’t happen to be contiguous. As a middle ground, we map pages of memory, which are usually 4KB chunks.

Establishing file sessions

Recall that the open/read/write/close syscalls allow us to work with “file sessions”: we tell the OS that we’d like to start working with a file, and the OS returns a “file descriptor” as a sort of session ID. Then, we can use that file descriptor to read or write chunks to/from the file, and the OS will remember what we’re doing and where we are in reading/writing the file.

We want to be able to have multiple processes working with file sessions at the same time. There are some reasonable requirements we might look for:

How do we need to design the kernel data structures in order to make this happen? The final design uses three levels of tables:

Diagram by Jerry Cain:

Creating processes

When the fork() syscall is invoked, a new process is created as an exact replica of the original process. The original process’s virtual address space and registers are cloned as exact copies, a sort of “process mitosis.” As such, fork gets called once but returns twice; the child process continues execution from immediately after the fork call, just as the parent does.

fork returns a number (of type pid_t). In the parent process, this is the PID of the new child process; in the child, it returns 0. (This is the only real difference in execution between the parent and child after the fork call.) We can use this to easily distinguish between the two processes:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    printf("Greetings from process %d! (parent %d)\n", getpid(),
           getppid());
    pid_t pid = fork();
    printf("Bye-bye from %s process %d! (parent %d)\n",
           pid == 0 ? "child" : "parent", getpid(), getppid());
    return 0;
}

This code prints one line of Greetings, but two Bye-bye lines (one from the parent, one from the child):

Greetings from process 31384! (parent 27623)
Bye-bye from parent process 31384! (parent 27623)
Bye-bye from child process 31385! (parent 31384)

fork so thoroughly duplicates the process memory that it even duplicates the random seed of a process (used to generate random numbers in the random function). One might think the following program prints three different random numbers, but while it prints three lines, the last two are identical:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    printf("%ld\n", random());
    fork();
    printf("%ld\n", random());
    return 0;
}

1804289383
846930886
846930886