Lecture 7: Synchronization, job control
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
The fork() syscall
The fork() syscall creates a new copy of the current process. We refer to the
current process as the parent process, and the new copy as the child
process.
You can sort of think of fork() as a fork in the road of assembly
instructions, where the parent process continues down one path and the child
process continues down another…
Return value of fork: distinguishing between the child and the parent
Keep in mind that fork almost perfectly copies the calling process: all
variables are copied, all file descriptors are copied, and fork() returns to
the exact same place in the program in both processes. How is a program
supposed to know if it is the original/parent process or the clone/child? If we
want one of the processes to do one thing and the other process to do something
else, how do we make that happen?
This is accomplished via the return value of fork:
- fork() returns 0 to the child process to indicate that it is the child. If the child wants to figure out its own PID, it can call getpid(), and if the child wants to figure out the parent process’s PID, it can call getppid().
- fork() returns the child process’s PID (a positive number) to the parent. This is the only way the parent can find out the child’s PID; there is no “get child process PID” system call, since the parent can fork() any number of children, and that would make such a syscall complicated. The parent can, of course, get its own PID by calling getpid().
- fork() returns -1 on error, just like most other syscalls. This usually means that too many processes exist, and the kernel is refusing to create more.
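To make the three cases concrete, here is a minimal sketch that branches on fork()’s return value. The helper names fork_role and fork_once_demo are made up for illustration, and the parent’s waitpid call is only there so the child gets cleaned up (waitpid is discussed later in these notes).

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: name the role implied by fork()'s return value.
const char *fork_role(pid_t fork_result) {
    if (fork_result < 0) return "error";   // fork() failed
    if (fork_result == 0) return "child";  // we are the new copy
    return "parent";                       // the result is the child's PID
}

// Forks once and lets each process report who it is.
// Returns 0 in the parent on success, or -1 if fork() failed.
int fork_once_demo(void) {
    fflush(stdout);  // avoid copying buffered output into the child
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        printf("%s: my PID is %d, my parent's PID is %d\n",
               fork_role(pid), (int)getpid(), (int)getppid());
        exit(0);  // the child stops here
    }
    printf("%s: my PID is %d, my child's PID is %d\n",
           fork_role(pid), (int)getpid(), (int)pid);
    waitpid(pid, NULL, 0);  // clean up the child before returning
    return 0;
}
```

Calling fork_once_demo() from main should print one line from each process; which line appears first is up to the scheduler.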
Dragons ahead
Beware of programs that make runaway fork() calls. For example, this is one
of the worst programs to run:
#include <stdbool.h>
#include <unistd.h>

// WARNING: do not run this program!
int main() {
    while (true) {
        fork();
    }
}
This program is known as a forkbomb. It creates a child process; then, both
the parent and the child create child processes (resulting in 4 processes
total); then, all of those processes create child processes (resulting in 8
processes total), and so on, until your machine runs out of processes, every
process is stuck in an infinite while loop, and your computer grinds to a
halt.
This might look like an obviously bad idea, but it’s not hard to accidentally
code slightly less-bad versions of this. You should be careful whenever calling
fork() to look at what the child process might do, and ensure it terminates
properly without accidentally creating more child processes. For example,
consider this well-intentioned but buggy code:
 1 int main() {
 2     // load some huge string to be processed
 3     while (stringHasMoreLines()) {
 4         processNextLine();
 5     }
 6 }
 7
 8 void processNextLine() {
 9     if (fork() == 0) {
10         // Do some parallel processing in the child
11     }
12     // Do some parallel processing in the parent
13 }
This code has an accidental sort of forkbomb: after each child process is done
doing the parallel processing on line 10, it will continue out of the if
statement, do the parallel processing that only the parent should do, and then,
even worse, it will return from processNextLine back to the while loop on
line 3 and fork a grandchild process. That grandchild process will do the
same, on and on, until every process has gone through every line in the string
buffer. This can create big problems.
To avoid this, it’s important to place an exit(0) call after line 10 so that
the child does not continue past that point.
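As a rough sketch of the fix, here is one way the corrected pattern could look. The names process_next_line and process_all_lines, and the idea of identifying lines by number, are placeholders of mine rather than the original code; the parent also reaps its children at the end using waitpid, which is covered later in these notes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child to handle one line. Only the parent returns from this call.
pid_t process_next_line(int line_number) {
    fflush(stdout);  // keep buffered output from being copied into the child
    pid_t pid = fork();
    if (pid == 0) {
        printf("child %d: processing line %d\n", (int)getpid(), line_number);
        exit(0);  // without this, the child would return into the caller's loop
    }
    return pid;  // the child's PID, or -1 if fork() failed
}

// Hypothetical driver: forks one child per line, then reaps them all.
// Returns the number of children reaped.
int process_all_lines(int num_lines) {
    for (int line = 0; line < num_lines; line++) {
        if (process_next_line(line) < 0) {
            perror("fork");
            break;
        }
    }
    int reaped = 0;
    while (waitpid(-1, NULL, 0) > 0) {
        reaped++;
    }
    return reaped;
}
```

With the exit(0) in place, process_all_lines(4) creates exactly 4 children. Without it, each child would fall back into the for loop and start forking children of its own.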
Viewing processes
When multiprocess code does not do what you expect, it is often helpful to try
to get an idea of what the code is doing instead. To view processes running
on your computer, you can run ps aux:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 170116 9560 ? Ss May22 9:49 /sbin/init
root 2 0.0 0.0 0 0 ? S May22 0:02 [kthreadd]
root 3 0.0 0.0 0 0 ? I< May22 0:00 [rcu_gp]
root 4 0.0 0.0 0 0 ? I< May22 0:00 [rcu_par_gp]
root 6 0.0 0.0 0 0 ? I< May22 0:00 [kworker/0:0H-kblockd]
root 9 0.0 0.0 0 0 ? I< May22 0:00 [mm_percpu_wq]
root 10 0.0 0.0 0 0 ? S May22 0:17 [ksoftirqd/0]
root 11 0.0 0.0 0 0 ? I May22 46:59 [rcu_sched]
<many more entries>
If you’re working on a shared computer like myth, it may be helpful to use
grep to filter the output, showing only your processes, and perhaps only the
program you want to inspect:
ps aux | grep yourUsername | grep program
For example, I can start sleep in one terminal, then open another terminal
SSHed to the same myth machine (e.g. ssh mythXX.stanford.edu, where XX is
the same number as the first terminal), and list the process info:
$ ps aux | grep rebs | grep sleep
rebs 1128862 0.0 0.0 8076 596 pts/6 S+ 22:04 0:00 sleep 100
rebs 1128868 0.0 0.0 8900 676 pts/1 S+ 22:04 0:00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn sleep
(You’ll need to ignore the second line; that’s grep itself that turned up in
the search results.)
ps has a bunch of custom fields you can display if you want more info. For example:
$ ps o pid,ppid,pgid,stat,user,command -p $(pgrep -u $USER sleep)
PID PPID PGID STAT USER COMMAND
1128862 1128820 1128862 S+ rebs sleep
One thing that can sometimes come in handy is to use pstree to show the tree
of parent/child processes:
$ pstree -psa $(pgrep -u $USER sleep)
systemd,1
  └─sshd,846
      └─sshd,1126881
          └─sshd,1126930
              └─zsh,1128820
                  └─sleep,1129081 100
Scheduling
Consider the following program, which prints a letter, forks, and continues the loop:
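The listing itself is not reproduced in these notes. The following is only a sketch of a program that fits the description, not the original code: print_trail is a made-up name, the fork is skipped after the last letter so that "abcd" ends up with 8 processes, and the waitpid/exit housekeeping at the end is an addition of mine so that the function returns only in the original process.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Each process prints the current letter, then forks before moving on to the
// next one. Returns 0 in the original process once all of its descendants
// have finished; every other process exits instead of returning.
int print_trail(const char *trail) {
    pid_t original = getpid();
    size_t length = strlen(trail);
    for (size_t i = 0; i < length; i++) {
        printf("%c\n", trail[i]);
        fflush(stdout);  // flush now so the child does not inherit the text
        if (i + 1 < length && fork() < 0) {
            perror("fork");
        }
    }
    // Housekeeping (not part of the description above): reap direct children.
    while (waitpid(-1, NULL, 0) > 0) {
    }
    if (getpid() != original) {
        exit(0);
    }
    return 0;
}
```

Calling print_trail("abcd") prints one a, two b’s, four c’s, and eight d’s in total, in whatever order the processes happen to run.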
You might reasonably guess that the program outputs the following:
a
b
b
c
c
c
c
d
d
d
d
d
d
d
d
However, the order is not necessarily preserved. I get a different ordering of the letters every time, but this is one example output:
a
b
c
b
d
c
d
c
d
d
c
d
d
d
d
This is the effect of the process scheduler at work. The above code creates 8 processes, but we may only have 2 CPU cores with which to run those processes. In order to provide the illusion of running many processes simultaneously, the operating system scheduler does the following:
- A process is allowed to run on a particular CPU core. After some short interval (typically on the order of milliseconds), the OS pauses the process, copies the contents of registers into a process struct, and adds that data structure to a queue of runnable processes, called the ready queue.
- The OS then selects another process from the ready queue, loads the saved registers back into the CPU, and resumes execution of that process as if it had never stopped.
- Occasionally, a process needs to wait for something (e.g. it calls sleep, or it waits for a network request to come in, or it waits for a different process to do something). In this case, the process is removed from the ready queue and is instead moved to the blocked set. From there, the process isn’t considered for scheduling. (Eventually, it is moved from the blocked set back to the ready queue when the thing it was waiting for is ready.) We will talk more about these situations in the next few lectures.
Note that the ready queue isn’t a simple ordered queue; we may have high-priority processes that should get more CPU time. The scheduler employs a sophisticated algorithm to balance the needs of various processes, and, as a result, processes may not run in the order you expect them to. You are never given any guarantees about process scheduling, other than the fact that your process will be scheduled and will be executed eventually.
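To get a feel for the run/pause/requeue cycle described above, here is a toy round-robin simulation. It is a deliberately simplified sketch: it ignores priorities and the blocked set, the “processes” are just counters of remaining work, and round_robin, MAX_PROCS, and quantum are names I made up. A real scheduler is far more sophisticated.

```c
#define MAX_PROCS 16

// Toy round-robin scheduler. remaining[i] is how many time units process i
// still needs. Each turn, the process at the front of the ready queue runs
// for up to `quantum` units, then goes to the back of the queue if it is not
// done. Process indices are written to finish_order in completion order.
// Returns the number of turns taken, or -1 on bad arguments.
int round_robin(int remaining[], int count, int quantum, int finish_order[]) {
    int queue[MAX_PROCS];  // circular buffer holding the ready queue
    int head = 0, size = 0, finished = 0, turns = 0;
    if (count > MAX_PROCS || quantum <= 0) return -1;
    for (int i = 0; i < count; i++) {
        queue[(head + size++) % MAX_PROCS] = i;
    }
    while (size > 0) {
        // Take the next process off the front of the ready queue.
        int proc = queue[head];
        head = (head + 1) % MAX_PROCS;
        size--;
        turns++;
        // "Run" it for one time slice, or less if it finishes early.
        int slice = remaining[proc] < quantum ? remaining[proc] : quantum;
        remaining[proc] -= slice;
        if (remaining[proc] > 0) {
            queue[(head + size++) % MAX_PROCS] = proc;  // back of the queue
        } else {
            finish_order[finished++] = proc;
        }
    }
    return turns;
}
```

For example, with remaining times {3, 1, 2} and a quantum of 2, process 0 runs first but finishes last, because it is paused after 2 units and sent to the back of the queue.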
Viewing process state
ps also displays the state of a process in the STAT (E) column:
- R means the process is either on the CPU or is in the ready queue, ready to run.
- D or S means the process is in the blocked set, waiting for something to happen.
In the ps example output from earlier, you might notice that sleep was in
the S state, since it was waiting for a timer to elapse.
Basics of synchronization: the waitpid syscall
The waitpid system call can be used to wait until a particular child process
is finished executing. (It’s actually a more versatile syscall than that, and
we will discuss that in a moment, but let’s keep it simple for now.)
In the following code, a process forks, and then the parent process waits for the child to exit:
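The listing is not included in these notes, so here is a minimal sketch of the pattern instead of the original code. The function name fork_and_wait and the messages it prints are my own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that sleeps briefly and exits; the parent blocks in waitpid
// until then. Returns 0 if waitpid reported the child we forked, -1 otherwise.
int fork_and_wait(void) {
    fflush(stdout);  // avoid copying buffered output into the child
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        sleep(1);
        printf("CHILD: exiting\n");
        exit(0);
    }
    printf("PARENT: waiting for child %d\n", (int)pid);
    pid_t finished = waitpid(pid, NULL, 0);  // blocks until the child exits
    printf("PARENT: child has exited\n");
    return finished == pid ? 0 : -1;
}
```

Passing NULL as the second argument means we are not interested in how the child exited, only in waiting until it has.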
Note: waitpid can only be called on direct child processes (not parent processes, or grandchild processes, or anything else).
Synchronization puzzle
What are the possible outputs of this program?
Getting the return code from a process
The number returned from main is the return code or exit status code of a
process. We can pass a second argument to waitpid to get information about
the child process’s execution, including its return code:
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    pid_t pid = fork();
    if (pid == 0) {
        // Child process
        sleep(1);
        printf("CHILD: Child process exiting...\n");
        return 0;
    }
    // Parent process
    printf("PARENT: Waiting for child process...\n");
    int status;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status)) {
        printf("PARENT: Child process exited with return code %d!\n",
               WEXITSTATUS(status));
    } else {
        printf("PARENT: Child process terminated abnormally!\n");
    }
    return 0;
}
We can now modify the return 0; of the child code to return some other
number, or even to segfault (in which case, WIFEXITED(status) will return
false).
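Here is a small sketch for experimenting with both cases. The function run_child and its return-value convention are made up for this example, and the “segfault” is imitated by having the child send itself SIGSEGV. It also uses the standard WIFSIGNALED and WTERMSIG macros, which report whether (and by which signal) a child was killed.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that exits with exit_code or, if crash is nonzero, kills
// itself with SIGSEGV to imitate a segfault. The parent waits for it.
// Returns the exit code if the child exited normally, -(signal number) if it
// was terminated by a signal, and -1000 on any other failure.
int run_child(int exit_code, int crash) {
    pid_t pid = fork();
    if (pid < 0) return -1000;
    if (pid == 0) {
        if (crash) {
            signal(SIGSEGV, SIG_DFL);  // make sure the default action applies
            raise(SIGSEGV);
        }
        exit(exit_code);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1000;
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return -WTERMSIG(status);
    return -1000;
}
```

For instance, run_child(7, 0) returns 7, while run_child(0, 1) returns -SIGSEGV because WIFEXITED(status) is false for the crashed child.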
Calling waitpid without a specific child PID
You can call waitpid passing -1 instead of a child’s PID, and it will wait
for any child process to finish (and subsequently return the PID of that
process). If there are no child processes remaining, waitpid returns -1 and
sets the global variable errno to ECHILD to be specific about the “error
condition.” (It can return -1 for other reasons, such as passing an invalid 3rd
argument.)
This example creates several processes without keeping track of their PIDs,
then calls waitpid until the parent has no more child processes that it
hasn’t already called waitpid on:
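The listing is missing from these notes; what follows is a sketch of the same idea rather than the original code. The name spawn_and_reap is hypothetical, and the children here do nothing except exit immediately.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks num_children children without saving their PIDs, then calls
// waitpid(-1, ...) until it reports that no children remain.
// Returns the total number of waitpid calls, or -1 on an unexpected error.
int spawn_and_reap(int num_children) {
    fflush(stdout);  // avoid copying buffered output into the children
    for (int i = 0; i < num_children; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            exit(0);  // the child has nothing to do in this sketch
        }
    }
    int calls = 0;
    while (1) {
        pid_t pid = waitpid(-1, NULL, 0);
        calls++;
        if (pid == -1) {
            // ECHILD means "no children left"; anything else is unexpected.
            return errno == ECHILD ? calls : -1;
        }
        printf("reaped child %d\n", (int)pid);
    }
}
```

With 8 children, spawn_and_reap(8) makes 9 waitpid calls: 8 that return a child’s PID, plus a final one that returns -1 with errno set to ECHILD.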
This calls waitpid a total of 9 times (it returns child PIDs 8 times, then
returns -1 to indicate that there are no remaining children).
When -1 is passed as the first argument, waitpid returns children in a
somewhat arbitrary order. If several child processes have exited by the time
you call waitpid, it will choose an arbitrary child from that set. Otherwise,
if you call waitpid before any child processes have exited, it will wait for
at least one of the running children to exit.
waitpid and scheduling
To be clear, waitpid does not influence the scheduling of processes. Calling waitpid on a process does not tell the OS, “hey, I am waiting on this process, so please give it higher priority.” It simply blocks the parent process until the specified child process has finished executing.
waitpid is not optional!
When a process exits, the kernel does not immediately free all of the memory
that was being used for that process; although many resources can be freed
(e.g. the file descriptor table or the virtual memory page table), the process
struct is still kept around so that the parent process can eventually get exit
information via waitpid. When a process has exited but has not yet been
waited on by the parent, it is called a zombie process: the process is dead,
but it still exists, and still counts towards the maximum number of processes
that can be run. It’s very important for the parent to call waitpid to “reap”
the zombies.
In this way, fork() is kind of like malloc (it is allocating a process) and
waitpid() is kind of like free (it is freeing a process). Every fork()
call should be paired with a waitpid().
Lifecycle of a process
Job control
Processes start, cycle on/off the CPU, and eventually terminate, but they can also be paused at arbitrary points by job control signals. This is useful in a variety of circumstances: for example, maybe you’re running a long, CPU-intensive program, and you want to pause it so you can run some short CPU-intensive program in the meantime. Or, as another example, macOS will send “pause” signals to programs when it starts running out of (physical) memory, prompting you to close some apps before resuming them. Job control is sometimes even used programmatically to synchronize between processes; for example, Process A will pause itself to wait for Process B to catch up, and then Process B will signal Process A to continue when it’s ready.
We’ll talk more about signals next week, so don’t worry much about the details
of how this works, but in summary, you can send SIGSTOP to pause a process
and SIGCONT to continue the process.
On the command line:
# Pause PID 1234
kill -STOP 1234
# Resume PID 1234
kill -CONT 1234
Or, programmatically:
#include <signal.h>

// Pause PID 1234
kill(1234, SIGSTOP);
// Resume PID 1234
kill(1234, SIGCONT);
Job control and waitpid
waitpid can also be used to observe when a program changes job control
states (e.g. stops or continues due to SIGSTOP or SIGCONT). This is
accomplished through the third flags parameter:
- Specifying WUNTRACED will cause waitpid to return information about processes that have terminated or stopped. (It’s not a great name in my opinion, but has some historical legacy behind it.)
- Specifying WCONTINUED will cause waitpid to return information about processes that have terminated or continued.
- Specifying WUNTRACED | WCONTINUED will cause waitpid to return information about any state change: it will return when a process stops, continues, or terminates/exits.