Lab 2: Working with Processes

This handout was adapted from Jerry Cain’s Spring 2018 offering.

I’ve created a Slack channel for Lab 2 discussion (aptly named #lab2), and all students (but particularly remote students) are encouraged to share their ideas there.

Getting started

Before starting, go ahead and clone the lab2 folder, which is necessary for problems 3 and 4:

$ git clone /usr/class/cs110/repos/lab2/shared lab2
$ cd lab2
$ make

Problem 1: Short Answer Questions

Here are a bunch of short answer questions that have appeared on past CS110 midterms and final exams.

Virtual memory

Assume the OS allocates virtual memory to physical memory in 4096-byte pages.

For fun, optional reading, read these two documents (though you needn’t do this reading, since it goes beyond the scope of my lecture discussion of virtual memory):

Process scheduling

Recall from last Friday’s lecture that the process scheduler is a component of the operating system that decides whether a running process should continue running and, if not, what process should run next. This scheduler maintains three different data structures to help manage the selection process: the set of running processes, the ready queue, and the blocked set.

Problem 2: Incorrect Output File Redirection

Note: I didn’t manage to get to dup2 in Monday’s lecture. If you aren’t doing this lab in person (i.e. don’t have a CA to teach you what dup2 does), you may want to come back to this problem after Friday’s lecture. It’s a very good problem and is worth working through.

The publish user program takes an arbitrary number of filenames as arguments and attempts to publish the date and time (via the date executable that ships with all versions of Unix and Linux). publish is built from the following source:

static void publish(const char *name) {
   printf("Publishing date and time to file named \"%s\".\n", name);
   int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
   dup2(outfile, STDOUT_FILENO);
   close(outfile);
   if (fork() > 0) return;
   char *argv[] = { "date", NULL };
   execvp(argv[0], argv);
}
 
int main(int argc, char *argv[]) {
   for (size_t i = 1; i < argc; i++) publish(argv[i]);
   return 0;
}

Someone with a fractured understanding of processes, descriptors, and file redirection might expect the program to have printed something like this:

$ ./publish one two three four
Publishing date and time to file named "one".
Publishing date and time to file named "two".
Publishing date and time to file named "three".
Publishing date and time to file named "four".

However, that’s not what happens. Questions:

Problem 3: valgrind and orphaned file descriptors

This problem may be useful in your assignments to show you how to ensure you aren’t leaking file descriptors.

Here’s a very short exercise to enhance your understanding of valgrind and what it can do for you. Start by opening nonsense.cc in the lab2 directory, and trace through the code to keep tabs on what file descriptors are created, properly closed, and orphaned. Then run valgrind ./nonsense to confirm that there aren’t any memory leaks or errors (how could there be?), but then run valgrind --track-fds=yes ./nonsense to get information about the file descriptors that were (intentionally) left open. Without changing the logic of the program, insert as many close statements as necessary so that all file descriptors (including 0, 1, and 2) are properly donated back. (In practice, you do not have to close file descriptors 0, 1, and 2.)

Problem 4: Experimenting with strace

This is an interesting problem, but it is the least important of the four. If you are short on time, skip this problem.

strace is an advanced development tool that programmers use to determine what system calls contribute to the execution of a program (generally because the program is malfunctioning, and they’re curious if any failing system calls are the root cause of the problem). If, for instance, you want to see how sleep 10 works behind the scenes, you gain a lot by typing this in and following along:

$ strace sleep 10
execve("/bin/sleep", ["sleep", "10"], [/* 28 vars */]) = 0
brk(NULL)                               = 0x1e35000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143245, ...}) = 0
mmap(NULL, 143245, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f638fc15000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3

// lots of calls omitted in the name of brevity

fstat(3, {st_mode=S_IFREG|0644, st_size=4289456, ...}) = 0
mmap(NULL, 4289456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f638f231000
close(3)                                = 0
nanosleep({10, 0}, NULL)                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

A typical strace run starts with an execve (which my shell uses instead of execvp), and then works through all of these systemsy things to load C++ libraries, build the heap segment, etc., until it reaches the crucial nanosleep call, which is the call that halts the process for 10 seconds. You see gestures to system calls that have come up in CS107 and CS110: execve, access, mmap, open, and close.

If you’re interested only in a particular subset of system calls, you can identify those of interest when invoking strace using the -e trace=<comma-separated-list-of-syscall-names>, as with this:

$ strace -e trace=read ls /usr/class/cs110
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260Z\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\25\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\r\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260`\0\0\0\0\0\0"..., 832) = 832
read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tr"..., 1024) = 434
read(3, "", 1024)                       = 0
cgi-bin  include  lecture-examples  lib  local    private_data  repos  samples  staff  tools  WWW
+++ exited with 0 +++
$ strace -e trace=mmap,munmap ls /usr/class/cs110mmap(NULL, 143245, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faf170d5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faf170d4000
mmap(NULL, 2234080, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7faf16cb1000
mmap(0x7faf16ecf000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e000) = 0x7faf16ecf000
mmap(0x7faf16ed1000, 5856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7faf16ed1000
mmap(NULL, 3971488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7faf168e7000
mmap(0x7faf16ca7000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c0000) = 0x7faf16ca7000

// lots of calls omitted in the name of brevity

mmap(0x7faf1646f000, 13352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7faf1646f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faf170d2000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faf170d0000
munmap(0x7faf170d5000, 143245)          = 0
mmap(NULL, 4289456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faf15e3e000
cgi-bin  include  lecture-examples  lib  local    private_data  repos  samples  staff  tools  WWW
+++ exited with 0 +++

Take some time to experiment with strace. Type in each of these commands from any directory and see what you get:

Note that calls to fork are reframed as calls to clone, and calls to execvp are reframed as calls to execve.

Rather than list out specific system calls via -e trace=<csv-list-of-syscall-name>, strace allows you to specify a family of system calls, as per strace’s man page, e.g. -e trace=file, or -e trace=memory.

Now type these commands from any directory and see what you get:

See what happens when you try to launch something that can’t be launched, because the specified file isn’t an executable:

Just for fun, try strace strace, with:

Finally, descend into your lab2 folder and trace a few things there:

Scan strace’s man page for information about the -f flag, and be sure you understand why the output of the second command includes so many calls to execve.