Lab 2 Solutions
This handout was adapted from Jerry Cain’s Spring 2018 offering.
I’ve created a Slack channel for Lab 2 discussion (aptly named #lab2), and all students (but particularly remote students) are encouraged to share their ideas there.
Getting started
Before starting, go ahead and clone the lab2
folder, which is necessary for
problems 3 and 4:
$ git clone /usr/class/cs110/repos/lab2/shared lab2
$ cd lab2
$ make
Problem 1: Short Answer Questions
Here are a bunch of short answer questions that have appeared on past CS110 midterms and final exams.
- Recall that the stack frames for system calls are laid out in a different
segment of memory than the stack frames for user functions. How are the
parameters passed to the system calls received when invoked from user
functions? And how is the process informed that all system call values have
been placed and that it’s time to execute?
- The system call opcode and all parameters are passed through registers
(
%rax
for the opcode, others for the 0 to 6 parameters). A system call is invoked by issuing a software interrupt aka trap (via an assembly code instructionint 0x80
). The0x80
identifies which internal OS routine should handle the trap, and that handler builds an activation record in the kernel stack using the values passed through registers, executes the system call, places a return value in%rax
, and then returns from the handler to the instruction after the user’sint 0x80
. Note: You don’t need to understand syscall mechanics in detail, but you should have a general understanding of how they work and how isolation is implemented.
- The system call opcode and all parameters are passed through registers
(
- Recall that one vnode table and one file entry table are maintained on behalf
of all processes, but that each process maintains its own file descriptor
table. What problems would result if just one file descriptor table were
maintained on behalf of all processes?
- If all processes referred to the same descriptor table, one
process – perhaps one owned by another user – could close or otherwise
manipulate the file session behind the back of the true owner. That
would certainly lead to unintended consequences, and it would also lead
to security concerns (e.g. a child process might inherit access to a
session leading to a file with private information). Additionally,
consider
subprocess
: we need to rewire the child process’s stdin to read from the pipe, but we want to leave the parent process’s stdin untouched. This wouldn’t be possible with such a scheme.
- If all processes referred to the same descriptor table, one
process – perhaps one owned by another user – could close or otherwise
manipulate the file session behind the back of the true owner. That
would certainly lead to unintended consequences, and it would also lead
to security concerns (e.g. a child process might inherit access to a
session leading to a file with private information). Additionally,
consider
- Your terminal can be configured so that a process dumps core – that is,
generates a data file named
core
– whenever it crashes (because it segfaults, for instance.) Thiscore
file can be loaded into and analyzed withingdb
to help identify where and why the program is crashing. Assuming we can modify the program source code and recompile, how might you programmatically generate a core dump at specific point in the program while allowing the process to continue executing? (Your answer might include a very, very short code snippet to make its point.)- Call
fork
at the line of interest, and have the child intentionally segfault (e.g.*(int *) NULL = 14
; orraise(SIGSEGV);
on the line immediately after thefork
.
- Call
- The
fork
system call creates a new process with an independent address space, where the contents of the parent’s address space are replicated – in a sense,memcpy
‘ed – into the address space of the clone. If, however, a copy-on-write implementation strategy is adopted, then both parent and child share the same address space and only start to piecemeal split into two independent address spaces as one process makes changes that shouldn’t be reflected in the other. In general, most operating systems adopt a copy-on-write approach, even though it’s more difficult to implement. Defend that approach given what we’ve learned in class so far about why processes typically create other processes usingfork
.- The vast majority of
fork
calls lead the child process to anexecvp
call, where the child’s address space is gutted 100% and replaced with a brand new one.fork
’s defense for thecopy-on-write
model would be that it’s a lot of work to replicate an entire address space only for it to be discarded a few lines later whenexecvp
is called.
- The vast majority of
Virtual memory
Assume the OS allocates virtual memory to physical memory in 4096-byte pages.
- If the virtual address
0x7fffa2efc345
maps to the physical page in main memory whose base address is0x12345aab8000
, what range of virtual addresses around it would map to the same physical page?- Since we’re assuming 4096-byte pages, we’ve expect
0x7fffa2efc000
through0x7fffa2efcfff
to all map to the physical page with base address0x12345aab8000
(which houses bytes addressed0x12345aab8000
through0x12345aab8fff
).
- Since we’re assuming 4096-byte pages, we’ve expect
- What’s the largest size a character array can be before it absolutely must
map to three different physical pages?
- 2 * 4096 = 8192 bytes. This is the size of two pages; if the array is any larger, it can’t fit in two pages.
- What’s the smallest size a character array can be and still map to three
physical pages?
- 4096 + 1 + 1 = 4098 bytes. All bytes of the array must be contiguous, but in theory, it’s possible (even if unlikely) that the 0th byte of the array is the last byte of one physical page, bytes 1 through 4096 fill a second physical page, and byte 4097 must roll over to the 0th byte of a third physical page.
For fun, optional reading, read these two documents (though you needn’t do this reading, since it goes beyond the scope of my lecture discussion of virtual memory):
- http://www.cs.cmu.edu/afs/cs/academic/class/15213-f15/www/lectures/17-vm-concepts.pdf. These are lecture slides that Bryant, O’Hallaron, and their colleagues rely on while teaching the CMU equivalent of CS110 (they’re on a 15-week semester, so they go into more depth than we do).
- http://www.informit.com/articles/article.aspx?p=29961&seqNum=2: This is an article written some 15 years ago by two senior research scientists at HP Labs who were charged with the task of porting Linux to IA-64.
Process scheduling
Recall from last Friday’s lecture that the process scheduler is a component of the operating system that decides whether a running process should continue running and, if not, what process should run next. This scheduler maintains three different data structures to help manage the selection process: the set of running processes, the ready queue, and the blocked set.
- Describe the running process set, ready queue, and the blocked set. What
does each node in each queue consist of?
- The running set keeps track of all of the processes that are currently assigned to the CPU. The nodes in that queue needn’t store very much if anything at all, since the CPUs themselves house everything needed for execution. The running queue could be of length 0 (meaning all processes are blocked), or its length can be as high as the number of CPUs.
- The ready queue keeps track of all of the processes that aren’t currently running but are qualified to run. The nodes in the queue store the state of a CPU at the moment it was pulled off the processor. That information is used to restore a CPU when a process is promoted from ready to running again, so that the process can continue as if it was never interrupted.
- The blocked set looks much like the ready queue, except that it contains
processes that were forced off the processor even before its time slice
ended. A process is demoted to the blocked set because it blocked on
something (e.g.
waitpid
).
- Give an example of a system call (with arguments) that may or may not move a
running process to the blocked set.
waitpid(-1, NULL, 0)
might block indefinitely if there are child processes but none of them have changed state since the last timewaitpid
was called. It might return immediately if a child finished prior to thewaitpid
call, or if there aren’t any child processes at the time of thewaitpid
call.
- Give an example of a system call (with arguments) that is 100% guaranteed to
move a process to the blocked set.
sleep(100)
. The process is moved to the blocked set and a timer interrupt is scheduled to lift the process from the blocked set to the ready queue after 100 seconds.
- What needs to happen for a process to be hoisted from the blocked set to the
ready queue?
- Whatever the process was waiting for needs to complete. This depends on
the case. If a process called
sleep
, then the timer that was set needs to fire. If a process calledwaitpid
, the OS makes a note in the child process’s bookkeeping struct that the parent is waiting for it; when the child exits, the OS will move the parent to the ready queue.
- Whatever the process was waiting for needs to complete. This depends on
the case. If a process called
Problem 2: Incorrect Output File Redirection
Note: I didn’t manage to get to dup2
in Monday’s lecture. If you aren’t
doing this lab in person (i.e. don’t have a CA to teach you what dup2
does),
you may want to come back to this problem after Friday’s lecture. It’s a very
good problem and is worth working through.
The publish
user program takes an arbitrary number of filenames as arguments
and attempts to publish the date and time (via the date
executable that ships
with all versions of Unix and Linux). publish
is built from the following
source:
static void publish(const char *name) {
printf("Publishing date and time to file named \"%s\".\n", name);
int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
dup2(outfile, STDOUT_FILENO);
close(outfile);
if (fork() > 0) return;
char *argv[] = { "date", NULL };
execvp(argv[0], argv);
}
int main(int argc, char *argv[]) {
for (size_t i = 1; i < argc; i++) publish(argv[i]);
return 0;
}
Someone with a fractured understanding of processes, descriptors, and file redirection might expect the program to have printed something like this:
$ ./publish one two three four
Publishing date and time to file named "one".
Publishing date and time to file named "two".
Publishing date and time to file named "three".
Publishing date and time to file named "four".
However, that’s not what happens. Questions:
What text is actually printed to standard output?
The following is printed:
Publishing date and time to file named "one".
What do each of the four files contain?
Contents of
one
(lines can be interchanged):Publishing date and time to file named "two". Fri Jul 6 21:16:17 PDT 2018
The contents of
two
andthree
are very similar.Contents of
four
:Fri Jul 6 21:16:18 PDT 2018
How should the program be rewritten so that it works as intended?
Because the child processes (and only the child processes) should be redirecting, you should
open
,dup2
, andclose
in child-specific code. A happy side effect of the change is that you never muck withSTDOUT_FILENO
in the parent if you confine the redirection code to the child.static void publish(const char *name) { printf("Publishing date and time to file named \"%s\".\n", name); if (fork() > 0) return; int outfile = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644); dup2(outfile, STDOUT_FILENO); close(outfile); char *argv[] = { "date", NULL }; execvp(argv[0], argv); }
Problem 3: valgrind
and orphaned file descriptors
This problem may be useful in your assignments to show you how to ensure you aren’t leaking file descriptors.
Here’s a very short exercise to enhance your understanding of valgrind
and
what it can do for you. Start by opening nonsense.cc
in the lab2 directory,
and trace through the code to keep tabs on what file descriptors are created,
properly closed, and orphaned. Then run valgrind ./nonsense
to confirm that
there aren’t any memory leaks or errors (how could there be?), but then run
valgrind --track-fds=yes ./nonsense
to get information about the file
descriptors that were (intentionally) left open. Without changing the logic of
the program, insert as many close statements as necessary so that all file
descriptors (including 0, 1, and 2) are properly donated back. (In practice,
you do not have to close file descriptors 0, 1, and 2.)
- This should be pretty self-explanatory, so I won’t include anything here
beyond a remark that you need a
close
statement for every open descriptor identified by valgrind, and you should be careful to neverclose
a previously closed descriptor.
Problem 4: Experimenting with strace
This is an interesting problem, but it is the least important of the four. If you are short on time, skip this problem.
strace
is an advanced development tool that programmers use to determine what
system calls contribute to the execution of a program (generally because the
program is malfunctioning, and they’re curious if any failing system calls are
the root cause of the problem). If, for instance, you want to see how sleep 10
works behind the scenes, you gain a lot by typing this in and following along:
$ strace sleep 10
execve("/bin/sleep", ["sleep", "10"], [/* 28 vars */]) = 0
brk(NULL) = 0x1e35000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143245, ...}) = 0
mmap(NULL, 143245, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f638fc15000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
// lots of calls omitted in the name of brevity
fstat(3, {st_mode=S_IFREG|0644, st_size=4289456, ...}) = 0
mmap(NULL, 4289456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f638f231000
close(3) = 0
nanosleep({10, 0}, NULL) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
A typical strace
run starts with an execve
(which my shell uses instead of
execvp
), and then works through all of these systemsy things to load C++
libraries, build the heap segment, etc., until it reaches the crucial
nanosleep
call, which is the call that halts the process for 10 seconds. You
see gestures to system calls that have come up in CS107 and CS110: execve
,
access
, mmap
, open
, and close
.
If you’re interested only in a particular subset of system calls, you can
identify those of interest when invoking strace
using the
-e trace=<comma-separated-list-of-syscall-names>
, as with this:
$ strace -e trace=read ls /usr/class/cs110
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260Z\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\25\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\r\0\0\0\0\0\0"..., 832) = 832
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260`\0\0\0\0\0\0"..., 832) = 832
read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tr"..., 1024) = 434
read(3, "", 1024) = 0
cgi-bin include lecture-examples lib local private_data repos samples staff tools WWW
+++ exited with 0 +++
$ strace -e trace=mmap,munmap ls /usr/class/cs110mmap(NULL, 143245, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faf170d5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faf170d4000
mmap(NULL, 2234080, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7faf16cb1000
mmap(0x7faf16ecf000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e000) = 0x7faf16ecf000
mmap(0x7faf16ed1000, 5856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7faf16ed1000
mmap(NULL, 3971488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7faf168e7000
mmap(0x7faf16ca7000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c0000) = 0x7faf16ca7000
// lots of calls omitted in the name of brevity
mmap(0x7faf1646f000, 13352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7faf1646f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faf170d2000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faf170d0000
munmap(0x7faf170d5000, 143245) = 0
mmap(NULL, 4289456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7faf15e3e000
cgi-bin include lecture-examples lib local private_data repos samples staff tools WWW
+++ exited with 0 +++
Take some time to experiment with strace
. Type in each of these commands from
any directory and see what you get:
strace date
strace -e trace=write date
strace -e trace=clone,execve date
Note that calls to fork
are reframed as calls to clone
, and calls to
execvp
are reframed as calls to execve
.
Here’s what I get when I type the three commands above:
$ strace date
execve("/bin/date", ["date"], [/* 28 vars */]) = 0
brk(NULL) = 0x2572000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=143245, ...}) = 0
mmap(NULL, 143245, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd0bd20e000
// many calls omitted for brevity
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
write(1, "Sun Apr 15 15:41:08 PDT 2018\n", 29Sun Apr 15 15:41:08 PDT 2018
) = 29
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
$ strace -e trace=write date
write(1, "Sun Apr 15 15:42:45 PDT 2018\n", 29Sun Apr 15 15:42:45 PDT 2018
) = 29
+++ exited with 0 +++
$ strace -e trace=clone,execve date
execve("/bin/date", ["date"], [/* 28 vars */]) = 0
Sun Apr 15 15:43:22 PDT 2018
+++ exited with 0 +++
Rather than list out specific system calls via -e
trace=<csv-list-of-syscall-name>
, strace
allows you to specify a family of
system calls, as per strace
’s man page, e.g. -e trace=file
, or -e trace=memory
.
Now type these commands from any directory and see what you get:
strace -e trace=file ls /usr/class/cs110
strace -e trace=memory ls /usr/class/cs110
strace -e trace=desc ls /usr/class/cs110
Rather than list the output, I’ll just be clear that the first command (with
trace=file
) should confine itself to execve
, access
, open
, statfs
,
stat
, and openat
. Every single one of these system calls has one or more
arguments that is the name of some file (e.g. "ls"
, "/usr/class/cs110"
,
"/proc/filesystems"
). Note that read
, write
, and close
don’t show up,
since those are descriptor-related, not file-related.
Including trace=memory
lists all operations that introduce and configured
new memory segments: brk
, mmap
, mprotect
, and munmap
. And trace=desc
lists anything that returns a descriptor and/or accepts one or more as
arguments: mmap
(again), open
, fstat
, close
, read
, openat
,
getdents
, and write
.
See what happens when you try to launch something that can’t be launched, because the specified file isn’t an executable:
strace /usr/class/cs110/WWW
strace /usr/class/cs110/WWW/index.html
Look at this, and see how execvp
fails! And note that the error messages
are printed to file descriptor 2 (e.g. STDERR_FILENO
).
$ strace /usr/class/cs110/WWW
execve("/usr/class/cs110/WWW", ["/usr/class/cs110/WWW"], [/* 28 vars */]) = -1 EACCES (Permission denied)
write(2, "strace: exec: Permission denied\n", 32strace: exec: Permission denied
) = 32
exit_group(1) = ?
+++ exited with 1 +++
$ strace /usr/class/cs110/WWW/index.html
execve("/usr/class/cs110/WWW/index.html", ["/usr/class/cs110/WWW/index.html"], [/* 28 vars */]) = -1 EACCES (Permission denied)
write(2, "strace: exec: Permission denied\n", 32strace: exec: Permission denied
) = 32
exit_group(1) = ?
+++ exited with 1 +++
Just for fun, try strace strace
, with:
strace strace ls
This lists all of the system calls called by strace
, but not those
associated with the grandchild ls
-running process.
Finally, descend into your lab2
folder and trace a few things there:
printf "4 6 8" | strace -e trace=clone,execve ./exargs factor
printf "4 6 8" | strace -f -e trace=clone,execve ./exargs factor
Scan strace
’s man page for information about the -f
flag, and be sure you
understand why the output of the second command includes so many calls to
execve
.
$ printf "4 6 8" | strace -e trace=clone,execve ./exargs factor
execve("./exargs", ["./exargs", "factor"], [/* 28 vars */]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f57e7094a10) = 16504
4: 2 2
6: 2 3
8: 2 2 2
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16504, si_uid=11810, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++
$ printf "4 6 8" | strace -f -e trace=clone,execve ./exargs factor
execve("./exargs", ["./exargs", "factor"], [/* 28 vars */]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6d3ed80a10) = 16682
strace: Process 16682 attached
[pid 16682] execve("/afs/ir/users/p/o/poohbear/bin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/afs/ir/users/p/o/poohbear/local/bin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/afs/ir/users/p/o/poohbear/.cabal/bin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/usr/class/cs110/tools/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/usr/class/cs110/staff/bin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/usr/local/sbin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/usr/local/bin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/usr/sbin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = -1 ENOENT (No such file or directory)
[pid 16682] execve("/usr/bin/factor", ["factor", "4", "6", "8"], [/* 28 vars */]) = 0
4: 2 2
6: 2 3
8: 2 2 2
[pid 16682] +++ exited with 0 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16682, si_uid=11810, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++
strace
doesn’t read from standard input, so it doesn’t compete for the material produced by theprintf
. That’s good, because that material is intended forexargs
!- Only when you include
-f
do you see the additionalexecve
calls. strace is also clear when the system calls are executing in a process other than the primary; it prefaces the call with something like [pid 11354]. - The implementation of
execvp
(which ourexargs
relies on) loops overevery entry
in your path, appends factor to it, and see ifexecve
works out. That’s why you see all but the last call in the stream ofexecve
calls returning -1.