Lab 2 Solutions
These questions were written by Jerry Cain and Ryan Eberhardt.
Problem 1: Virtual Memory
Assume the OS allocates virtual memory to physical memory in 4096-byte pages.
- If the virtual address 0x7fffa2efc345 maps to the physical page in main memory whose base address is 0x12345aab8000, what range of virtual addresses around it would map to the same physical page?
  - Since we’re assuming 4096-byte pages, we’d expect 0x7fffa2efc000 through 0x7fffa2efcfff to all map to the physical page with base address 0x12345aab8000 (which houses bytes addressed 0x12345aab8000 through 0x12345aab8fff).
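The range in the answer above comes from masking off the low 12 bits of the address, since 4096 = 2^12. A minimal sketch of that arithmetic follows; the helper names (`page_base`, `page_last`, `page_offset`, `translate`) are made up for illustration and are not part of any course code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL  /* page size assumed by the problem */

/* Base address of the page containing addr: clear the low 12 offset bits. */
static uint64_t page_base(uint64_t addr) {
    return addr & ~(PAGE_SIZE - 1);
}

/* Last address that lies on the same page as addr. */
static uint64_t page_last(uint64_t addr) {
    return page_base(addr) | (PAGE_SIZE - 1);
}

/* Offset of addr within its page; translation leaves this part unchanged. */
static uint64_t page_offset(uint64_t addr) {
    return addr & (PAGE_SIZE - 1);
}

/* Physical address for vaddr, given the base of the physical page it maps to. */
static uint64_t translate(uint64_t vaddr, uint64_t phys_page_base) {
    assert(page_offset(phys_page_base) == 0);  /* a page base must be page-aligned */
    return phys_page_base | page_offset(vaddr);
}
```

With the addresses from the question, `page_base(0x7fffa2efc345)` is 0x7fffa2efc000, `page_last(0x7fffa2efc345)` is 0x7fffa2efcfff, and `translate(0x7fffa2efc345, 0x12345aab8000)` is 0x12345aab8345.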
- What’s the largest size a character array can be before it absolutely must
map to three different physical pages?
  - 2 * 4096 = 8192 bytes. This is the size of two pages; if the array is any larger, it can’t fit in two pages.
- What’s the smallest size a character array can be and still map to three
physical pages?
  - 4096 + 1 + 1 = 4098 bytes. All bytes of the array must be contiguous, but in theory, it’s possible (even if unlikely) that the 0th byte of the array is the last byte of one physical page, bytes 1 through 4096 fill a second physical page, and byte 4097 must roll over to the 0th byte of a third physical page.
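Both array-size answers can be checked with a little arithmetic. The sketch below assumes 4096-byte pages and uses a hypothetical helper, `pages_spanned`, that counts how many pages an array touches given where its first byte falls within a page.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u  /* page size assumed by the problem */

/* Number of pages touched by an array of `size` bytes whose first byte
 * sits at offset `start_offset` within its page. */
static size_t pages_spanned(size_t start_offset, size_t size) {
    assert(start_offset < PAGE_SIZE && size > 0);
    /* Offset of the array's final byte, measured from the first page's base. */
    size_t last_byte = start_offset + size - 1;
    return last_byte / PAGE_SIZE + 1;
}
```

An 8192-byte array that starts at offset 0 spans exactly two pages, while 8193 bytes span three no matter where the array starts. A 4098-byte array that starts at offset 4095 spans three pages, and a 4097-byte array at that same offset spans only two.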
- In Assignment 1, the starter code used a syscall called mmap to create a new segment in memory that is mapped to a file on disk. Whenever memory is read from offset x within that segment, the corresponding data is loaded from disk into the memory segment. Based on what you know about virtual memory, speculate about how this might be implemented. (You’re free to add any necessary data to rows in the page mapping table.)
  - One possible solution: When installing the new segment, for each page in the segment, add corresponding rows to the page table. For each row, we can store a source file ID (e.g. maybe an inode number) and a bit indicating whether the page has been loaded into memory. When a program accesses that page, the system can check whether the page has already been loaded: if so, the access just reads the value from memory, but if not, the OS first reads the page from the file into physical memory, marks it as loaded, and then lets the access proceed.
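The lazy-loading idea in this answer can be modeled in a few lines. This is only a toy simulation under stated assumptions, not how any real kernel stores its page table: the `page_row` struct, its fields, and `read_byte` are all hypothetical, an in-memory buffer stands in for the file on disk, and a `file_offset` field is added so the row knows which part of the file backs the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Hypothetical page table row for one file-backed page. */
struct page_row {
    const unsigned char *file_data;  /* stands in for a source file ID (e.g. an inode number) */
    size_t file_len;                 /* length of the simulated file */
    size_t file_offset;              /* where this page starts within the file */
    bool loaded;                     /* has the page been copied into memory yet? */
    unsigned char frame[PAGE_SIZE];  /* stands in for the physical page */
};

static int loads_from_disk = 0;      /* counts simulated disk reads */

/* Read one byte of the page, loading it from "disk" only on first access. */
static unsigned char read_byte(struct page_row *row, size_t offset_in_page) {
    assert(offset_in_page < PAGE_SIZE);
    if (!row->loaded) {
        size_t remaining = row->file_len > row->file_offset
                               ? row->file_len - row->file_offset : 0;
        size_t n = remaining < PAGE_SIZE ? remaining : PAGE_SIZE;
        memset(row->frame, 0, PAGE_SIZE);  /* bytes past the end of the file read as zero */
        memcpy(row->frame, row->file_data + row->file_offset, n);
        row->loaded = true;
        loads_from_disk++;
    }
    return row->frame[offset_in_page];
}
```

The first call to `read_byte` on a row pays for the simulated disk read; later calls on the same row are served from the frame, which mirrors the "loaded" bit described in the answer.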
- The standard page size for most systems is 4KB. Why might it be helpful to
use a larger page size? Why might it be helpful to use a smaller page size?
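  - Larger page size: fewer pages are needed to cover the same amount of memory, so the page table has fewer rows and the mappings introduce less overhead.
  - Smaller page size: less internal fragmentation (wasted space) when only a small fraction of a page is actually being used.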
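The trade-off can be made concrete with some arithmetic. The sketch below uses two hypothetical helpers, `pages_needed` and `wasted_bytes`; the 2 MiB page size used in the note that follows is just an illustrative "large page" value.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages (and therefore page table rows) needed to hold `bytes` bytes. */
static size_t pages_needed(size_t bytes, size_t page_size) {
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;  /* round up */
}

/* Bytes allocated but left unused in the final page. */
static size_t wasted_bytes(size_t bytes, size_t page_size) {
    return pages_needed(bytes, page_size) * page_size - bytes;
}
```

Mapping 1 GiB takes 262144 rows with 4 KiB pages but only 512 rows with 2 MiB pages. On the other hand, a 5000-byte allocation wastes 3192 bytes with 4 KiB pages and 2092152 bytes with a single 2 MiB page.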
For fun, optional reading, see these two documents (you needn’t read them, since they go beyond the scope of my lecture discussion of virtual memory):
- http://www.cs.cmu.edu/afs/cs/academic/class/15213-f15/www/lectures/17-vm-concepts.pdf: These are lecture slides that Bryant, O’Hallaron, and their colleagues rely on while teaching the CMU equivalent of CS110 (they’re on a 15-week semester, so they go into more depth than we do).
- http://www.informit.com/articles/article.aspx?p=29961&seqNum=2: This is an article written some 15 years ago by two senior research scientists at HP Labs who were charged with the task of porting Linux to IA-64.
Problem 2: File descriptors
- Consider the following code:
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main() {
        int fd = open("/cplayground/code.cpp", O_RDONLY);
        printf("Using file descriptor %d\n", fd);

        // Try reading some bytes:
        char buf[16];
        ssize_t num_read = read(fd, buf, sizeof(buf));
        printf("Read %zd bytes\n", num_read);

        // Close file descriptor
        close(fd);
    }
  - What does open do to the file descriptor, open file, and vnode tables? What about read? What about close?

    Open this Cplayground and press “Debug” to start the program. Navigate to the “Open Files” tab to see a visualization of the three tables. Try stepping through the code line-by-line to confirm your intuition.
    - open: creates new entries in the vnode, open file, and file descriptor tables
    - read: advances the cursor in the open file table
    - close: removes the file descriptor, which in turn removes the open file table entry, which in turn removes the vnode table entry
  - What happens to the file descriptor, open file, and vnode tables if you add an extra open call?

        int fd = open("/cplayground/code.cpp", O_RDONLY);
        int fd2 = open("/cplayground/code.cpp", O_RDONLY);

    Use Cplayground to confirm your intuition.
    - The second open call creates a new entry in the open file table. (As long as open succeeds, this will always be true. Conceptually, open is creating a new session, and the open file table stores sessions.) This also causes the refcount to be incremented in the vnode table, and of course, a new file descriptor is created to point to this new session.
- The dup system call accepts a valid file descriptor, claims a new, previously unused file descriptor, configures that new descriptor to alias the same file session as the incoming one, and then returns it. Briefly outline what happens to the relevant open file table and vnode table entries as a result of dup being called. (Read man dup if you’d like, though don’t worry about error scenarios.)
  - The vnode table entry is left alone, but a new file descriptor is claimed and set to address the same open file table entry (session) as the incoming one, and the reference count within that session entry is incremented by one.
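One observable consequence of the answers in this problem is that a descriptor made by dup shares its cursor with the original, while a descriptor from a second open gets a cursor of its own. The sketch below checks this on a scratch file. The helper name `other_cursor_after_read` and the `/tmp` path template are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp under strict compiler modes */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads 4 bytes through one descriptor, then reports the cursor seen through a
 * second descriptor obtained either by dup() or by a second open().
 * Returns -1 on error. */
static long other_cursor_after_read(bool use_dup) {
    char path[] = "/tmp/fd_demo_XXXXXX";  /* hypothetical scratch file */
    int setup = mkstemp(path);
    if (setup == -1) return -1;
    bool ok = write(setup, "0123456789", 10) == 10;
    close(setup);

    int fd = ok ? open(path, O_RDONLY) : -1;
    int other = -1;
    if (fd != -1) other = use_dup ? dup(fd) : open(path, O_RDONLY);
    unlink(path);  /* the file itself lives on until its descriptors are closed */

    long cursor = -1;
    char buf[4];
    if (other != -1) {
        assert(other != fd);  /* a distinct descriptor number either way */
        if (read(fd, buf, sizeof buf) == (ssize_t)sizeof buf) {
            cursor = (long)lseek(other, 0, SEEK_CUR);  /* query without moving */
        }
        close(other);
    }
    if (fd != -1) close(fd);
    return cursor;
}
```

With `use_dup` set, the second descriptor reports a cursor of 4 because both descriptors refer to one session. With a second open, it reports 0 because that descriptor has its own session.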
Problem 3: Creating processes
- Consider the following code (try it on Cplayground):
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main() {
        int fd1 = open("/cplayground/code.cpp", O_RDONLY);
        printf("File descriptor 1: %d\n", fd1);
        fork();
        int fd2 = open("/cplayground/code.cpp", O_RDONLY);
        printf("File descriptor 2: %d\n", fd2);
        close(fd1);
        close(fd2);
    }
What do the file descriptor, open file, and vnode table entries look like after each syscall? Use the Cplayground debugger to confirm your intuition.
  - open creates new entries in the file descriptor, open file, and vnode tables.
  - fork() copies all of the file descriptors. This causes the reference count in the open file table to increase, as both file descriptors are pointing to the same session.
  - The second open() creates new sessions in each process, separately. There are now three code.cpp sessions open.
  - The close calls close the file descriptors, decrementing the appropriate reference counts until all code.cpp entries are removed from all three tables.
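Because fork() leaves the parent's and child's descriptors pointing at the same session, a read in the child moves the cursor that the parent sees. Here is a minimal sketch of that effect, using a scratch file and a hypothetical helper name, `parent_cursor_after_child_read`.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp under strict compiler modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Opens a scratch file, lets a forked child read 4 bytes from it, and returns
 * the cursor the parent observes afterwards (or -1 on error). */
static long parent_cursor_after_child_read(void) {
    char path[] = "/tmp/fork_demo_XXXXXX";  /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);  /* the file lives on until the descriptor is closed */
    if (write(fd, "0123456789", 10) != 10 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) { close(fd); return -1; }
    if (pid == 0) {
        /* Child: read through its copy of the descriptor, then stop. */
        char buf[4];
        ssize_t n = read(fd, buf, sizeof buf);
        _exit(n == (ssize_t)sizeof buf ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid) { close(fd); return -1; }
    assert(WIFEXITED(status));
    long cursor = (long)lseek(fd, 0, SEEK_CUR);  /* the parent never read anything */
    close(fd);
    return cursor;
}
```

If the child's read succeeds, the parent sees a cursor of 4 even though it never called read itself.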
- Consider the following code (try it on Cplayground):

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main() {
        // Desired output: "Hello from child process" and
        // "Hello from parent process" (in no particular order)
        pid_t pid = fork();
        // fork() returns twice:
        // * The child process will see a return value of 0
        // * The parent process will see a return value of the child's PID
        if (pid == 0) {
            printf("Hello from child process!\n");
        }
        printf("Hello from parent process!\n");
        return 0;
    }

  This is meant to print “Hello from child process!” in the child process and “Hello from parent process!” in the parent process. However, if you run it, there’s an extra line of output! Why does this happen? How can we fix the code to get rid of it?
  - The child process executes the code inside of the if statement, but then it continues on to execute all subsequent code (i.e. the code intended for the parent process), so “Hello from parent process!” is printed twice. To fix it, we can add exit(0); at the end of the child’s if statement.
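A sketch of the corrected logic follows. It is wrapped in a hypothetical function, `greet_from_both`, rather than main, and it adds two things the original program does not have: an fflush before the fork and a waitpid in the parent, so the child's exit status can be inspected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prints one greeting per process and returns the child's exit status
 * (or -1 on error). */
static int greet_from_both(void) {
    fflush(stdout);  /* added: keep already-buffered output from being copied into the child */
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        printf("Hello from child process!\n");
        exit(0);     /* the fix: the child stops here instead of falling through */
    }
    printf("Hello from parent process!\n");

    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;  /* added: reap the child */
    assert(WIFEXITED(status));
    return WEXITSTATUS(status);
}
```

Each greeting now appears exactly once, in no particular order.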
- Your terminal can be configured so that a process dumps core – that is, generates a data file named core – whenever it crashes (because it seg faults, for instance). This core file can be loaded into and analyzed within gdb to help identify where and why the program is crashing. Assuming we can modify the program source code and recompile, how might you programmatically generate a core dump at a specific point in the program while allowing the process to continue executing? (Your answer might include a very, very short code snippet to make its point.)
  - Call fork at the line of interest, and have the child intentionally segfault (e.g. *(int *) NULL = 14; or raise(SIGSEGV);) on the line immediately after the fork.
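The fork-then-crash idea might look like the sketch below. The function name `dump_core_here` is hypothetical, and the waitpid call is an addition so the parent can reap the child and report which signal ended it. Whether a core file is actually written depends on system configuration, such as the core size limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that crashes immediately, so a snapshot of the process at this
 * point can be dumped while the caller keeps running. Returns the signal that
 * terminated the child, or -1 on error. */
static int dump_core_here(void) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL);  /* make sure the default action applies in the child */
        raise(SIGSEGV);
        _exit(1);                  /* not reached if the signal is delivered */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;  /* added: reap the child */
    assert(WIFSIGNALED(status));
    return WTERMSIG(status);
}
```

The parent returns from `dump_core_here` and carries on, while the child, which was a copy of the parent's memory at the moment of the fork, is the process that crashes.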