Lab 2 Solutions

These questions were written by Jerry Cain and Ryan Eberhardt.

Problem 1: Virtual Memory

Assume the OS allocates virtual memory to physical memory in 4096-byte pages.

If the virtual address 0x7fffa2efc345 maps to the physical page in main memory whose base address is 0x12345aab8000, what range of virtual addresses around it would map to the same physical page?
- Since we’re assuming 4096-byte pages, we’d expect 0x7fffa2efc000 through 0x7fffa2efcfff to all map to the physical page with base address 0x12345aab8000 (which houses bytes addressed 0x12345aab8000 through 0x12345aab8fff).
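The arithmetic behind this answer can be sketched in C. This is a minimal illustration that assumes 4096-byte pages; the names `kPageSize`, `page_base`, `page_last`, and `page_offset` are made up for this example.

```c
#include <stdint.h>

/* Assumed page size: 4096 bytes, so the low 12 bits of an address are the offset. */
static const uint64_t kPageSize = 4096;

/* Lowest address on the page containing addr (clears the offset bits). */
static uint64_t page_base(uint64_t addr) {
    return addr & ~(kPageSize - 1);
}

/* Highest address on the page containing addr (sets all the offset bits). */
static uint64_t page_last(uint64_t addr) {
    return page_base(addr) | (kPageSize - 1);
}

/* Position of addr within its page; identical for the virtual and physical address. */
static uint64_t page_offset(uint64_t addr) {
    return addr & (kPageSize - 1);
}
```

For the address in the question, `page_base` and `page_last` give the two ends of the virtual range, and adding `page_offset` to the physical base address gives the physical address of that one byte.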
What’s the largest size a character array can be before it absolutely must map to three different physical pages?
- 2 * 4096 = 8192 bytes. This is the size of two pages; if the array is any larger, it can’t fit in two pages.
What’s the smallest size a character array can be and still map to three physical pages?
- 4096 + 1 + 1 = 4098 bytes. All bytes of the array must be contiguous, but in theory, it’s possible (even if unlikely) that the 0th byte of the array is the last byte of one physical page, bytes 1 through 4096 fill a second physical page, and byte 4097 must roll over to the 0th byte of a third physical page.
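Both array answers can be checked with a small helper that counts how many pages a contiguous range touches. This is a sketch assuming 4096-byte pages; `pages_spanned` is a hypothetical name, and `start` stands for the address (or just the page offset) of byte 0 of the array.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 4096-byte pages touched by len contiguous bytes beginning at start. */
static size_t pages_spanned(uint64_t start, size_t len) {
    if (len == 0) {
        return 0;
    }
    uint64_t first_page = start / 4096;            /* page holding byte 0 */
    uint64_t last_page = (start + len - 1) / 4096; /* page holding the final byte */
    return (size_t)(last_page - first_page + 1);
}
```

An 8192-byte array that starts on a page boundary touches two pages, while any 8193-byte array touches three. A 4098-byte array touches three pages only when it starts on the last byte of a page.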
In Assignment 1, the starter code used a syscall called mmap to create a new segment in memory that is mapped to a file on disk. Whenever memory is read from offset x within that segment, the corresponding data is loaded from disk into the memory segment. Based on what you know about virtual memory, speculate about how this might be implemented. (You’re free to add any necessary data to rows in the page mapping table.)
- One possible solution: When installing the new segment, for each page in the segment, add a corresponding row to the page table. For each row, we can store a source file ID (e.g. maybe an inode number), the offset within that file, and a bit indicating whether the page has been loaded into memory. When a program accesses an address in that page, the bit tells the system whether the page is already in memory: if so, the access proceeds as an ordinary memory read, but if not, the access triggers a page fault, and the OS reads the page from the file into physical memory, marks it as loaded, and lets the access continue.
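From the program's side, the behavior being described looks roughly like the sketch below. It uses the real `open`, `fstat`, `mmap`, and `munmap` calls, but the helper name `mapped_byte_at` is made up for this example, and the comments about when data is loaded describe typical behavior rather than a guarantee.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns the byte at offset within the file, read through a mapping, or -1 on error. */
static int mapped_byte_at(const char *path, size_t offset) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0 || offset >= (size_t)st.st_size) {
        close(fd);
        return -1;
    }
    size_t length = (size_t)st.st_size;

    /* Installing the segment: the kernel records the mapping, typically
       without reading the file contents yet. */
    unsigned char *base =
        (unsigned char *)mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* Touching the page is what can force the page to be loaded from disk. */
    int value = base[offset];

    munmap(base, length);
    return value;
}
```

Note that the program never calls `read`: the plain memory access `base[offset]` is the only thing that asks for the file's data.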
The standard page size for most systems is 4KB. Why might it be helpful to use a larger page size? Why might it be helpful to use a smaller page size?
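- Larger page size: Fewer pages are needed to cover the same amount of memory, so the page table has fewer rows and the mappings introduce less overhead
- Smaller page size: Less memory is wasted to fragmentation when only a small fraction of a page is actually being used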
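To put rough numbers on the tradeoff, here is a small sketch. The helper names are hypothetical, and 2 MiB is used only as an example of a larger page size.

```c
#include <stddef.h>

/* Pages (and therefore page table rows) needed to cover bytes of memory. */
static size_t pages_needed(size_t bytes, size_t page_size) {
    return (bytes + page_size - 1) / page_size; /* round up */
}

/* Bytes allocated but unused in the final page: the fragmentation cost. */
static size_t wasted_bytes(size_t bytes, size_t page_size) {
    return pages_needed(bytes, page_size) * page_size - bytes;
}
```

Covering 1 GiB takes 262144 rows with 4 KiB pages but only 512 rows with 2 MiB pages. On the other hand, a 100-byte allocation wastes 3996 bytes of a 4 KiB page and 2097052 bytes of a 2 MiB page.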

For optional, just-for-fun reading, see these two documents (you needn’t read them, since they go beyond the scope of my lecture discussion of virtual memory):

http://www.cs.cmu.edu/afs/cs/academic/class/15213-f15/www/lectures/17-vm-concepts.pdf: These are lecture slides that Bryant, O’Hallaron, and their colleagues rely on while teaching the CMU equivalent of CS110 (they’re on a 15-week semester, so they go into more depth than we do).
http://www.informit.com/articles/article.aspx?p=29961&seqNum=2: This is an article written some 15 years ago by two senior research scientists at HP Labs who were charged with the task of porting Linux to IA-64.

Problem 2: File descriptors

Consider the following code:
```
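#include <fcntl.h>   // open, O_RDONLY
#include <stdio.h>   // printf
#include <unistd.h>  // read, close

int main() {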
    int fd = open("/cplayground/code.cpp", O_RDONLY);
    printf("Using file descriptor %d\n", fd);
   
    // Try reading some bytes:
    char buf[16];
    ssize_t num_read = read(fd, buf, sizeof(buf));
    printf("Read %zd bytes\n", num_read);
   
    // Close file descriptor
    close(fd);
}
```
1. What does open do to the file descriptor, open file, and vnode tables? What about read? What about close?
  
  Open this Cplayground and press “Debug” to start the program. Navigate to the “Open Files” tab to see a visualization of the three tables. Try stepping through the code line-by-line to confirm your intuition.
  - open: creates new entries in the vnode, open file, and file descriptor tables
  - read: advances the cursor in the open file table
  - close: removes the file descriptor, which in turn removes the open file table entry, which in turn removes the vnode table entry
2. What happens to the file descriptor, open file, and vnode tables if you add an extra open call?
```
int fd = open("/cplayground/code.cpp", O_RDONLY);
int fd2 = open("/cplayground/code.cpp", O_RDONLY);
```
  Use Cplayground to confirm your intuition.
  - The second open call creates a new entry in the open file table. (As long as open succeeds, this will always be true. Conceptually, open is creating a new session, and the open file table stores sessions.) This also causes the refcount to be incremented in the vnode table, and of course, a new file descriptor is created to point to this new session.
The dup system call accepts a valid file descriptor, claims a new, previously unused file descriptor, configures that new descriptor to alias the same file session as the incoming one, and then returns it. Briefly outline what happens to the relevant open file table and vnode table entries as a result of dup being called. (Read man dup if you’d like, though don’t worry about error scenarios).
- The vnode table entry is left alone. A new file descriptor is claimed and set to refer to the same open file table entry (session) as the incoming one, and the reference count within that session entry is incremented by one.
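One way to observe the difference between a second open and a dup without the debugger is to watch the cursor. The sketch below reads four bytes through one descriptor and then reports the other descriptor's cursor; `other_cursor_after_read` is a made-up helper, and it assumes the file at `path` holds at least four bytes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Reads 4 bytes through fd1, then returns the cursor seen through fd2.
   fd2 is a dup of fd1 when use_dup is nonzero, otherwise a second open.
   Returns -1 on error. */
static long other_cursor_after_read(const char *path, int use_dup) {
    int fd1 = open(path, O_RDONLY);
    if (fd1 == -1) {
        return -1;
    }
    int fd2 = use_dup ? dup(fd1) : open(path, O_RDONLY);
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }

    char buf[4];
    ssize_t num_read = read(fd1, buf, sizeof(buf));

    /* lseek with an offset of 0 from SEEK_CUR reports the cursor without moving it. */
    long pos = (num_read == -1) ? -1 : (long)lseek(fd2, 0, SEEK_CUR);

    close(fd1);
    close(fd2);
    return pos;
}
```

With a second open, the two descriptors refer to separate sessions, so the other cursor is still 0. With dup, both descriptors share one session, so the other cursor has advanced to 4.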

Problem 3: Creating processes

Consider the following code (try it on Cplayground):
```
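#include <fcntl.h>   // open, O_RDONLY
#include <stdio.h>   // printf
#include <unistd.h>  // fork, close

int main() {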
    int fd1 = open("/cplayground/code.cpp", O_RDONLY);
    printf("File descriptor 1: %d\n", fd1);
      
    fork();
      
    int fd2 = open("/cplayground/code.cpp", O_RDONLY);
    printf("File descriptor 2: %d\n", fd2);
      
    close(fd1);
    close(fd2);
}
```
What do the file descriptor, open file, and vnode table entries look like after each syscall? Use the Cplayground debugger to confirm your intuition.
- open creates new entries in the file descriptor, open file, and vnode tables
- fork() copies all of the parent’s file descriptors into the child. This causes the reference count in the open file table entry to increase, as the parent’s and the child’s copies of fd1 both point to the same session.
- The second open() creates new sessions in each process, separately. There are now three code.cpp sessions open.
- The close calls close the file descriptors, decrementing the appropriate reference counts until all code.cpp entries are removed from all three tables.
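The sharing that fork sets up can also be observed from code. In this sketch the child reads four bytes and the parent then checks its own cursor; `parent_cursor_after_child_read` is a hypothetical helper, and it assumes the file at `path` holds at least four bytes.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Opens path, forks, lets the child read 4 bytes, and returns the cursor
   the parent sees afterwards. Returns -1 on error. */
static long parent_cursor_after_child_read(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: read through its copy of the descriptor, then stop. */
        char buf[4];
        ssize_t num_read = read(fd, buf, sizeof(buf));
        _exit(num_read == (ssize_t)sizeof(buf) ? 0 : 1);
    }

    /* Parent: wait so that the child's read has definitely happened. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        close(fd);
        return -1;
    }
    long pos = (long)lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

The parent never reads, yet its cursor ends up at 4, because both processes' descriptors point to the same open file table entry.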
Consider the following code (try it on Cplayground):
```
#include <stdio.h>   // printf
#include <unistd.h>  // fork, pid_t

int main() {
    // Desired output: "Hello from child process" and
    // "Hello from parent process" (in no particular order)
    pid_t pid = fork();
    // fork() returns twice:
    // * The child process will see a return value of 0
    // * The parent process will see a return value of the child's PID
    if (pid == 0) {
        printf("Hello from child process!\n");
    }
    printf("Hello from parent process!\n");
    return 0;
}
```
This prints “Hello from child process!” in the child process, and prints “Hello from parent process!” in the parent process. However, if you run it, there’s an extra print statement! Why does this happen? How can we fix the code to get rid of it?
- The child process executes the code inside of the if statement, but then it continues on to execute all subsequent code (i.e. the code intended for the parent process). To fix it, we can add exit(0); at the end of the child’s if statement.
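A sketch of the fix, written as a function rather than as main. The name `greet_from_both` is made up, and the waitpid call is an addition that is not part of the fix itself; it is there only so the parent reaps the child and can report its exit status.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prints one greeting per process. Returns the child's exit status, or -1 on error. */
static int greet_from_both(void) {
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        printf("Hello from child process!\n");
        exit(0); /* the fix: the child stops here instead of falling through */
    }

    /* Only the parent reaches this line now. */
    printf("Hello from parent process!\n");

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```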
Your terminal can be configured so that a process dumps core, that is, generates a data file named core, whenever it crashes (because it seg faults, for instance). This core file can be loaded into and analyzed within gdb to help identify where and why the program is crashing. Assuming we can modify the program source code and recompile, how might you programmatically generate a core dump at a specific point in the program while allowing the process to continue executing? (Your answer might include a very, very short code snippet to make its point.)
- Call fork at the line of interest, and have the child intentionally segfault (e.g. *(int *) NULL = 14; or raise(SIGSEGV);) on the line immediately after the fork.
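The idea can be sketched as a helper that is called at the line of interest. The name `snapshot_core` is hypothetical, and whether a core file is actually written depends on how the terminal is configured (for instance, the core size limit). The waitpid call is optional and only reaps the child so the caller can see how it died.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that crashes on purpose so it can dump core, while the
   caller keeps running. Returns the signal that killed the child,
   0 if it exited normally, or -1 on error. */
static int snapshot_core(void) {
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: a copy of the caller's memory at this point in the program. */
        signal(SIGSEGV, SIG_DFL); /* make sure the default action applies */
        raise(SIGSEGV);
        _exit(1); /* only reached if the signal somehow did not terminate us */
    }

    /* Parent: unaffected by the child's crash. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Because the child starts as a copy of the parent, a core file it leaves behind reflects the parent's state at the moment of the fork.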
