Lab 1: Filesystems and System Calls

These questions were written by Jerry Cain and Ryan Eberhardt.

Before the end of lab, be sure to fill out the lab checkoff sheet here!

Getting started

Before starting, go ahead and clone the lab1 folder, which is necessary for problems 3 and 4:

$ git clone /usr/class/cs110/repos/lab1/shared lab1
$ cd lab1
$ make

Problem 1: Direct, Singly Indirect, and Doubly Indirect Block Numbers

Assume blocks are 512 bytes in size, block numbers are four-byte ints, and that inodes include space for 6 block numbers. The first three contain direct block numbers, the next two contain singly indirect block numbers, and the final one contains a doubly indirect block number. (Note: This is different from the scheme we’ve discussed in class.)

What’s the maximum file size?
How large does a file need to be before the relevant inode requires the first singly indirect block number be used?
How large does a file need to be before the relevant inode requires the first doubly indirect block number be used?
Draw as detailed an inode as you can if it’s to represent a regular file that’s 2049 bytes in size.

Problem 2: Filesystems short-answer questions

Consider the Unix V6 filesystem, which uses 512-byte blocks and 32-byte inodes. Which sector should we read in order to get inode 256? If that sector is an array of inodes, which index should we go to in order to get inode 256? What about inode 345?
Unlike the Unix V6 filesystem we learned about in class, the ext2/3/4 family of filesystems (commonly used on Linux) use variable-length directory entries:
```
struct ext3_dir_entry {
  uint32_t inode_number;
  uint16_t name_length;
  uint16_t file_type;
  char[] file_name;
};
```
What is the benefit to designing directory entries this way? What are some drawbacks? (Consider what happens when deleting files.)
In lecture, we described how soft/symbolic links are just another kind of file, with the link destination stored as file contents. However, this can make accessing files via symbolic links quite slow, since we need to find the symlink inode, read the file contents, find the destination file inode, and read its contents. What optimization might make symbolic links faster when the target file path is short?

Problem 3: Debugging practice

In this problem, we’ll look at a program called search, which works similar to the built in find command that you may or may not have used in the past.

search takes two arguments: a directory to search in, and a file to search for. For example, we can search for hi in the current directory, and search will recurse through all child directories, printing any matching files:

🍉  ./search . hi.txt
./samples/a/b/c/hi.txt
./samples/hi.txt
./samples/subdir/hi.txt
./samples/subdir2/hi.txt

We have implemented search for you in the lab1 directory. However, there are three bugs that prevent the implementation from working.

First, take a few minutes to read through the code and make sure you understand what it’s trying to do. The program works by recursing through subdirectories using the listMatches function. Here are some example recursive calls:

listMatches(".", 1, "hi.txt")
- listMatches("./samples", 9, "hi.txt")
- listMatches("./samples/a", 11, "hi.txt")
  - listMatches("./samples/a/b", 13, "hi.txt")
    - listMatches("./samples/a/b/c", 15, "hi.txt")
- listMatches("./samples/subdir", 16, "hi.txt")
- listMatches("./samples/subdir2", 17, "hi.txt")

There are a few new functions used in this code, such as stat (which gets information from a file’s inode), opendir (which opens a directory for reading), and readdir (which reads a directory entry from an open directory). You won’t need these functions for the rest of this class, so don’t spend much time worrying about them; you just need to understand what they are being used for.

Once you have a feel for how the code is supposed to work, try using print statements and/or gdb to follow its execution and get an idea of what it is actually doing. Again, there are three bugs you’ll need to fix in order to make it work!

To be clear, we don’t want you to read the code and find what’s wrong just by looking at it. Sometimes you can find bugs that way, but many times it’s more productive to debug by building a mental model of how the program is misbehaving, and then use that information to figure out where the bug is.

Problem 4: `valgrind` and orphaned file descriptors

This problem may be useful in your assignments to show you how to ensure you aren’t leaking file descriptors.

Here’s a very short exercise to enhance your understanding of valgrind and what it can do for you. Start by opening open-fds.cc in the lab1 directory, and trace through the code to keep tabs on what file descriptors are created, properly closed, and orphaned. Then run valgrind ./open-fds to confirm that there aren’t any memory leaks or errors (how could there be?), but then run valgrind --track-fds=yes ./open-fds to get information about the file descriptors that were (intentionally) left open. Without changing the logic of the program, insert as many close statements as necessary so that all file descriptors (including 0, 1, and 2) are properly donated back. (In practice, you do not have to close file descriptors 0, 1, and 2.)