Lab 1 Solutions

These questions were written by Jerry Cain and Ryan Eberhardt.

Getting started

Before starting, go ahead and clone the lab1 folder, which is necessary for problems 3 and 4:

$ git clone /usr/class/cs110/repos/lab1/shared lab1
$ cd lab1
$ make

Problem 1: Direct, Singly Indirect, and Doubly Indirect Block Numbers

Assume blocks are 512 bytes in size, block numbers are four-byte ints, and that inodes include space for 6 block numbers. The first three contain direct block numbers, the next two contain singly indirect block numbers, and the final one contains a doubly indirect block number. (Note: This is different from the scheme we’ve discussed in class.)

What’s the maximum file size?
- Maximum file size is (3 direct block numbers) * 512 + (2 singly indirect numbers) * (512/4 numbers per block) * 512 + (512/4 singly indirect numbers per block) * (512/4 direct numbers per block) * 512, or 8,521,216 bytes.
How large does a file need to be before the relevant inode requires the first singly indirect block number be used?
- 3 * 512, per above.
How large does a file need to be before the relevant inode requires the first doubly indirect block number be used?
- 3 * 512 + 2 * (512 / 4) * 512, per above.
Draw as detailed an inode as you can if it’s to represent a regular file that’s 2049 bytes in size.
- The file occupies ceil(2049 / 512) = 5 blocks. The first three will be addressed in the first three slots of the inode; the other two can be addressed by a single singly-indirect block. In summary, the first three spaces in the inode point directly to file blocks, and the fourth space points to a singly direct block, which in turn points to two file blocks.

Problem 2: Filesystems short-answer questions

Consider the Unix V6 filesystem, which uses 512-byte blocks and 32-byte inodes. Which sector should we read in order to get inode 256? If that sector is an array of inodes, which index should we go to in order to get inode 256? What about inode 345?
- Keep in mind that inode numbers start from 1, not 0!
- 512B / 32B = 16 inodes/block
- Inode 256 is at block 2 + (256 - 1) / 16 = 17. It is at index (256 - 1) % 16 = 15.
- Inode 345 is at block 2 + (345 - 1) / 16 = 23. It is at index (345 - 1) % 16 = 8.
Unlike the Unix V6 filesystem we learned about in class, the ext2/3/4 family of filesystems (commonly used on Linux) use variable-length directory entries:
```
struct ext3_dir_entry {
  uint32_t inode_number;
  uint16_t entry_length;
  uint16_t name_length;
  char[] file_name;
};
```
What is the benefit to designing directory entries this way? What are some drawbacks? (Consider what happens when deleting files.)
- Benefit: you can store long filenames without wasting lots of space if some names are long but most are short
- Drawback: Iterating over directory entries is more complicated. It’s no longer a simple array of directory entries, and we can no longer have random access to dirents; instead, we need to read each entry one by one, since each entry tells you where the next one is (via the entry_length)
- Drawback: Deleting a directory entry leaves a “hole” of a particular size based on the name length, and if you want to create a new directory entry with a longer name, you won’t be able to put it in that “hole.” There is more complexity in choosing where to put directory entries, since you need to find a suitable spot for the directory entries. This is a classic example of fragmentation problems.
In lecture, we described how soft/symbolic links are just another kind of file, with the link destination stored as file contents. However, this can make accessing files via symbolic links quite slow, since we need to find the symlink inode, read the file contents, find the destination file inode, and read its contents. What optimization might make symbolic links faster when the target file path is short?
- We can place the target file path inside of the portion of the inode that is traditionally used to store block numbers. As it turns out, the ext2/3/4 family of filesystems make this optimization.

Problem 3: Debugging practice

In this problem, we’ll look at a program called search, which works similar to the built in find command that you may or may not have used in the past.

search takes two arguments: a directory to search in, and a file to search for. For example, we can search for hi in the current directory, and search will recurse through all child directories, printing any matching files:

🍉  ./search . hi.txt
./samples/a/b/c/hi.txt
./samples/hi.txt
./samples/subdir/hi.txt
./samples/subdir2/hi.txt

We have implemented search for you in the lab1 directory. However, there are three bugs that prevent the implementation from working.

First, take a few minutes to read through the code and make sure you understand what it’s trying to do. The program works by recursing through subdirectories using the listMatches function. Here are some example recursive calls:

listMatches(".", 1, "hi.txt")
- listMatches("./samples", 9, "hi.txt")
- listMatches("./samples/a", 11, "hi.txt")
  - listMatches("./samples/a/b", 13, "hi.txt")
    - listMatches("./samples/a/b/c", 15, "hi.txt")
- listMatches("./samples/subdir", 16, "hi.txt")
- listMatches("./samples/subdir2", 17, "hi.txt")

There are a few new functions used in this code, such as stat (which gets information from a file’s inode), opendir (which opens a directory for reading), and readdir (which reads a directory entry from an open directory). You won’t need these functions for the rest of this class, so don’t spend much time worrying about them; you just need to understand what they are being used for.

Once you have a feel for how the code is supposed to work, try using print statements and/or gdb to follow its execution and get an idea of what it is actually doing. Again, there are three bugs you’ll need to fix in order to make it work!

To be clear, we don’t want you to read the code and find what’s wrong just by looking at it. Sometimes you can find bugs that way, but many times it’s more productive to debug by building a mental model of how the program is misbehaving, and then use that information to figure out where the bug is.

Here are the three bugs:

On line 76, we declare a pointer struct stat *st and subsequently pass this to stat, but the pointer is never actually initialized to point anywhere. stat is reading the inode and putting the information in the memory addressed by that pointer, but if it points to garbage memory, this can cause a segfault.

This bug is probably easiest to find using gdb with backtrace.

The correct code should look like this:
```
struct stat st;
stat(directory, &st);
if (!S_ISDIR(st.st_mode)) {
```
On line 39, we append / to path, but we don’t subsequently increment lengthOfBasePath. This means that on line 49, when we append the child file name, that overwrites the /.

You could find this bug by stepping through line-by-line in gdb and printing the path, or by using print statements. If you print path before the while loop, you’ll see that it includes the /, but if you print after, you’ll see the / has been erased.
This function recurses on any child directories, including . and ... This leads to infinite recursion, since . and .. form cycles in the filesystem. The fix is to add this line after line 44:
```
if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) continue;
```

Problem 4: `valgrind` and orphaned file descriptors

This problem may be useful in your assignments to show you how to ensure you aren’t leaking file descriptors.

Here’s a very short exercise to enhance your understanding of valgrind and what it can do for you. Start by opening open-fds.cc in the lab1 directory, and trace through the code to keep tabs on what file descriptors are created, properly closed, and orphaned. Then run valgrind ./open-fds to confirm that there aren’t any memory leaks or errors (how could there be?), but then run valgrind --track-fds=yes ./open-fds to get information about the file descriptors that were (intentionally) left open. Without changing the logic of the program, insert as many close statements as necessary so that all file descriptors (including 0, 1, and 2) are properly donated back. (In practice, you do not have to close file descriptors 0, 1, and 2.)

This should be pretty self-explanatory, so I won’t include anything here beyond a remark that you need a close statement for every open descriptor identified by valgrind, and you should be careful to never close a previously closed descriptor.