Lab 1 Solutions
These questions were written by Jerry Cain and Ryan Eberhardt.
Getting started
Before starting, go ahead and clone the lab1
folder, which is necessary for
problems 3 and 4:
$ git clone /usr/class/cs110/repos/lab1/shared lab1
$ cd lab1
$ make
Problem 1: Direct, Singly Indirect, and Doubly Indirect Block Numbers
Assume blocks are 512 bytes in size, block numbers are four-byte int
s,
and that inodes include space for 6 block numbers. The first three contain
direct block numbers, the next two contain singly indirect block numbers, and
the final one contains a doubly indirect block number. (Note: This is
different from the scheme we’ve discussed in class.)
- What’s the maximum file size?
- Maximum file size is (3 direct block numbers) * 512 + (2 singly indirect numbers) * (512/4 numbers per block) * 512 + (512/4 singly indirect numbers per block) * (512/4 direct numbers per block) * 512, or 8,521,216 bytes.
- How large does a file need to be before the relevant inode requires the first
singly indirect block number be used?
- 3 * 512, per above.
- How large does a file need to be before the relevant inode requires the first
doubly indirect block number be used?
- 3 * 512 + 2 * (512 / 4) * 512, per above.
- Draw as detailed an inode as you can if it’s to represent a regular file
that’s 2049 bytes in size.
- The file occupies ceil(2049 / 512) = 5 blocks. The first three will be addressed in the first three slots of the inode; the other two can be addressed by a single singly-indirect block. In summary, the first three spaces in the inode point directly to file blocks, and the fourth space points to a singly direct block, which in turn points to two file blocks.
Problem 2: Filesystems short-answer questions
- Consider the Unix V6 filesystem, which uses 512-byte blocks and 32-byte
inodes. Which sector should we read in order to get inode 256? If that
sector is an array of inodes, which index should we go to in order to get
inode 256? What about inode 345?
- Keep in mind that inode numbers start from 1, not 0!
- 512B / 32B = 16 inodes/block
- Inode 256 is at block 2 + (256 - 1) / 16 = 17. It is at index (256 - 1) % 16 = 15.
- Inode 345 is at block 2 + (345 - 1) / 16 = 23. It is at index (345 - 1) % 16 = 8.
- Unlike the Unix V6 filesystem we learned about in class, the ext2/3/4 family
of filesystems (commonly used on Linux) use variable-length directory
entries:
struct ext3_dir_entry { uint32_t inode_number; uint16_t entry_length; uint16_t name_length; char[] file_name; };
What is the benefit to designing directory entries this way? What are some drawbacks? (Consider what happens when deleting files.)
- Benefit: you can store long filenames without wasting lots of space if some names are long but most are short
- Drawback: Iterating over directory entries is more complicated. It’s no
longer a simple array of directory entries, and we can no longer have
random access to dirents; instead, we need to read each entry one by one,
since each entry tells you where the next one is (via the
entry_length
) - Drawback: Deleting a directory entry leaves a “hole” of a particular size based on the name length, and if you want to create a new directory entry with a longer name, you won’t be able to put it in that “hole.” There is more complexity in choosing where to put directory entries, since you need to find a suitable spot for the directory entries. This is a classic example of fragmentation problems.
- In lecture, we described how soft/symbolic links are just another kind of
file, with the link destination stored as file contents. However, this can
make accessing files via symbolic links quite slow, since we need to find
the symlink inode, read the file contents, find the destination file inode,
and read its contents. What optimization might make symbolic links faster
when the target file path is short?
- We can place the target file path inside of the portion of the inode that is traditionally used to store block numbers. As it turns out, the ext2/3/4 family of filesystems make this optimization.
Problem 3: Debugging practice
In this problem, we’ll look at a program called search
, which works similar
to the built in find
command that you may or may not have used in the past.
search
takes two arguments: a directory to search in, and a file to search
for. For example, we can search for hi
in the current directory, and search
will recurse through all child directories, printing any matching files:
🍉 ./search . hi.txt
./samples/a/b/c/hi.txt
./samples/hi.txt
./samples/subdir/hi.txt
./samples/subdir2/hi.txt
We have implemented search
for you in the lab1
directory. However, there
are three bugs that prevent the implementation from working.
First, take a few minutes to read through the code and make sure you understand
what it’s trying to do. The program works by recursing through subdirectories
using the listMatches
function. Here are some example recursive calls:
listMatches(".", 1, "hi.txt")
listMatches("./samples", 9, "hi.txt")
listMatches("./samples/a", 11, "hi.txt")
listMatches("./samples/a/b", 13, "hi.txt")
listMatches("./samples/a/b/c", 15, "hi.txt")
listMatches("./samples/subdir", 16, "hi.txt")
listMatches("./samples/subdir2", 17, "hi.txt")
There are a few new functions used in this code, such as stat
(which gets
information from a file’s inode), opendir
(which opens a directory for
reading), and readdir
(which reads a directory entry from an open directory).
You won’t need these functions for the rest of this class, so don’t spend much
time worrying about them; you just need to understand what they are being used
for.
Once you have a feel for how the code is supposed to work, try using print statements and/or gdb to follow its execution and get an idea of what it is actually doing. Again, there are three bugs you’ll need to fix in order to make it work!
To be clear, we don’t want you to read the code and find what’s wrong just by looking at it. Sometimes you can find bugs that way, but many times it’s more productive to debug by building a mental model of how the program is misbehaving, and then use that information to figure out where the bug is.
Here are the three bugs:
-
On line 76, we declare a pointer
struct stat *st
and subsequently pass this tostat
, but the pointer is never actually initialized to point anywhere.stat
is reading the inode and putting the information in the memory addressed by that pointer, but if it points to garbage memory, this can cause a segfault.This bug is probably easiest to find using gdb with
backtrace
.The correct code should look like this:
struct stat st; stat(directory, &st); if (!S_ISDIR(st.st_mode)) {
-
On line 39, we append
/
topath
, but we don’t subsequently incrementlengthOfBasePath
. This means that on line 49, when we append the child file name, that overwrites the/
.You could find this bug by stepping through line-by-line in
gdb
and printing thepath
, or by using print statements. If you printpath
before thewhile loop
, you’ll see that it includes the/
, but if you print after, you’ll see the/
has been erased. -
This function recurses on any child directories, including
.
and..
. This leads to infinite recursion, since.
and..
form cycles in the filesystem. The fix is to add this line after line 44:if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) continue;
Problem 4: valgrind
and orphaned file descriptors
This problem may be useful in your assignments to show you how to ensure you aren’t leaking file descriptors.
Here’s a very short exercise to enhance your understanding of valgrind
and
what it can do for you. Start by opening open-fds.cc
in the lab1 directory,
and trace through the code to keep tabs on what file descriptors are created,
properly closed, and orphaned. Then run valgrind ./open-fds
to confirm that
there aren’t any memory leaks or errors (how could there be?), but then run
valgrind --track-fds=yes ./open-fds
to get information about the file
descriptors that were (intentionally) left open. Without changing the logic of
the program, insert as many close statements as necessary so that all file
descriptors (including 0, 1, and 2) are properly donated back. (In practice,
you do not have to close file descriptors 0, 1, and 2.)
- This should be pretty self-explanatory, so I won’t include anything here
beyond a remark that you need a
close
statement for every open descriptor identified by valgrind, and you should be careful to neverclose
a previously closed descriptor.