Lecture 1: Introduction to Filesystems

Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.

A software view of hard drives

Just like RAM, hard drives provide us with some contiguous space that we can store information in. Information in RAM is byte-addressable: even if you’re only trying to store a boolean (1 bit), you need to read an entire byte (8 bits) to retrieve that boolean from memory, and if you want to flip the boolean, you need to write the entire byte back to memory. A similar concept exists in the world of hard drives. Hard drives are divided into sectors (often 512 bytes large), and are sector-addressable: you must read or write entire 512-byte sectors, even if you’re only interested in 32 bytes of information within a sector.
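To make sector-addressability concrete, here is a minimal sketch in C that simulates a tiny disk as an in-memory array of 512-byte sectors. The names `readSector`, `writeSector`, and `updateBytes` are hypothetical, invented for illustration; a real driver interface looks different. The point is that changing even 32 bytes means reading the whole sector, patching it in memory, and writing the whole sector back.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 8   /* a toy disk; real disks have millions of sectors */

/* Hypothetical in-memory stand-in for a physical disk. */
static uint8_t disk[NUM_SECTORS][SECTOR_SIZE];

/* The "hardware" only transfers whole sectors. */
static void readSector(int sector, uint8_t buf[SECTOR_SIZE]) {
    memcpy(buf, disk[sector], SECTOR_SIZE);
}

static void writeSector(int sector, const uint8_t buf[SECTOR_SIZE]) {
    memcpy(disk[sector], buf, SECTOR_SIZE);
}

/* Change len bytes at a byte offset within one sector (assumes the range
   fits inside that sector). Even a tiny update costs a full-sector read
   plus a full-sector write. */
void updateBytes(int sector, size_t offset, const uint8_t *data, size_t len) {
    uint8_t buf[SECTOR_SIZE];
    readSector(sector, buf);             /* read all 512 bytes */
    memcpy(buf + offset, data, len);     /* patch only the bytes we care about */
    writeSector(sector, buf);            /* write all 512 bytes back */
}
```

Updating 32 bytes at offset 100 of sector 3 still moves 1024 bytes between memory and the disk: 512 in, 512 out.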

Sidenote: sectors are most commonly 512 bytes large, but this can vary. The size is determined by the physical hard drive (so it’s possible to have a computer with several hard drives, each using a different sector size).

Conceptually, a hard drive divided into sectors might look like this:

This is what the hardware presents us with, and this small amount is all you really need to know in order to start designing basic filesystems. As filesystem designers, we need to figure out a way to take this primitive system and use it to store a user’s files.

Note: throughout the rest of the course, you may hear me use the term “block” instead of “sector.” Sectors are the physical base units on the hard drive, but the operating system might elect to do all its filesystem operations in terms of blocks (each made up of several sectors). If an operating system chooses a block size of 1024 bytes (spanning two 512-byte sectors), then when it accesses or modifies the filesystem, it will only read/write from the disk in 1024-byte chunks, even though the disk is capable of operating in 512-byte chunks. This has performance benefits which we won’t talk about (take CS 140 if you’re interested!). In this class, “blocks” and “sectors” are interchangeable.
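As a small illustration of blocks built from sectors, the sketch below assumes the 1024-byte block size from the note above (two 512-byte sectors per block) and assumes block 0 starts at sector 0. Under those assumptions, block b occupies sectors 2b and 2b+1. The function names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE 512
#define BLOCK_SIZE 1024                              /* assumed OS choice */
#define SECTORS_PER_BLOCK (BLOCK_SIZE / SECTOR_SIZE) /* = 2 */

/* First sector occupied by a block (assumes block 0 begins at sector 0).
   Reading the block means reading SECTORS_PER_BLOCK consecutive sectors. */
uint32_t firstSectorOfBlock(uint32_t block) {
    return block * SECTORS_PER_BLOCK;
}

/* Which block holds a given absolute byte offset on the disk. */
uint32_t blockForByte(uint64_t byteOffset) {
    return (uint32_t)(byteOffset / BLOCK_SIZE);
}
```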

Unix V6 filesystem concepts

Note: if you can, you should really watch this part of lecture, as I talk through the motivating factors that drove this design.

This diagram shows the basic design of the Unix V6 filesystem. There is a lot to unpack, so I will explain below:

General layout

At the beginning of the disk, a series of blocks is reserved for filesystem metadata. Metadata includes information such as the number of free blocks available, information for figuring out which files are stored where, etc. (Incidentally, this is the reason that when you buy a new 16GB flash drive and plug it into your computer, your computer reports that it has less than 16GB of free space.) The remaining blocks are used to store actual file contents. (The diagram is a little misleading; the portion used to store file contents is far larger than the portion used to store metadata.)

Files are split into block-sized chunks (i.e. usually 512-byte chunks) when they are stored. If a file doesn’t fit evenly in 512-byte chunks, then oh well, we waste some space. (This is the case with the blue file in the diagram. Its filesize is only 32 bytes, yet an entire 512-byte block is dedicated to storing it.)
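The rounding-up arithmetic is worth writing down once. Here is a small sketch (function names are hypothetical) computing how many 512-byte blocks a file occupies and how many bytes of its last block go unused.

```c
#include <stdint.h>

#define BLOCK_SIZE 512

/* Number of blocks needed to hold `size` bytes: round up to whole blocks. */
uint32_t blocksNeeded(uint32_t size) {
    return (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Bytes left unused in the file's last block. */
uint32_t wastedBytes(uint32_t size) {
    return blocksNeeded(size) * BLOCK_SIZE - size;
}
```

For the 32-byte blue file this gives 1 block with 480 bytes wasted; a 1028-byte file needs 3 blocks with 508 bytes wasted in the last one.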

Inodes

Somehow, we need to keep track of which blocks are being used to store the pieces of a file. (Blocks 1025, 1027, and 1028 contain the file contents for the same file, but if I hadn’t color-coded the diagram, you wouldn’t have known that.) This is the purpose of inodes. Inodes (“information nodes” – fancy term for a thing storing bookkeeping info) are 32-byte (or sometimes 64-byte) structures that store information about files on disk. As you will see in assignment 1, lots of different info is included in an inode, such as the owner of the file, permissions information (is anyone on the machine allowed to edit this?), the time the file was last modified, and more. Additionally, most relevant to this discussion, it stores the file size, file type (regular file or directory – more on that in a moment), and an array of block numbers identifying the blocks storing this file’s contents.
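Here is a sketch of what a 32-byte inode could look like in C. The layout is modeled on the Unix V6 on-disk inode, but treat the exact field names, their order, and the split 24-bit size field as assumptions to check against the assignment’s headers rather than as the definitive definition.

```c
#include <stdint.h>

/* Sketch of a 32-byte V6-style inode (layout is an assumption). */
struct inode {
    uint16_t i_mode;      /* file type (regular/directory) + permission bits */
    uint8_t  i_nlink;     /* number of directory entries referring to this file */
    uint8_t  i_uid;       /* owner's user id */
    uint8_t  i_gid;       /* owner's group id */
    uint8_t  i_size0;     /* high 8 bits of the file size */
    uint16_t i_size1;     /* low 16 bits of the file size */
    uint16_t i_addr[8];   /* block numbers holding the file's contents */
    uint16_t i_atime[2];  /* last access time */
    uint16_t i_mtime[2];  /* last modification time */
};

/* Every field is naturally aligned, so no padding should be inserted. */
_Static_assert(sizeof(struct inode) == 32, "inode must be exactly 32 bytes");

/* Reassemble the file size from its two pieces. */
uint32_t inodeGetSize(const struct inode *ip) {
    return ((uint32_t)ip->i_size0 << 16) | ip->i_size1;
}
```

For the green file in the diagram, the size would come out to 1028 and the first three entries of `i_addr` would be 1027, 1028, and 1025.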

In the diagram, take a look at the contents of Inode 2, outlined with a dashed green line. The file size is 1028 bytes, so automatically, we know that this file must span 3 blocks (if the block size is 512). The block numbers are listed as 1027, 1028, and 1025, so if we are trying to read bytes 0-511 from the file, we look in block 1027 for that info; if we are trying to read bytes 512-1023 from the file, we look in block 1028; and if we are trying to read bytes 1024-1027 from the file, we look in block 1025.
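The lookup just described is one division: the byte offset divided by 512 tells you which entry of the inode’s block list to use. A minimal sketch, assuming direct addressing only and using 0 as a “no block” sentinel (block 0 is never used for file contents in this layout, since the disk starts with metadata); `blockForOffset` is a hypothetical name.

```c
#include <stdint.h>

#define BLOCK_SIZE 512

/* Given the inode's list of block numbers (direct addressing only),
   return the disk block holding byte `offset` of the file, or 0 if the
   offset is past the end of the file. The byte's position inside that
   block is offset % BLOCK_SIZE. */
uint16_t blockForOffset(const uint16_t addrs[], uint32_t fileSize, uint32_t offset) {
    if (offset >= fileSize) return 0;
    return addrs[offset / BLOCK_SIZE];   /* which 512-byte chunk of the file */
}
```

With the green file’s list {1027, 1028, 1025}, offset 600 maps to block 1028, and offset 1027 maps to block 1025.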

Note: The blocks used to store file contents are not necessarily contiguous or in order! In the case of inode 2, block 1025 is used to store the last part of the file, even though it is physically located before block 1027 (which stores the first part of the file).

Note: Block 1025 is only being used to store four bytes of the green file. 508 bytes in the block are being wasted. This is just how the filesystem is designed; if we wanted to reclaim that space, we would have to do some really fancy things that drastically increase the complexity of the filesystem. (Some modern filesystems do implement schemes like that, but most don’t because it isn’t worth the added complexity.) As a thought experiment, think about how your bookkeeping information would have to change if you wanted to be able to use fragments of blocks for storing multiple files.

The inodes tell us where to find a file’s contents, but we also need to store the inodes themselves on disk. A series of blocks is reserved for this, starting from block #2. Because inodes have a fixed size of 32 bytes, we pack several of them into each block. For illustration purposes, I only drew 4 inodes per block in the diagram above. In reality, if the inodes are 32 bytes large (as they are in the Unix V6 filesystem) and the sector size is 512 bytes, there will be 512/32 = 16 inodes per sector.

Note: Inodes are indexed starting from 1. This is for historical reasons (which you don’t need to know). For the curious, inode-related functions in the operating system return 0 when there is an error (because they return unsigned values, they can’t return -1). Therefore, we reserve inumber 0.
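Because inodes are numbered from 1 and packed 16 to a block starting at block 2, finding inode i on disk is a bit of arithmetic. The sketch below assumes exactly the numbers from the text (inode table starting at block 2, 512-byte blocks, 32-byte inodes); the function names are hypothetical, and callers must not pass the reserved inumber 0.

```c
#include <stdint.h>

#define BLOCK_SIZE        512
#define INODE_SIZE        32
#define INODES_PER_BLOCK  (BLOCK_SIZE / INODE_SIZE)  /* 16 */
#define INODE_START_BLOCK 2                          /* first inode-table block */

/* Block containing inode `inumber` (inumbers start at 1; 0 is reserved).
   Subtracting 1 converts the 1-based inumber into a 0-based slot. */
uint32_t inodeBlock(uint32_t inumber) {
    return INODE_START_BLOCK + (inumber - 1) / INODES_PER_BLOCK;
}

/* Byte offset of inode `inumber` within that block. */
uint32_t inodeOffsetInBlock(uint32_t inumber) {
    return ((inumber - 1) % INODES_PER_BLOCK) * INODE_SIZE;
}
```

Inodes 1 through 16 live in block 2 (inode 16 at byte 480), and inode 17 is the first inode in block 3.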

What about filenames/paths?

If you needed to remember the inumber (inode index) for every file on your computer, that would be tragic. We could add a filename field to the inode, and then search through the inodes to find a matching file whenever we want something. However, this turns out to work poorly. To minimize the size of the metadata portion of the disk, we need to keep the inodes small, which means the maximum filename would need to be short. In order to keep things organized, humans tend to generate fairly long file paths; the file I’m currently editing has a file path of /Users/reberhardt/Documents/Stanford/Summer CS 110/www/content/lecture-notes/lecture-1.md. That path is 89 characters long (roughly 90 bytes), nearly three times the size of a 32-byte inode. Additionally, performance quickly degrades if you need to search every inode for the file you’re looking for. My computer currently has 83,705,333 inodes in use, and it would be bad if I had to search every one of them every time I wanted to access a file.

The solution is to implement directories. You may be surprised to find that this requires almost no changes to our existing scheme; we can layer directories on top of the file abstraction we already have. In almost all filesystems (Unix V6 included), directories are just files, the same as any other file (except that they are marked as directories by the file type field in the inode). The contents of a directory file are a series of 16-byte entries that form a table mapping filenames to inumbers.

Have a look at the contents of block 1024 (i.e. the contents of the file with inumber 1) in the diagram above. This directory contains two files, so its total file size is 32 bytes; the first 16 bytes form the first row of the table (14 bytes for the filename, 2 for the inumber), and the second 16 bytes form the second row of the table. When we are looking for a file in the directory, we search this table for the filename and read off its corresponding inumber.
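Here is a sketch of one 16-byte directory entry and a linear search over a directory’s contents. The field order (inumber first) follows V6 as I understand it, but treat it as an assumption; `dirent16` and `dirLookup` are hypothetical names. One subtlety: a name that uses all 14 bytes has no terminating null byte, so the comparison must be bounded at 14 characters. Treating inumber 0 as an empty slot is also an assumption, consistent with inumber 0 being reserved.

```c
#include <stdint.h>
#include <string.h>

#define NAME_LEN 14

/* One 16-byte directory entry (field order is an assumption). */
struct dirent16 {
    uint16_t d_inumber;         /* 0 is treated as an unused slot */
    char     d_name[NAME_LEN];  /* not NUL-terminated if exactly 14 chars */
};

_Static_assert(sizeof(struct dirent16) == 16, "directory entries are 16 bytes");

/* Search a directory's contents (an array of entries) for `name`.
   Returns the matching inumber, or 0 if not found. */
uint16_t dirLookup(const struct dirent16 *entries, size_t count, const char *name) {
    if (strlen(name) > NAME_LEN) return 0;    /* too long to be stored, so no match */
    for (size_t i = 0; i < count; i++) {
        /* strncmp stops at a NUL or after 14 chars, so full-length names work */
        if (entries[i].d_inumber != 0 &&
            strncmp(entries[i].d_name, name, NAME_LEN) == 0) {
            return entries[i].d_inumber;
        }
    }
    return 0;
}
```

The number of entries to scan is just the directory’s file size divided by 16 (32 / 16 = 2 for the directory in the diagram).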

What does the file lookup process look like, then? Consider a file at /usr/class/cs110/example.txt. First, we find the inode for the file / (which always has inumber 1). In the table from that file’s contents, we look up the token usr. Let’s say it’s at inode 5. Then, we get file 5’s contents and look up the token class. From there, we look up the token cs110, and, finally, the token example.txt.
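The lookup loop can be sketched in C as follows. To keep it self-contained, a hypothetical in-memory table stands in for “read directory file N and search it”; only `usr` at inode 5 comes from the text, and the inumbers 9, 12, and 20 are made up. A real implementation would also check that each intermediate inode is actually a directory.

```c
#include <stdint.h>
#include <string.h>

#define ROOT_INUMBER 1
#define NAME_LEN 14

/* Hypothetical stand-in for directory contents: (directory, name) -> inumber. */
static const struct { uint16_t dir; const char *name; uint16_t inumber; } toyDirs[] = {
    {1, "usr", 5},
    {5, "class", 9},
    {9, "cs110", 12},
    {12, "example.txt", 20},
};

static uint16_t lookupInDir(uint16_t dir, const char *name) {
    for (size_t i = 0; i < sizeof(toyDirs) / sizeof(toyDirs[0]); i++)
        if (toyDirs[i].dir == dir && strcmp(toyDirs[i].name, name) == 0)
            return toyDirs[i].inumber;
    return 0;   /* not found */
}

/* Resolve an absolute path one token at a time, starting from the root (inumber 1).
   Returns 0 if any component is missing or too long. */
uint16_t pathToInumber(const char *path) {
    if (path[0] != '/') return 0;
    uint16_t cur = ROOT_INUMBER;
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/') p++;                       /* skip separators */
        if (*p == '\0') break;
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > NAME_LEN) return 0;                /* can't fit in a 14-byte name */
        char token[NAME_LEN + 1];
        memcpy(token, p, len);
        token[len] = '\0';
        cur = lookupInDir(cur, token);               /* descend one level */
        if (cur == 0) return 0;
        p += len;
    }
    return cur;
}
```

Resolving /usr/class/cs110/example.txt walks 1, 5, 9, 12 and ends at 20; resolving / returns the root’s inumber, 1.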

Large files

In the Unix V6 filesystem, inodes can only store a maximum of 8 block numbers. This limits the total file size to 8 * 512 = 4096 bytes. That’s not very big.

To resolve this problem, we use a scheme called indirect addressing. Normally, the inode stores the block numbers that form a file; as an example, let’s say the file is stored across blocks 2001-2008, so the inode stores the numbers 2001-2008. We want to append to the file, but the inode can’t store any more block numbers. Instead, let’s allocate a block in the file-contents portion of the disk (let’s say this is block 2050) and store the numbers 2001-2009 in that block, where 2009 is the newly allocated block holding the appended data. Then, we update the inode to store only block number 2050, and we set a flag specifying that we’re using this large-addressing scheme. When we want to get the contents of the file, we check the inode and see the “big file” flag. We get the first block number, read that block, and then read the actual block numbers (which point to the file contents) from that block.
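Here is a sketch of that lookup with the “big file” flag, using the 2001-2009 / 2050 example. Assumptions to flag loudly: `toyInode`, `toyDisk`, and `readIndirect` are hypothetical stand-ins (the “disk” holds only the one indirect block we need), the flag value `ILARG` is the V6 value as I recall it, and this version treats all eight slots as singly indirect, ignoring the doubly indirect slot discussed below.

```c
#include <stdint.h>
#include <stddef.h>

#define NUMS_PER_BLOCK 256   /* 512-byte block / 2-byte block numbers */
#define ILARG 010000         /* "big file" flag bit in the mode (assumed V6 value) */

/* Minimal stand-in for the inode fields this example needs. */
struct toyInode {
    uint16_t mode;
    uint16_t addr[8];
};

/* Toy "disk" holding only the indirect block we need (hypothetical). */
struct indirectBlock {
    uint16_t blockNum;
    uint16_t nums[NUMS_PER_BLOCK];
};

static const struct indirectBlock toyDisk[] = {
    { 2050, {2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009} },
};

static const uint16_t *readIndirect(uint16_t blockNum) {
    for (size_t i = 0; i < sizeof(toyDisk) / sizeof(toyDisk[0]); i++)
        if (toyDisk[i].blockNum == blockNum) return toyDisk[i].nums;
    return NULL;
}

/* Map the index of a block within the file to a disk block number.
   Direct and singly-indirect addressing only; returns 0 on failure. */
uint16_t fileBlockToDiskBlock(const struct toyInode *ip, uint32_t index) {
    if (!(ip->mode & ILARG)) {
        return index < 8 ? ip->addr[index] : 0;        /* small file: direct */
    }
    uint32_t slot = index / NUMS_PER_BLOCK;             /* which indirect block */
    if (slot >= 8) return 0;
    const uint16_t *nums = readIndirect(ip->addr[slot]);  /* the extra disk read */
    return nums ? nums[index % NUMS_PER_BLOCK] : 0;
}
```

Note the cost: every data block of a big file requires reading an indirect block first.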

This is known as singly-indirect addressing. We can store up to 8 singly-indirect block numbers in an inode, and each of those blocks can store 512 / 2 = 256 block numbers (block numbers are shorts, which are 2 bytes). Therefore, this increases the maximum file size to (8 singly indirect block numbers) * (256 actual block numbers in each indirect block) * (512 bytes of file contents) = 1,048,576 bytes = 1 MB.

That’s still not that big. To make the max filesize even bigger, Unix V6 uses the 8th block number of the inode to store a doubly indirect block number. In the inode, the first 7 block numbers point to singly-indirect blocks (as I’ve just described), but the last block number points to a block which itself stores singly-indirect block numbers. Therefore, the total number of singly indirect block numbers we can have is 7 + (256 numbers stored in the doubly indirect block) = 263, so the maximum filesize is (263 indirect blocks) * (256 actual block numbers in each indirect block) * (512 bytes of file contents) = 34,471,936 bytes = 34MB.
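The sketch below works out where the Nth data block number of a large file lives under the layout just described (slots 0-6 singly indirect, slot 7 doubly indirect) and recomputes the size limits from this section. `blockPath`, `locateBlock`, and `maxLargeFileSize` are hypothetical names; this only does the index arithmetic, without reading any blocks.

```c
#include <stdint.h>

#define BLOCK_SIZE     512
#define NUMS_PER_BLOCK 256   /* block numbers per 512-byte block */
#define SINGLY_SLOTS   7     /* i_addr[0..6] are singly indirect */
#define DOUBLY_SLOT    7     /* i_addr[7] is doubly indirect */

/* level 1: i_addr[slot] -> indirect block, entry `inner`
   level 2: i_addr[7] -> doubly indirect block, entry `outer` -> indirect block, entry `inner` */
struct blockPath {
    int      level;   /* 1 = singly, 2 = doubly, 0 = beyond the max file size */
    uint32_t slot;    /* index into i_addr */
    uint32_t outer;   /* entry in the doubly indirect block (level 2 only) */
    uint32_t inner;   /* entry in the singly indirect block */
};

struct blockPath locateBlock(uint32_t index) {
    struct blockPath p = {0, 0, 0, 0};
    if (index < SINGLY_SLOTS * NUMS_PER_BLOCK) {        /* first 7 * 256 = 1792 blocks */
        p.level = 1;
        p.slot  = index / NUMS_PER_BLOCK;
        p.inner = index % NUMS_PER_BLOCK;
        return p;
    }
    uint32_t rest = index - SINGLY_SLOTS * NUMS_PER_BLOCK;
    if (rest < NUMS_PER_BLOCK * NUMS_PER_BLOCK) {       /* next 256 * 256 blocks */
        p.level = 2;
        p.slot  = DOUBLY_SLOT;
        p.outer = rest / NUMS_PER_BLOCK;
        p.inner = rest % NUMS_PER_BLOCK;
    }
    return p;
}

/* (7 + 256) singly indirect blocks x 256 data blocks each x 512 bytes. */
uint64_t maxLargeFileSize(void) {
    uint64_t indirectBlocks = SINGLY_SLOTS + NUMS_PER_BLOCK;   /* 263 */
    return indirectBlocks * NUMS_PER_BLOCK * BLOCK_SIZE;
}
```

For comparison, 8 direct block numbers give 8 * 512 = 4096 bytes, and 8 singly indirect blocks give 8 * 256 * 512 = 1,048,576 bytes, matching the figures above.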

That’s still not that big, but keep in mind, Unix V6 is from 1975 :) Modern filesystems get even more fancy.

Takeaways

Filesystems are complicated!! This filesystem is from 1975, and it’s already pretty complicated. Modern filesystems are extremely complex, but they are all based on the same ideas that you see here.

Filesystem design also involves many tradeoffs. How many inodes should we reserve on the disk? If we reserve too few, that will limit the total number of files we can create, but if we reserve too many, we’ll be wasting space that could have been used to store the contents of files. When should we use direct addressing vs singly indirect or doubly indirect (or maybe even triply indirect) block numbers? Indirection increases the max file size, but adds complexity and slows down the filesystem (we need to read all those indirect blocks in order to get what we want, which is expensive). There are even tradeoffs between performance and implementation complexity: is it worth reclaiming some of the wasted space in our design if the filesystem becomes so complex we can’t implement it properly?
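Two of these tradeoffs can be put into numbers with the parameters from this lecture (512-byte blocks, 32-byte inodes). This is a back-of-the-envelope sketch with hypothetical function names; it counts only block reads and ignores any caching the operating system might do.

```c
#include <stdint.h>

#define BLOCK_SIZE       512
#define INODE_SIZE       32
#define INODES_PER_BLOCK (BLOCK_SIZE / INODE_SIZE)   /* 16 */

/* Blocks permanently consumed by an inode table sized for `numInodes` files,
   whether or not those files ever exist. */
uint32_t inodeTableBlocks(uint32_t numInodes) {
    return (numInodes + INODES_PER_BLOCK - 1) / INODES_PER_BLOCK;
}

/* Disk reads needed to fetch one data block once the inode is in memory:
   level 0 = direct, 1 = singly indirect, 2 = doubly indirect.
   Each level of indirection adds one block read. */
uint32_t readsPerDataBlock(int indirectionLevel) {
    return (uint32_t)indirectionLevel + 1;
}
```

Reserving 1600 inodes costs 100 blocks (51,200 bytes) up front, and reaching a block through the doubly indirect slot takes three reads instead of one.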

Modern filesystems strike a decent balance between speed, storage overhead, and usability factors (max filename length, max number of files, etc). However, filesystems are still evolving even today (especially as networked and distributed filesystems become more important), and sometimes we need to implement filesystems tailored to specific performance characteristics.

Evaluating tradeoffs is critical in systems design; rarely will there ever be a perfect solution. We’ll be thinking about tradeoffs much more for the rest of the class!