Lecture 2: Introduction to Filesystems

Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.

A software view of hard drives

Just like RAM, hard drives provide us with some contiguous space that we can store information in. Information in RAM is byte-addressable: even if you’re only trying to store a boolean (1 bit), you need to read an entire byte (8 bits) to retrieve that boolean from memory, and if you want to flip the boolean, you need to write the entire byte back to memory. A similar concept exists in the world of hard drives. Hard drives are divided into sectors (often 512 bytes large), and are sector-addressable: you must read or write entire 512-byte sectors, even if you’re only interested in 32 bytes of information within a sector.
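To make sector-addressability concrete, here is a minimal C sketch of changing 32 bytes in the middle of a sector. The "disk" is a hypothetical in-memory array. `read_sector` and `write_sector` are made-up stand-ins for whatever whole-sector I/O the hardware provides; real drivers look nothing like this.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 8   /* a tiny pretend disk, for illustration only */

/* Hypothetical in-memory "disk": the only operations are whole-sector reads and writes. */
static uint8_t disk[NUM_SECTORS][SECTOR_SIZE];

void read_sector(uint32_t s, uint8_t buf[SECTOR_SIZE]) { memcpy(buf, disk[s], SECTOR_SIZE); }
void write_sector(uint32_t s, const uint8_t buf[SECTOR_SIZE]) { memcpy(disk[s], buf, SECTOR_SIZE); }

/* Change `len` bytes at `offset` within sector `s`. Because the disk is
 * sector-addressable, we read the whole sector, patch it in memory,
 * and write the whole sector back (read-modify-write). */
void update_bytes(uint32_t s, uint32_t offset, const void *data, uint32_t len) {
    assert(s < NUM_SECTORS && offset + len <= SECTOR_SIZE);
    uint8_t buf[SECTOR_SIZE];
    read_sector(s, buf);
    memcpy(buf + offset, data, len);
    write_sector(s, buf);
}
```

Even though only 32 bytes change, all 512 bytes of the sector are read and then rewritten.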

Sidenote: sectors are most commonly 512 bytes large, but this can vary. The size is determined by the physical hard drive (so it’s possible to have a computer with several hard drives, each using a different sector size).

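Conceptually, a hard drive divided into sectors might look like this:

[Diagram: the disk drawn as a row of numbered sectors (sector 0, sector 1, sector 2, and so on), each 512 bytes]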

This is what the hardware presents us with, and this small amount is all you really need to know in order to start designing basic filesystems. As filesystem designers, we need to figure out a way to take this primitive system and use it to store a user’s files.

Note: throughout the rest of the course, you may hear me use the term “block” instead of “sector.” Sectors are the physical base units on the hard drive, but the operating system might elect to do all its filesystem operations in terms of blocks (each made up of several sectors). If an operating system chooses a block size of 1024 bytes (spanning two 512-byte sectors), then when it accesses or modifies the filesystem, it will only read/write from the disk in 1024-byte chunks, even though the disk is capable of operating in 512-byte chunks. This has performance benefits which we won’t talk about (take CS 140 if you’re interested!). In this class, “blocks” and “sectors” are interchangeable.

Unix V6 filesystem concepts

Note: if you can, you should really watch this part of lecture, as I talk through the motivating factors that drove this design.

This diagram shows the basic design of the Unix V6 filesystem. There is a lot to unpack, so I will explain below:

General layout

At the beginning of the disk, a series of blocks is reserved for filesystem metadata. Metadata includes information such as the number of free blocks available, information for figuring out which files are stored where, etc. (Incidentally, this is one reason why a brand-new 16GB flash drive reports less than 16GB of free space when you plug it into your computer.) The remaining blocks are used to store actual file contents. (The diagram is a little misleading; the portion used to store file contents is far larger than the portion used to store metadata.)

Files are split into block-sized chunks (i.e. usually 512-byte chunks) when they are stored. If a file doesn’t fit evenly in 512-byte chunks, then oh well, we waste some space. (This is the case with the blue file in the diagram. Its filesize is only 32 bytes, yet an entire 512-byte block is dedicated to storing it.)
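The round-up rule fits in a few lines of C. This is a minimal sketch assuming 512-byte blocks; `blocks_needed` and `wasted_bytes` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 512  /* assumed block size */

/* Whole blocks needed to hold a file of `size` bytes (rounding up). */
size_t blocks_needed(size_t size) {
    return (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Bytes left unused at the end of the file's last block. */
size_t wasted_bytes(size_t size) {
    size_t wasted = blocks_needed(size) * BLOCK_SIZE - size;
    assert(wasted < BLOCK_SIZE);  /* we never waste an entire block */
    return wasted;
}
```

The 32-byte blue file needs 1 block and wastes 480 bytes; a 1028-byte file needs 3 blocks and wastes 508 bytes.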

Inodes

Somehow, we need to keep track of which blocks are being used to store the pieces of a file. (Blocks 1025, 1027, and 1028 contain the file contents for the same file, but if I hadn’t color-coded the diagram, you wouldn’t have known that.) This is the purpose of inodes. Inodes (“information nodes” – fancy term for a thing storing bookkeeping info) are 32-byte (or sometimes 64-byte) structures that store information about files on disk. As you will see in assignment 1, lots of different info is included in an inode, such as the owner of the file, permissions information (is anyone on the machine allowed to edit this?), the time the file was last modified, and more. Additionally, most relevant to this discussion, it stores the file size, file type (regular file or directory – more on that in a moment), and an array of block numbers identifying the blocks storing this file’s contents.
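Here is a sketch of a 32-byte inode in C, modeled on the Unix V6 on-disk layout. The field names follow the original V6 source, but treat the exact types, the mode-bit constants (`IFMT`, `IFDIR`), and the two helper functions as assumptions; the layout you use in assignment 1 may differ. Notice that the file size is split across two fields, for 24 bits in total.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* Sketch of a 32-byte on-disk inode, modeled on the Unix V6 layout. */
struct inode {
    uint16_t i_mode;      /* allocated flag, file type, large-file flag, permission bits */
    uint8_t  i_nlink;     /* number of directory entries referring to this inode */
    uint8_t  i_uid;       /* owner's user id */
    uint8_t  i_gid;       /* owner's group id */
    uint8_t  i_size0;     /* high 8 bits of the 24-bit file size */
    uint16_t i_size1;     /* low 16 bits of the file size */
    uint16_t i_addr[8];   /* block numbers of the file's contents */
    uint16_t i_atime[2];  /* last access time */
    uint16_t i_mtime[2];  /* last modification time */
};

static_assert(sizeof(struct inode) == 32, "inode must be 32 bytes");

#define IFMT  0060000  /* assumed mask for the file-type bits */
#define IFDIR 0040000  /* assumed type value meaning "directory" */

/* Hypothetical helper: reassemble the 24-bit size from its two pieces. */
uint32_t inode_file_size(const struct inode *in) {
    return ((uint32_t)in->i_size0 << 16) | in->i_size1;
}

/* Hypothetical helper: is this inode a directory rather than a regular file? */
int inode_is_directory(const struct inode *in) {
    return (in->i_mode & IFMT) == IFDIR;
}
```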

In the diagram, take a look at the contents of Inode 2, outlined with a dashed green line. The file size is 1028, so automatically, we know that this file must span 3 blocks (if the block size is 512). The block numbers are listed as 1027, 1028, and 1025, so if we are trying to read bytes 0-511 from the file, we look in block 1027 for that info; if we are trying to read bytes 512-1023 from the file, we look in block 1028; and if we are trying to read bytes 1024-1027 from the file, we look in block 1025.
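The lookup just described boils down to integer division: the block index within the file is `offset / 512`, and the position inside that block is `offset % 512`. Below is a minimal C sketch for a small (directly addressed) file. The function name is hypothetical, and `i_addr` simply stands for the inode's array of block numbers.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 512

/* For a small file, i_addr[k] is the disk block holding bytes
 * [k*BLOCK_SIZE, (k+1)*BLOCK_SIZE) of the file. Return the disk block
 * that holds byte `offset`, and store the position within that block. */
uint16_t small_file_block_for_offset(const uint16_t i_addr[8], uint32_t file_size,
                                     uint32_t offset, uint32_t *offset_in_block) {
    assert(offset < file_size);            /* caller must stay inside the file */
    uint32_t index = offset / BLOCK_SIZE;  /* which block of the file */
    assert(index < 8);                     /* a small file has at most 8 blocks */
    *offset_in_block = offset % BLOCK_SIZE;
    return i_addr[index];
}
```

With Inode 2's block list {1027, 1028, 1025}, offset 1027 maps to block 1025 at position 3, the last of the four bytes stored there.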

Note: The blocks used to store file contents are not necessarily contiguous or in order! In the case of inode 2, block 1025 is used to store the last part of the file, even though it is physically located before block 1027 (which stores the first part of the file).

Note: Block 1025 is only being used to store four bytes of the green file. 508 bytes in the block are being wasted. This is just how the filesystem is designed; if we wanted to reclaim that space, we would have to do some really fancy things that drastically increase the complexity of the filesystem. (Some modern filesystems do implement schemes like that, but most don’t because it isn’t worth the added complexity.) As a thought experiment, think about how your bookkeeping information would have to change if you wanted to be able to use fragments of blocks for storing multiple files.

The inodes tell us where to find a file’s contents, but we also need to store the inodes themselves on disk. A series of blocks is reserved for this, starting from block #2. Because inodes have a fixed size, we pack several of them into each block. For illustration purposes, I only drew 4 inodes per block in the diagram above. In reality, if the inodes are 32 bytes large (as they are in the Unix V6 filesystem) and the sector size is 512 bytes, there will be 512/32 = 16 inodes per sector.

Note: Inodes are indexed starting from 1. This is for historical reasons (which you don’t need to know). For the curious, inode-related functions in the operating system return 0 when there is an error (they return unsigned values, so they can’t return -1). Therefore, inumber 0 is reserved.
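Putting the last two paragraphs together, finding inode number i on disk takes one subtraction (to undo the 1-based numbering), one division, and one remainder. A minimal sketch, assuming 32-byte inodes, 512-byte sectors, and an inode area starting at block 2 as described above; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512
#define INODE_SIZE 32
#define INODES_PER_SECTOR (SECTOR_SIZE / INODE_SIZE)  /* 16 */
#define INODE_START_SECTOR 2                          /* inode area begins at block 2 */

/* Locate inode number `inumber` (1-based): which sector, and where inside it. */
void inode_location(uint32_t inumber, uint32_t *sector, uint32_t *byte_offset) {
    assert(inumber >= 1);            /* inumber 0 is reserved to signal errors */
    uint32_t index = inumber - 1;    /* convert to a 0-based index */
    *sector = INODE_START_SECTOR + index / INODES_PER_SECTOR;
    *byte_offset = (index % INODES_PER_SECTOR) * INODE_SIZE;
}
```

For example, inode 1 sits at byte 0 of sector 2, inode 16 at byte 480 of sector 2, and inode 17 at byte 0 of sector 3.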

Large files

In the Unix V6 filesystem, inodes can only store a maximum of 8 block numbers. This limits the total file size to 8 * 512 = 4096 bytes. That’s not very big.

To resolve this problem, we use a scheme called indirect addressing. Normally, the inode stores the block numbers that form a file; as an example, let’s say the file is stored across blocks 2001-2008, so the inode stores the numbers 2001-2008. Now we want to append to the file, and the new contents go into a newly allocated block (let’s say block 2009), but the inode has no room for a ninth block number. Instead, let’s allocate another block on disk (in the area where file contents are stored), say block 2050, and store the numbers 2001-2009 in that block. Then, we update the inode to store only block number 2050, and we set a flag specifying that we’re using this large-addressing scheme. When we want to get the contents of the file, we check the inode and see the “big file” flag. We take the first block number from the inode (2050), read that block, and then read the actual block numbers (the ones storing file contents) from it.
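Here is a hedged C sketch of the lookup in the example above. The in-memory `read_block` is a hypothetical stand-in for a real disk read and only knows about block 2050; the flag value 010000 is an assumption borrowed from V6's `ILARG` constant. The sketch treats every inode entry of a big file as singly-indirect, matching the scheme as described so far.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512
#define NUMS_PER_BLOCK (BLOCK_SIZE / sizeof(uint16_t))  /* 256 block numbers per block */
#define ILARG 0010000  /* assumed "big file" flag bit in i_mode */

/* Hypothetical disk read: the only non-zero block is indirect block 2050,
 * which lists data blocks 2001..2009 from the example. */
static void read_block(uint16_t blockno, uint16_t buf[NUMS_PER_BLOCK]) {
    memset(buf, 0, BLOCK_SIZE);
    if (blockno == 2050) {
        for (int i = 0; i < 9; i++) buf[i] = (uint16_t)(2001 + i);
    }
}

/* Map the k-th block of a file (0-based) to a disk block number. */
uint16_t file_block_to_disk_block(uint16_t i_mode, const uint16_t i_addr[8], uint32_t k) {
    if (!(i_mode & ILARG)) {
        assert(k < 8);
        return i_addr[k];                      /* direct: the inode holds the number */
    }
    assert(k / NUMS_PER_BLOCK < 8);            /* 8 singly-indirect entries */
    uint16_t indirect[NUMS_PER_BLOCK];
    read_block(i_addr[k / NUMS_PER_BLOCK], indirect);  /* one extra disk read */
    return indirect[k % NUMS_PER_BLOCK];
}
```

For the big file, asking for block 8 (the ninth block, the one we appended) costs an extra read of block 2050 and returns 2009.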

This is known as singly-indirect addressing. We can store up to 8 singly-indirect block numbers in an inode, and each of those blocks can store 512 / 2 = 256 block numbers (block numbers are shorts, which are 2 bytes). Therefore, this increases the maximum file size to (8 singly indirect block numbers) * (256 actual block numbers in each indirect block) * (512 bytes of file contents) = 1,048,576 bytes = 1 MB.
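This arithmetic can be checked at compile time. A small C sketch, with macro names invented for illustration:

```c
#include <assert.h>   /* static_assert */

#define BLOCK_SIZE 512
#define ADDRS_PER_INODE 8                 /* block-number slots in the inode */
#define NUMS_PER_BLOCK (BLOCK_SIZE / 2)   /* 2-byte block numbers per block: 256 */

/* Small files: each inode slot names one data block directly. */
#define MAX_DIRECT_BYTES (ADDRS_PER_INODE * BLOCK_SIZE)

/* Large files: each slot names an indirect block listing 256 data blocks. */
#define MAX_SINGLY_BYTES (ADDRS_PER_INODE * NUMS_PER_BLOCK * BLOCK_SIZE)

static_assert(MAX_DIRECT_BYTES == 4096, "8 * 512");
static_assert(MAX_SINGLY_BYTES == 1048576, "8 * 256 * 512");
```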

That’s still not that big. To make the max filesize even bigger, Unix V6 uses the 8th block number of the inode to store a doubly indirect block number. In the inode, the first 7 block numbers point to singly-indirect blocks (as I’ve just described), but the last block number points to a block which itself stores singly-indirect block numbers. Therefore, the total number of singly indirect block numbers we can have is 7 + (256 numbers stored in the doubly indirect block) = 263, so the maximum filesize is (263 indirect blocks) * (256 actual block numbers in each indirect block) * (512 bytes of file contents) = 34,471,936 bytes, or about 34 MB.
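One way to organize the lookup for a large file is to first ask which of the 263 singly-indirect blocks holds the entry for file block k (that is, k / 256), then decide whether that singly-indirect block is named directly by `i_addr[0..6]` or reached through the doubly-indirect block in `i_addr[7]`. The sketch below only computes that path, without doing any disk reads; the struct and function names are hypothetical, and `i_addr` stands for the inode's block-number array.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 512
#define NUMS_PER_BLOCK 256   /* 512-byte block / 2-byte block numbers */
#define SINGLY_SLOTS 7       /* i_addr[0..6] name singly-indirect blocks */
#define DOUBLY_SLOT 7        /* i_addr[7] names the doubly-indirect block */

/* (7 + 256) singly-indirect blocks, each listing 256 data blocks. */
#define MAX_FILE_BLOCKS ((SINGLY_SLOTS + NUMS_PER_BLOCK) * NUMS_PER_BLOCK)
#define MAX_FILE_BYTES (MAX_FILE_BLOCKS * BLOCK_SIZE)

/* How to reach data block k of a large file. */
struct block_path {
    int doubly;       /* 1 if the path goes through the doubly-indirect block */
    uint32_t slot;    /* index into the inode's i_addr array */
    uint32_t outer;   /* index into the doubly-indirect block (only if doubly) */
    uint32_t inner;   /* index into the singly-indirect block */
};

struct block_path large_file_path(uint32_t k) {
    assert(k < MAX_FILE_BLOCKS);
    struct block_path p = {0};
    uint32_t singly = k / NUMS_PER_BLOCK;  /* which of the 263 singly-indirect blocks */
    p.inner = k % NUMS_PER_BLOCK;
    if (singly < SINGLY_SLOTS) {
        p.slot = singly;                   /* i_addr[slot] -> singly block -> data */
    } else {
        p.doubly = 1;
        p.slot = DOUBLY_SLOT;              /* i_addr[7] -> doubly block */
        p.outer = singly - SINGLY_SLOTS;   /* -> singly block -> data */
    }
    return p;
}
```

In a real implementation each step along the returned path is a disk read, so blocks reached through `i_addr[7]` cost one more read than blocks reached through `i_addr[0..6]`.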

That’s still not that big, but keep in mind, Unix V6 is from 1975 :) Modern filesystems get even more fancy.

Takeaways

Filesystems are complicated!! This filesystem is from 1975, and it’s already pretty complicated. Modern filesystems are extremely complex, but they are all based on the same ideas that you see here.

Filesystem design also involves many tradeoffs. How many inodes should we reserve on the disk? If we reserve too few, that will limit the total number of files we can create, but if we reserve too many, we’ll be wasting space that could have been used to store the contents of files. When should we use direct addressing vs. singly indirect or doubly indirect (or maybe even triply indirect) block numbers? Indirection increases the max file size, but adds complexity and slows down the filesystem (we need to read all those indirect blocks in order to get what we want, which is expensive). There are even tradeoffs between efficiency and implementation complexity: is it worth reclaiming some of the wasted space in our design if the filesystem becomes so complex we can’t implement it properly?

Modern filesystems strike a decent balance between speed, storage overhead, and usability factors (max filename length, max number of files, etc). However, filesystems are still evolving even today (especially as networked and distributed filesystems become more important), and sometimes we need to implement filesystems tailored to specific performance characteristics.

Evaluating tradeoffs is critical in systems design; rarely will there ever be a perfect solution. We’ll be thinking about tradeoffs much more for the rest of the class!