Lab 1 Solutions
This handout was adapted from Jerry Cain’s Spring 2018 offering.
The first and last exercises are problem set-esque questions that could easily appear on a midterm or final exam. In fact, all of the questions asked under Problem 3 were on previous midterms and finals. The middle problem is an experiment that’ll require you fire up your laptop and play with some programs.
I’ve created a Slack channel for Lab 1 discussion (aptly named #lab1), and all students (but particularly remote students) are encouraged to share their ideas there.
Problem 1: Direct, Singly Indirect, and Doubly Indirect Block Numbers
Assume blocks are 512 bytes in size, block numbers are four-byte int
s,
and that inodes include space for 6 block numbers. The first three contain
direct block numbers, the next two contain singly indirect block numbers, and
the final one contains a doubly indirect block number. (Note: This is
different from the scheme we’ve discussed in class.)
- What’s the maximum file size?
- Maximum file size is (3 direct block numbers) * 512 + (2 singly indirect numbers) * (512/4 numbers per block) * 512 + (512/4 singly indirect numbers per block) * (512/4 direct numbers per block) * 512, or 8,521,216 bytes.
- How large does a file need to be before the relevant inode requires the first
singly indirect block number be used?
- 3 * 512, per above.
- How large does a file need to be before the relevant inode requires the first
doubly indirect block number be used?
- 3 * 512 + 2 * (512 / 4) * 512, per above.
- Draw as detailed an inode as you can if it’s to represent a regular file
that’s 2049 bytes in size.
- The file occupies ceil(2049 / 512) = 5 blocks. The first three will be addressed in the first three slots of the inode; the other two can be addressed by a single singly-indirect block. In summary, the first three spaces in the inode point directly to file blocks, and the fourth space points to a singly direct block, which in turn points to two file blocks.
Problem 2: Experimenting with the stat
utility
This problem is more about exploration and experimentation, and not so much about generating a correct answer. The file system reachable from each myth machine consists of the local file system (restated, it’s mounted on the physical machine) and networked drives that are grafted onto the fringe of the local file system so that all of AFS – which consists of many, many independent file systems from around the globe – all contribute to one virtual file system reachable from your local / directory.
Log into myth53
and use the stat
command line utility (which is a user
program that makes calls to the stat
system call as part of its execution)
and prints out oodles of information about a file. Type in the following
commands and analyze the output:
stat /
stat /tmp
stat /usr
stat /usr/bin
stat /usr/bin/g++
stat /usr/bin/g++-5
The output for each of the five commands above all produce the same device ID
but different inode numbers. Read through
this
to gain insight on what the Device
values are.
Output for
stat /
includes a size of 4096, a block count of 8, and I/O block size of 4096, an inode number of 2, and a nlinks value of 26. It’s also clear it’s a directory.poohbear@myth53:/usr/class/cs110$ stat / File: '/' Size: 4096 Blocks: 8 IO Block: 4096 directory Device: 801h/2049d Inode: 2 Links: 26 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2018-04-10 21:35:15.878747216 -0700 Modify: 2017-11-29 16:59:25.456505089 -0800 Change: 2017-11-29 16:59:25.456505089 -0800 Birth: - poohbear@myth53:/usr/class/cs110$
- Directory sizes are always exposed as multiples of the true block size, which is 4096.
- The sector size is 512, and
stat
states that it uses 8 sectors to store all of its directory entries. (I have no idea whystat
uses blocks here instead of sectors; I just know it does). - The I/O block size just happens to be the same as the actual block size, but it’s listed separately, because it could have been different. The I/O block size is the optimal byte transfer size supported by the file system hardware.
The device information is expressed as a hexadecimal (801h) and a decimal equivalent (2049d). The 801 states that the major device number is 8, and then the minor device number (or partition) within that major device is 1. If you do an
ls -lt /dev/sda
, you get a listing within the pseudo-file system like that below, where/dev/sda1
is pseudofile representing the device driver that interfaces with the file system holding the root directory. (Special files like this are set up so that software can programmatically interact with device drivers as if they were files.)poohbear@myth53:/usr/class/cs110$ ls -lt /dev/sda* brw-rw---- 1 root disk 8, 5 Mar 16 17:38 /dev/sda5 brw-rw---- 1 root disk 8, 1 Mar 16 17:38 /dev/sda1 brw-rw---- 1 root disk 8, 2 Mar 16 17:38 /dev/sda2 brw-rw---- 1 root disk 8, 0 Mar 16 17:38 /dev/sda poohbear@myth53:/usr/class/cs110$
The inode number is 2, which shouldn’t be too surprising. (In this filesystem, inode 1 is for a file that keeps track of blocks that have gone bad on the physical disk.)
The number of links is 26, which means that there are 26 different names leading to the root directory’s payload: / is one, /. is another, and believe it or not, /.. is a third, since .. is the same as . for the root directory. If you
ls -lt /
, you see that there are 23 subdirectories, each of which has a .. directory entry. That’s the source of the 26.You can do similar listing for the other files as you get the same type of information.
For each of the above commands, replace stat
with stat -f
to get
information about the file system on which the file resides (block size, inode
table size, number of free blocks, number of free inodes, etc).
Output for
stat -f /
looks like this:poohbear@myth53:/usr/class/cs110$ stat -f / File: "/" ID: a36c23c85b903bf8 Namelen: 255 Type: ext2/ext3 Block size: 4096 Fundamental block size: 4096 Blocks: Total: 51833244 Free: 47344430 Available: 44705672 Inodes: Total: 13180928 Free: 12545954 poohbear@myth53:/usr/class/cs110$
ext2
andext3
are types of file systems, and the file systems on the myths are evidently a hybrid of the two.*namelen
is the maximum length of a filename component supported by the filesystem.- The fundamental block size is the block size we’ve discussed in lecture, and the block size (without the fundamental prefix) is a hint as to the optimal transfer size for I/O operations (and is generally the same as the I/O Block value discussed above).
Now log into myth54
and run the same commands. Why are the outputs of stat
and stat -f
the same in some cases and different in others?
- Each of
myth53
andmyth54
have independent file systems and might be populated slightly differently. However, theg++
andg++-5
executable images, while independent copies of the same file, are exactly that – independent copies of the same file – so all ofg++
’s andg++-5
’s file properties will be the same.
Now analyze the output of the stat
utility when levied against AFS mounts
where the master copies of all /usr/class
and /usr/class/cs110
files
reside. Do this from both myth53
and myth54
.
stat /usr/class
stat -f /usr/class
stat /afs/ir.stanford.edu/class
stat -f /afs/ir.stanford.edu/class
stat /usr/class/cs110
stat /afs/ir.stanford.edu/class/cs110
stat -f /usr/class/cs110
Why are most of the outputs the same for myth53
compared to myth54
? Which
ones are symbolic links? Why are the device numbers for remotely hosted file
systems so small?
- All but one of the outputs is the same, because they all refer to the same
files on remote file system that’s been grafted in to contribute to the
overall virtual file system that is AFS (or Andrew File System). (The output
for
stat /usr/class
is slightly different, because that symbolic link resides on themyth53
filesystem). This is what I get when I
stat /usr/class
frommyth13
:poohbear@myth53:/usr/class/cs110$ stat /usr/class File: '/usr/class' -> '/afs/ir.stanford.edu/class' Size: 26 Blocks: 0 IO Block: 4096 symbolic link Device: 801h/2049d Inode: 6339959 Links: 1 Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2018-04-11 07:00:03.833796571 -0700 Modify: 2017-11-29 17:06:42.578509590 -0800 Change: 2017-11-29 17:06:42.578509590 -0800 Birth: - poohbear@myth53:/usr/class/cs110$
- This is a symbolic link stored on
myth53
’s resident file system (note the801h
again). It’s clearly identified as a symbolic link (represented via inode with inumber 6339959). Its payload size is 26, which is the length of the/afs/ir.stanford.edu/class
the link expands to. The surprising part is that the payload requires 0 blocks! But that’s because symbolic link payloads are stored not in external blocks, but in the the space set aside for the inode’s block numbers array. Sneaky!
- This is a symbolic link stored on
Interestingly, the output of
stat
/usr/class/
(note that extra slash at the end) is different, because that extra slash dereferences the symbolic link! (Former CS110 CA Michael Painter noticed this and shared with me and the other CAs last autumn.)poohbear@myth53:/usr/class/cs110$ stat /usr/class/ File: '/usr/class/' Size: 309248 Blocks: 604 IO Block: 4096 directory Device: 28h/40d Inode: 262274 Links: 5 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2018-04-11 11:14:03.000000000 -0700 Modify: 2018-04-11 11:14:03.000000001 -0700 Change: 2018-04-11 11:14:03.000000000 -0700 Birth: - poohbear@myth53:/usr/class/cs110$
Why are the major device numbers so small here? AFS uses major number 0, hence the smaller device number. AFS doesn’t expose much about how remotely managed files are stored, which is why they went with a 0.
What about these commands?
stat /afs/northstar.dartmouth.edu
stat -f /afs/northstar.dartmouth.edu
stat /afs/asu.edu
stat -f /afs/asu.edu
What files can you see within the dartmouth.edu
and asu.edu
mounts?
- My most interesting finding with
stat
andstat -f
on those two remote directories is that the fundamental block sizes are 1024 instead of 2048. And I was able to see 20 directory entries within/afs/northstar.dartmouth.edu
and 67 directory entries within/afs/asu.edu
.
Problem 3: Short Answer Questions
Provide clear answers and/or illustrations for each of the short answer questions below. Each of these questions is either drawn from old exams or based on old exam questions. Questions like this will certainly appear on your own midterm.
- The
dup
system call accepts a valid file descriptor, claims a new, previously unused file descriptor, configures that new descriptor to alias the same file session as the incoming one, and then returns it. Briefly outline what happens to the relevant file entry table and vnode table entries as a result ofdup
being called. (Readman dup
if you’d like, though don’t worry about error scenarios).- The vnode table entry is left alone, but a new file descriptor is claimed and set to address the same entry in the file entry table session as the incoming one, and the reference count within that session entry would be incremented by one.
- Now consider the prototype for the
link
system call (peruseman link
). A successful call tolink
updates the file system so the file identified byoldpath
is also identified bynewpath
. Oncelink
returns, it’s impossible to tell which name was created first. (To be clear,newpath
isn’t just a symbolic link, since it could eventually be the only name for the file.) In the context of the file system discussed in lecture and/or the file system discussed in Section 2.5 of the secondary textbook, explain how link might be implemented.- Resolve
oldpath
to its inode number, append new directory entry to sequence of existing directory entries wherenewpath
resides. Fill in entry with last component ofnewpath
, and fill in inumber with the inumber of old path. Finally, increment the reference count within the inode itself to be clear the file has one more name.
- Resolve
- Explain what happens when you type
cd .././../.
at the shell prompt. Frame your explanation in terms of our discussion of directories in lecture, and the fact that the inode number of the current working directory is the only relevant global variable maintained by your shell.- Search cwd’s payload for .., set inumber of cwd to inumber associated with ..; repeat three more times for ., then .., and then . :)
- All modern file systems allow symbolic links to exist as shortcuts for
longer absolute and relative paths (e.g.
sanitycheck
might be a symbolic link for/afs/.ir/users/r/e/rebs/cs110/assign1/tools/sanitycheck
, andtests.txt
might be a symbolic link for./mytests/tests.txt
. Explain how your the absolute pathname resolution process we discussed in lecture would need to change to resolve absolute pathnames to inode numbers when some of the pathname components might be symbolic links.- The absolute pathname resolution process (implemented as
pathname_lookup
in assign2) should piecemeal tokenize pathnames as usual. When component (e.g.tests.txt
) is identified as symlink, prepend expansion to unprocessed set of tokens, continue for relative paths from inumber of symlink parent, restart from inode 1 for absolute paths.
- The absolute pathname resolution process (implemented as