Lecture 2: Filesystem Recap, Intro to System Calls

Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.

Directory details: . and ..

In Unix, there are some features of pathnames that you may have seen before: a leading / refers to the root directory, a leading ~ refers to your home directory, . refers to the current directory, and .. refers to the parent directory. (As an example, ~/./Desktop/../ refers to your home directory.)

As it turns out, . and .. are actually implemented as features of the filesystem. Every directory has at least two entries: an entry mapping . to the directory’s own inumber, and an entry mapping .. to the parent directory’s inumber.

Even the root directory has such entries. In the root directory’s case, however, both . and .. resolve to inumber 1 (i.e. the parent of the root directory is still the root directory).
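One way to check this from user code is to compare the inumbers that stat reports for two paths. The sketch below is only an illustration: the helper name sameFile is hypothetical, and the specific inumber of the root directory varies from filesystem to filesystem, so the code compares inumbers rather than looking for the value 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

// Returns true if both paths resolve to the same file, i.e. the same
// inumber on the same device. Returns false if either stat call fails.
bool sameFile(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With this helper, sameFile("/", "/..") should be true, since the root directory's .. entry points back at the root itself.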

Filesystems also generally support links, which are references to other files in the filesystem.

We have been talking about hard links throughout our entire discussion of directories, though you may not have known it. A hard link is simply an entry in a directory file. (We’ve been using the term “directory entry” and will continue to do so, but hard links are functionally the same thing.) Every file has at least one hard link pointing to it (i.e. the link in its parent directory). Every directory has at least two hard links pointing to it: the link in its parent directory, and its own . directory entry.
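The link count is visible through the st_nlink field that stat reports, and the link syscall adds a new directory entry for an existing file. Here is a minimal sketch, assuming a filesystem that supports hard links; the helper names linkCount and hardLinkDemo are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns the number of hard links to the file at path, or -1 on error.
long linkCount(const char *path) {
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    return (long) st.st_nlink;
}

// Creates an empty file at `original`, adds a second directory entry
// `alias` for the same inode, and returns the resulting link count
// (or -1 if any step fails).
long hardLinkDemo(const char *original, const char *alias) {
    assert(original != NULL && alias != NULL);
    int fd = open(original, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1) return -1;
    close(fd);
    if (linkCount(original) != 1) return -1;   // only the entry we just made
    if (link(original, alias) != 0) return -1; // second directory entry
    return linkCount(original);
}
```

Removing either name with unlink drops the count back to 1; the file's contents are only freed once the count reaches zero.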

While hard links map filenames to inumbers, soft links map filenames to other filenames. (Soft links are also called symbolic links, because they resolve to symbolic names instead of numbers.) Just as directories are just files, symbolic links are also just files. When a symbolic link is created, a new file is created, but instead of having type “regular file” or “directory,” it has type “link.” The contents of this file are the path to the file it links to.
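Because a symbolic link's contents are just a path, we can read that path back with readlink. The following is a sketch rather than a definitive implementation; the helper name linkTarget is an assumption of this example. Note that it uses lstat, which examines the link itself rather than the file it points to.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Copies the path stored in the symbolic link at `path` into `target`,
// using at most size - 1 bytes plus a terminating '\0'. Returns the
// length of the stored path, or -1 if `path` is not a symbolic link.
ssize_t linkTarget(const char *path, char *target, size_t size) {
    assert(path != NULL && target != NULL && size > 0);
    struct stat st;
    if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode)) return -1;
    ssize_t len = readlink(path, target, size - 1);
    if (len < 0) return -1;
    target[len] = '\0'; // readlink does not null-terminate
    return len;
}
```

The stored path does not have to name a file that exists: a symbolic link created with symlink("/no/such/target", name) is perfectly valid, and linkTarget will still return the stored string.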

Layers of Abstraction in Filesystems

The Unix V6 filesystem comes from the 1970s, yet, as you can see, there is already a large amount of complexity. One common paradigm for dealing with complexity is layering. I explained filesystem layering in the Assignment 1 handout, but I think it is worth repeating here:

On top of these 6 layers sit many application layers that use the filesystem without having to think about how it works.

Not only does layering provide us with a means of breaking down complexity, but it also has some nice properties if we ever want to modify the system to do something new. Let’s say we want to create a networked filesystem. Instead of having to write it from scratch, we can keep everything except for the hardware and block layers, replacing those with some layers that deal with network communication.

Unix employs this principle everywhere. As you will see later in the course, many resources are made to look like files (even though they aren’t files) so that we can control them using the file abstractions we’ve developed. Your computer interacts with your terminal window, printer, Bluetooth radio, and even (to a certain extent) CPU as if they were files, even though that is certainly not the case. As we will see towards the end of the class, the layering principle is equally pervasive in the land of networking.
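As a small illustration of a non-file resource being driven through the file interface, the sketch below reads from a device using the same open, read, and close calls we would use on a regular file. It assumes a system that provides /dev/zero (a device that yields an endless stream of zero bytes, present on Linux and most Unix-like systems); the helper name readSome is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

// Opens `path` and reads up to `len` bytes into `buf` using the ordinary
// file syscalls. Returns the number of bytes read, or -1 on error.
ssize_t readSome(const char *path, char *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    ssize_t count = read(fd, buf, len);
    close(fd);
    return count;
}
```

Calling readSome("/dev/zero", buf, 8) fills buf with eight zero bytes, even though no file on disk contains them.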

System Calls

There is a reason you have probably never written code that manages raw sectors on disk: you can’t. It would be dangerous if you could; you might unintentionally corrupt some critical sectors, rendering your entire filesystem unusable. Worse, malicious code could access or alter data that it isn’t supposed to have access to. For example, unprivileged code isn’t allowed to modify /etc/passwd, or even to read /etc/shadow (which stores the password hashes on most Linux systems), but if a program were allowed to access the raw filesystem, it could circumvent those permission checks.

We need the operating system to perform these privileged operations on our behalf, and to mediate access to the filesystem so that it can block malicious behavior. We interact with the operating system, asking it to do privileged operations on our behalf, through functions known as system calls (syscalls for short).

printf might seem like some magical core function to you, but it’s actually built on top of syscalls in ways that you can easily understand now. I can write a “hello world” program without using printf:

#include <unistd.h> // write, STDOUT_FILENO

int main(int argc, char *argv[]) {
    const char *output = "Hello world\n";
    write(STDOUT_FILENO, output, 12);
    return 0;
}

(Note: 12 is the length of the output string.)
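Hardcoding the length is fragile, and write is allowed to accept fewer bytes than we asked for. A slightly more careful variant might look like the sketch below; writeString is a hypothetical helper name, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Writes the entire string `s` to `fd`, retrying after short writes.
// Returns the number of bytes written, or -1 if write reports an error.
ssize_t writeString(int fd, const char *s) {
    assert(s != NULL);
    size_t len = strlen(s);
    size_t written = 0;
    while (written < len) {
        ssize_t n = write(fd, s + written, len - written);
        if (n == -1) return -1;
        written += (size_t) n;
    }
    return (ssize_t) written;
}
```

The hello world program then becomes a single call: writeString(STDOUT_FILENO, "Hello world\n").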

Implementing cp

cp (which is used to copy files from the terminal) might seem like a magical command, but it is just a program, written using primitives that you can understand. The basic approach is to read some bytes from one file, write those bytes to the output file, and repeat.

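#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    assert(argc == 3);
    const char *infile = argv[1];
    const char *outfile = argv[2];
    int infd = open(infile, O_RDONLY);
    if (infd == -1) { perror(infile); return 1; }
    // O_EXCL makes open fail rather than overwrite an existing output file
    int outfd = open(outfile, O_WRONLY | O_CREAT | O_EXCL, 0664);
    if (outfd == -1) { perror(outfile); return 1; }

    while (true) {
        char buffer[1024];
        ssize_t count = read(infd, buffer, sizeof(buffer));
        if (count == 0) break; // end of file
        if (count == -1) { perror("read"); return 1; }

        // write may accept fewer bytes than we asked for, so keep writing
        // until everything we read has made it to the output file
        size_t numWritten = 0;
        size_t numToWrite = count;
        while (numWritten < numToWrite) {
            ssize_t n = write(outfd, buffer + numWritten, numToWrite - numWritten);
            if (n == -1) { perror("write"); return 1; }
            numWritten += n;
        }
    }

    close(infd);
    close(outfd);
    return 0;
}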

Implementing find

The find program searches a directory for files with a matching name. For example, to find instances of stdio.h:

find /usr/include -name stdio.h

To implement this, we need a new syscall:

int stat(const char *pathname, struct stat *buf);

This populates a struct stat. At minimum, a stat struct will have the following fields, populated almost directly from the target file’s inode:

dev_t     st_dev     ID of device containing file
ino_t     st_ino     file serial number
mode_t    st_mode    mode of file
nlink_t   st_nlink   number of links to the file
uid_t     st_uid     user ID of file
gid_t     st_gid     group ID of file
dev_t     st_rdev    device ID (if file is character or block special)
off_t     st_size    file size in bytes (if file is a regular file)
time_t    st_atime   time of last access
time_t    st_mtime   time of last data modification
time_t    st_ctime   time of last status change
blksize_t st_blksize a filesystem-specific preferred I/O block size for
                     this object.  In some filesystem types, this may
                     vary from file to file
blkcnt_t  st_blocks  number of blocks allocated for this object

The st_mode field is of particular interest to us; it is a bit set containing (among other things) information about whether this is a regular file, a directory, or a symbolic link. We can extract that information using the S_ISREG, S_ISDIR, and S_ISLNK macros.
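To make the macros concrete, here is a short sketch that classifies a path by its st_mode. The function name fileType and the strings it returns are choices made for this example. It calls lstat rather than stat so that a symbolic link is reported as a link instead of being followed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

// Returns a human-readable description of the type of file at `path`.
const char *fileType(const char *path) {
    assert(path != NULL);
    struct stat st;
    if (lstat(path, &st) != 0) return "unknown";
    if (S_ISREG(st.st_mode)) return "regular file";
    if (S_ISDIR(st.st_mode)) return "directory";
    if (S_ISLNK(st.st_mode)) return "symbolic link";
    return "other"; // devices, pipes, sockets, ...
}
```

For instance, fileType("/") returns "directory", and a path that does not exist yields "unknown".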

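#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

// path is a buffer whose first length bytes name the directory to search;
// each entry's name is appended to it in place as we go.
static void listMatches(char *path, size_t length, const char *pattern) {
    DIR *dir = opendir(path);
    if (dir == NULL) return; // not a directory, or not readable
    strcpy(path + length, "/");
    length++;
    while (true) {
        struct dirent *de = readdir(dir);
        if (de == NULL) break;
        // Skip . and .. so that we don't recurse forever
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) continue;

        strcpy(path + length, de->d_name);
        struct stat st;
        // lstat (not stat) so that symbolic links are not followed
        if (lstat(path, &st) != 0) continue;
        if (S_ISREG(st.st_mode)) {
            if (strcmp(de->d_name, pattern) == 0) printf("%s\n", path);
        } else if (S_ISDIR(st.st_mode)) {
            listMatches(path, length + strlen(de->d_name), pattern);
        }
    }
    closedir(dir);
}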

int main(int argc, char *argv[]) {
    assert(argc == 3);
    const char *directory = argv[1];

    struct stat st;
    int result = stat(directory, &st);
    assert(result == 0 && S_ISDIR(st.st_mode)); // st is only valid if stat succeeded

    char *pattern = argv[2];
    char path[4096];
    strcpy(path, directory);
    listMatches(path, strlen(path), pattern);
    return 0;
}

Implementing ls

In addition to telling us what type a file is, st_mode also stores permission information. We can use this to reconstruct the permission strings that appear in the leftmost column of ls -l output.

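#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static const char kFlags[] = {'r', 'w', 'x'};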
static const mode_t kMasks[] = {
    S_IRUSR, S_IWUSR, S_IXUSR,
    S_IRGRP, S_IWGRP, S_IXGRP,
    S_IROTH, S_IWOTH, S_IXOTH,
};

static void updatePermissionBit(char buffer[], int pos, char ch, bool flag) {
    if (!flag) return;
    buffer[pos] = ch;
}

static void printPermissions(mode_t m) {
    char buffer[11]; // 10 + 1 = 1 + 3 * 3 + 1
    memset(buffer, '-', 11);
    buffer[10] = '\0';
    updatePermissionBit(buffer, 0, 'd', S_ISDIR(m));
    updatePermissionBit(buffer, 0, 'l', S_ISLNK(m));
    for (size_t i = 0; i < 9; i++) {
        updatePermissionBit(buffer, i + 1, kFlags[i % 3],
                m & kMasks[i]);
    }
    printf("%s ", buffer);
}

static void printName(const char *name, const struct stat *st, bool link, const char *path) {
    printf("%s", name);
    if (!link) return;
    // For a symbolic link, st_size is the length of the path it stores
    char target[st->st_size + 1];
    ssize_t len = readlink(path, target, sizeof(target) - 1);
    if (len < 0) return;
    target[len] = '\0'; // readlink doesn't put down '\0' char, drop it in ourselves
    printf(" -> %s", target);
}

static void listEntry(const char *name, const struct stat *st, bool link, const char *path) {
    printPermissions(st->st_mode);
    printName(name, st, link, path);
    printf("\n");
}

static void listDirectory(const char *name, size_t length, const struct stat *st) {
    char path[2048];
    strcpy(path, name);
    DIR *dir = opendir(path);
    if (dir == NULL) return; // directory could not be opened
    strcpy(path + length, "/");
    while (true) {
        struct dirent *de = readdir(dir);
        if (de == NULL) break;
        if (de->d_name[0] == '.') continue; // skip hidden entries, including . and ..
        strcpy(path + length + 1, de->d_name);
        struct stat st;
        if (lstat(path, &st) != 0) continue;
        listEntry(de->d_name, &st, S_ISLNK(st.st_mode), path);
    }
    closedir(dir);
}

int main(int argc, char *argv[]) {
    struct stat st;
    const char* dir = ".";
    lstat(dir, &st);
    if (S_ISREG(st.st_mode) || S_ISLNK(st.st_mode)) {
        listEntry(dir, &st, S_ISLNK(st.st_mode), dir);
    } else if (S_ISDIR(st.st_mode)) {
        listDirectory(dir, strlen(dir), &st);
    }
    return 0;
}

The vnode, file entry, and file descriptor tables

When we open a file and get a file descriptor back, how does the operating system manage that file descriptor in association with the file we’re trying to interact with? (This is more than incidental curiosity; this material will dictate much of our discussion on interprocess communication next week.)