Lecture 5: Processes

Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.

System calls

Recall from last lecture: System calls are functions that ask the kernel to do something for us that we wouldn’t be able to do otherwise, no matter how clever we are (since ordinary programs run with fewer privileges than the kernel).

I/O syscalls

Implementing cp

cp (which is used to copy files from the terminal) might seem like a magical command, but it is just a program, written using primitives that you can understand. The basic approach is to read some bytes from one file, write those bytes to the output file, and repeat.

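#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    assert(argc == 3);
    const char *infile = argv[1];
    const char *outfile = argv[2];
    int infd = open(infile, O_RDONLY);
    if (infd == -1) { perror("Failed to open input file"); exit(1); }
    // O_EXCL makes open fail if the output file already exists
    int outfd = open(outfile, O_WRONLY | O_CREAT | O_EXCL, 0664);
    if (outfd == -1) { perror("Failed to open output file"); exit(1); }

    while (true) {
        char buffer[1024];
        ssize_t count = read(infd, buffer, sizeof(buffer));
        if (count == 0) break;  // end of file
        if (count == -1) { perror("Failed to read"); exit(1); }

        // In a loop, try writing the bytes we read out to the output file,
        // until all of them get written (write may write fewer bytes than asked)
        size_t numWritten = 0;
        size_t numToWrite = count;
        while (numWritten < numToWrite) {
            ssize_t result = write(outfd, buffer + numWritten, numToWrite - numWritten);
            if (result == -1) { perror("Failed to write"); exit(1); }
            numWritten += result;
        }
    }

    close(infd);
    close(outfd);
    return 0;
}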

Permissions

File permissions are usually represented using 3 octal digits.

The value of each digit is the sum of the following:

4 if reading is allowed
2 if writing is allowed
1 if executing is allowed

For example, 7 (4 + 2 + 1) means “read, write, and execute”, 5 (4 + 1) means “read and execute”, and 4 means “read only.”

We write the 3 digits with a leading 0 (which marks the number as octal in C). In order, they give the permissions for the user owning the file, the group owning the file, and everyone else.

For example, 0755 means the owner can read/write/execute, but everyone else can only read or execute. 0640 means the owner can read/write, other users in the owning group can only read, and everyone else has no access to the file.

Optional: implementing find

The find program searches a directory for files with a matching name. For example, to find instances of stdio.h:

find /usr/include -name stdio.h

To implement this, we need a new syscall:

int stat(const char *pathname, struct stat *buf);

This populates a struct stat. At minimum, a stat struct will have the following fields, populated almost directly from the target file’s inode:

dev_t     st_dev     ID of device containing file
ino_t     st_ino     file serial number
mode_t    st_mode    mode of file
nlink_t   st_nlink   number of links to the file
uid_t     st_uid     user ID of file
gid_t     st_gid     group ID of file
dev_t     st_rdev    device ID (if file is character or block special)
off_t     st_size    file size in bytes (if file is a regular file)
time_t    st_atime   time of last access
time_t    st_mtime   time of last data modification
time_t    st_ctime   time of last status change
blksize_t st_blksize a filesystem-specific preferred I/O block size for
                     this object.  In some filesystem types, this may
                     vary from file to file
blkcnt_t  st_blocks  number of blocks allocated for this object

The st_mode field is of particular interest to us; it is a bit set containing (among other things) information about whether this is a regular file, a directory, or a symbolic link. We can extract that information using the S_ISREG, S_ISDIR, and S_ISLNK macros.

static void listMatches(char *path, size_t length, const char *pattern) {
    DIR *dir = opendir(path);
    if (dir == NULL) return;
    strcpy(path + length, "/");
    length++;
    while (true) {
        struct dirent *de = readdir(dir);
        if (de == NULL) break;
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) continue;

        strcpy(path + length, de->d_name);
        struct stat st;
        if (lstat(path, &st) == -1) continue;  // skip entries we cannot stat
        if (S_ISREG(st.st_mode)) {
            if (strcmp(de->d_name, pattern) == 0) printf("%s\n", path);
        } else if (S_ISDIR(st.st_mode)) {
            listMatches(path, length + strlen(de->d_name), pattern);
        }
    }
    closedir(dir);
}

int main(int argc, char *argv[]) {
    assert(argc == 3);
    const char *directory = argv[1];

    struct stat st;
    if (stat(directory, &st) == -1) {
        perror("stat");
        return 1;
    }
    assert(S_ISDIR(st.st_mode));

    char *pattern = argv[2];
    char path[4096];
    strcpy(path, directory);
    listMatches(path, strlen(path), pattern);
    return 0;
}

Optional: implementing ls

In addition to telling us what type a file is, st_mode also stores permission information. We can use this to reconstruct the permission strings that appear in the leftmost column of ls -l output.

static const char kFlags[] = {'r', 'w', 'x'};
static const mode_t kMasks[] = {
    S_IRUSR, S_IWUSR, S_IXUSR,
    S_IRGRP, S_IWGRP, S_IXGRP,
    S_IROTH, S_IWOTH, S_IXOTH,
};

static void updatePermissionBit(char buffer[], int pos, char ch, bool flag) {
    if (!flag) return;
    buffer[pos] = ch;
}

static void printPermissions(mode_t m) {
    char buffer[11]; // 10 + 1 = 1 + 3 * 3 + 1
    memset(buffer, '-', 11);
    buffer[10] = '\0';
    updatePermissionBit(buffer, 0, 'd', S_ISDIR(m));
    updatePermissionBit(buffer, 0, 'l', S_ISLNK(m));
    for (size_t i = 0; i < 9; i++) {
        updatePermissionBit(buffer, i + 1, kFlags[i % 3],
                m & kMasks[i]);
    }
    printf("%s ", buffer);
}

static void printName(const char *name, const struct stat *st, bool link, const char *path) {
  printf("%s", name);
  if (!link) return;
  char target[st->st_size + 1];
  readlink(path, target, sizeof(target));
  target[st->st_size] = '\0'; // readlink doesn't put down '\0' char, drop it in ourselves
  printf(" -> %s", target);
}

static void listEntry(const char *name, const struct stat *st, bool link, const char *path) {
  printPermissions(st->st_mode);
  printName(name, st, link, path);
  printf("\n");
}

static void listDirectory(const char *name, size_t length, const struct stat *st) {
  char path[2048];
  strcpy(path, name);
  DIR *dir = opendir(path);
  if (dir == NULL) return;  // e.g. no permission to read the directory
  strcpy(path + length, "/");
  while (true) {
    struct dirent *de = readdir(dir);
    if (de == NULL) break;
    if (de->d_name[0] == '.') continue;
    strcpy(path + length + 1, de->d_name);
    struct stat st;
    if (lstat(path, &st) == -1) continue;  // skip entries we cannot stat
    listEntry(de->d_name, &st, S_ISLNK(st.st_mode), path);
  }
  closedir(dir);
}

int main(int argc, char *argv[]) {
    struct stat st;
    const char* dir = ".";
    lstat(dir, &st);
    if (S_ISREG(st.st_mode) || S_ISLNK(st.st_mode)) {
        listEntry(dir, &st, S_ISLNK(st.st_mode), dir);
    } else if (S_ISDIR(st.st_mode)) {
        listDirectory(dir, strlen(dir), &st);
    }
    return 0;
}

Error handling

System calls can fail for a number of reasons; for example, open can fail if the file you’re trying to open doesn’t exist, or if you don’t have the necessary permissions, or for many other reasons. The failure modes are generally documented in the “Errors” section of each syscall’s man page (e.g. open).

If a system call fails, it typically returns -1 and sets the global errno variable to a constant that identifies the cause of failure. perror is a handy function that reads the value in errno and prints a description of the failure.

Every time you call a system call, you should handle potential failure:

int fd = open(...);
if (fd == -1) {
    perror("Failed to open file");
    exit(1);
}

Example output:

Failed to open file: No such file or directory

Key takeaways for system calls

Interlude: Games and anti-cheat software

A year ago, Riot Games (developer of League of Legends) published this controversial blog post announcing a new anti-cheat system.

Cheating is a big problem in competitive games, and over the past several years, people have developed cheat software that runs inside the kernel in order to feed game programs fake data. For example, the game needs to request the mouse position via system calls, and the cheat software can intercept those syscalls and return a fake mouse position. If the game asks, “what other programs are running on this computer right now?” the cheat software can intercept that system call and return the list of programs minus itself, in order to conceal itself.

From the Riot blog post:

In the last few years, cheat developers have started to leverage vulnerabilities or corrupt Windows’ signing verification to run their applications (or portions of them) at the kernel level. The problem here arises from the fact that code executing in kernel-mode can hook the very system calls we would rely on to retrieve our data, modifying the results to appear legitimate in a way we might have difficulty detecting.

Game developers are not happy about this, and have decided to install their own component inside the kernel so that they can detect other kernel changes made for cheating. In order to install the game, you must also install an additional component inside the kernel that will watch out for any funny business tampering with syscalls.

This is quite controversial. On one hand, it seems necessary, because if the game is just living in a “box” (process) and the cheat software + OS are controlling all of its interactions with the outside world, then there is no possible way that the game can detect and ban all cheat software. On the other hand, the kernel is a sacred place, and we shouldn’t accept installation of code there “just because.” Kernel code must meet an extremely high bar for quality, since it controls the entire system, and unfortunately companies do not usually produce such high-quality code. Symantec, an antivirus company, uses a kernel module to implement antivirus scanning, but their kernel module was buggy and allowed a virus to hijack the kernel.