Lecture 5: Processes
Note: Reading these lecture notes is not a substitute for watching the lecture. I frequently go off script, and you are responsible for understanding everything I talk about in lecture unless I specify otherwise.
System calls
Recall from last lecture: System calls are functions that ask the kernel to do something for us that we wouldn’t be able to do otherwise, no matter how clever we are (since ordinary programs run with fewer permissions than the kernel).
I/O syscalls
int open(const char *filename, int flags, ...)
- This tells the OS, “hey, I would like to work with this file.”
- flags tells open how we’d like to interact with the file. (Are we reading or writing? If writing, what do we do if the file already exists? Etc.)
- If we are writing to a file that doesn’t already exist and we wish to create it, a third argument can be used to specify the new file’s permissions.
- This function returns a number, which is a file descriptor. This fd can be passed to other filesystem-related syscalls to work with the file we’ve just opened.
ssize_t read(int fd, void* buffer, size_t count)
- Given a file descriptor (returned by open), attempt to read count bytes from the file into buffer.
- Returns the number of bytes actually read from the file, which may be fewer than count. A return value of 0 means we have reached the end of the file.
ssize_t write(int descriptor, const void *buf, size_t count)
- Given a file descriptor, attempt to write count bytes from buf to the file.
- Returns the number of bytes successfully written to the file, which may be fewer than count.
int close(int descriptor)
- Tell the operating system we’re done using a particular file. This frees the resources that the operating system was using to keep track of the file.
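As a minimal sketch of how these four calls fit together, the helpers below write a string to a file and read it back. The names writeString and readString are invented for this example and are not part of any library. Both loop because read and write may transfer fewer bytes than requested.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: creates (or truncates) path and writes text to it.
// Returns 0 on success, -1 on failure.
int writeString(const char *path, const char *text) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;
    size_t total = strlen(text);
    size_t written = 0;
    while (written < total) {
        // write may accept only part of the data, so keep going until done
        ssize_t n = write(fd, text + written, total - written);
        if (n == -1) {
            close(fd);
            return -1;
        }
        written += (size_t)n;
    }
    return close(fd);
}

// Hypothetical helper: reads up to size - 1 bytes of path into buffer and
// null-terminates the result. Returns the number of bytes read, or -1.
ssize_t readString(const char *path, char *buffer, size_t size) {
    if (size == 0) return -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    size_t total = 0;
    while (total < size - 1) {
        ssize_t n = read(fd, buffer + total, size - 1 - total);
        if (n == -1) {
            close(fd);
            return -1;
        }
        if (n == 0) break;  // end of file
        total += (size_t)n;
    }
    buffer[total] = '\0';
    close(fd);
    return (ssize_t)total;
}
```

Calling writeString on a path and then readString on the same path should give back the original text, assuming the directory is writable.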
Implementing cp
cp (which is used to copy files from the terminal) might seem like a magical
command, but it is just a program, written using primitives that you can
understand. The basic approach is to read some bytes from one file, write those
bytes to the output file, and repeat.
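#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    assert(argc == 3);
    const char *infile = argv[1];
    const char *outfile = argv[2];
    int infd = open(infile, O_RDONLY);
    if (infd == -1) { perror(infile); exit(1); }
    // O_EXCL makes open fail if the output file already exists
    int outfd = open(outfile, O_WRONLY | O_CREAT | O_EXCL, 0664);
    if (outfd == -1) { perror(outfile); exit(1); }
    while (true) {
        char buffer[1024];
        ssize_t count = read(infd, buffer, sizeof(buffer));
        if (count == -1) { perror("read"); exit(1); }
        if (count == 0) break;  // end of file
        // In a loop, try writing the bytes we read out to the output file,
        // until all of them get written
        size_t numWritten = 0;
        size_t numToWrite = count;
        while (numWritten < numToWrite) {
            ssize_t n = write(outfd, buffer + numWritten, numToWrite - numWritten);
            if (n == -1) { perror("write"); exit(1); }
            numWritten += n;
        }
    }
    close(infd);
    close(outfd);
    return 0;
}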
Permissions
File permissions are usually represented using 3 octal digits.
The value of each digit is determined as the sum of the following:
- 4 = read
- 2 = write
- 1 = execute
For example, 7 (4 + 2 + 1) means “read, write, and execute”, 5 (4 + 1) means “read and execute”, and 4 means “read only.”
We store 3 separate digits with a leading 0 (which marks the number as octal
in C), indicating the permissions for the user owning the file, the group
owning the file, and everyone else.
For example, 0755 means the owner can read/write/execute, but everyone else
can only read or execute. 0640 means the owner can read/write, other users in
the owning group can only read, and everyone else has no access to the file.
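To make the digit arithmetic concrete, here is a small sketch that expands an octal permission value into the familiar rwx letters. The helper name formatPermissions is invented for this example. It relies only on the layout described above: three groups of three bits, owner first.

```c
#include <stdio.h>
#include <sys/types.h>

// Hypothetical helper: fills out (at least 10 bytes) with the 9-character
// permission string for mode, e.g. "rwxr-xr-x" for 0755.
void formatPermissions(mode_t mode, char out[10]) {
    const char letters[] = "rwx";
    for (int i = 0; i < 9; i++) {
        // Bit 8 is the owner's read bit; bit 0 is everyone else's execute bit.
        mode_t bit = (mode_t)1 << (8 - i);
        out[i] = (mode & bit) ? letters[i % 3] : '-';
    }
    out[9] = '\0';
}
```

With this sketch, formatPermissions(0755, buf) fills buf with "rwxr-xr-x", and formatPermissions(0640, buf) fills it with "rw-r-----".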
Optional: implementing find
The find program searches a directory for files with a matching name. For
example, to find instances of stdio.h:
find /usr/include -name stdio.h
To implement this, we need a new syscall:
int stat(const char *pathname, struct stat *buf);
This populates a struct stat. At minimum, a stat struct will have the
following fields, populated almost directly from the target file’s inode:
dev_t      st_dev      ID of device containing file
ino_t      st_ino      file serial number
mode_t     st_mode     mode of file
nlink_t    st_nlink    number of links to the file
uid_t      st_uid      user ID of file
gid_t      st_gid      group ID of file
dev_t      st_rdev     device ID (if file is character or block special)
off_t      st_size     file size in bytes (if file is a regular file)
time_t     st_atime    time of last access
time_t     st_mtime    time of last data modification
time_t     st_ctime    time of last status change
blksize_t  st_blksize  a filesystem-specific preferred I/O block size for
                       this object. In some filesystem types, this may
                       vary from file to file
blkcnt_t   st_blocks   number of blocks allocated for this object
The st_mode field is of particular interest to us; it is a bit field containing
(among other things) information about whether this is a regular
file/directory/symbolic link. We can extract that information using the
S_ISREG, S_ISDIR, and S_ISLNK macros.
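As a small sketch of the file-type macros in isolation, the helper below classifies a single path. The name describeType and its return strings are invented for this example. It uses lstat rather than stat so that a symbolic link is reported as a link instead of being followed to its target.

```c
#include <sys/stat.h>

// Hypothetical helper: returns a short description of what path refers to.
const char *describeType(const char *path) {
    struct stat st;
    // lstat examines the link itself; stat would follow it to the target
    if (lstat(path, &st) == -1) return "error";
    if (S_ISREG(st.st_mode)) return "regular file";
    if (S_ISDIR(st.st_mode)) return "directory";
    if (S_ISLNK(st.st_mode)) return "symbolic link";
    return "other";  // device, socket, FIFO, ...
}
```

For instance, describeType("/") should return "directory" on a typical Unix system, and a path that does not exist should produce "error".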
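#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define kMaxPath 4096  // size of the shared path buffer

// Recursively prints every regular file under path whose name equals pattern.
// path is a shared buffer of kMaxPath bytes; length is the length of the
// directory name currently stored in it.
static void listMatches(char *path, size_t length, const char *pattern) {
    DIR *dir = opendir(path);
    if (dir == NULL) return;  // not a directory, or we lack permission
    strcpy(path + length, "/");
    length++;
    while (true) {
        struct dirent *de = readdir(dir);
        if (de == NULL) break;
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0) continue;
        // Skip entries whose full path (plus a trailing "/") would not fit
        if (length + strlen(de->d_name) + 1 >= kMaxPath) continue;
        strcpy(path + length, de->d_name);
        struct stat st;
        if (lstat(path, &st) == -1) continue;  // entry vanished or is unreadable
        if (S_ISREG(st.st_mode)) {
            if (strcmp(de->d_name, pattern) == 0) printf("%s\n", path);
        } else if (S_ISDIR(st.st_mode)) {
            listMatches(path, length + strlen(de->d_name), pattern);
        }
    }
    closedir(dir);
}

int main(int argc, char *argv[]) {
    assert(argc == 3);
    const char *directory = argv[1];
    struct stat st;
    if (stat(directory, &st) == -1) { perror(directory); return 1; }
    assert(S_ISDIR(st.st_mode));
    const char *pattern = argv[2];
    assert(strlen(directory) + 1 < kMaxPath);
    char path[kMaxPath];
    strcpy(path, directory);
    listMatches(path, strlen(path), pattern);
    return 0;
}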
Optional: Implementing ls
In addition to telling us what type a file is, st_mode also stores
permission information. We can use this to reconstruct the permission
strings that appear in the leftmost column of ls -l output.
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static const char kFlags[] = {'r', 'w', 'x'};
static const mode_t kMasks[] = {
    S_IRUSR, S_IWUSR, S_IXUSR,
    S_IRGRP, S_IWGRP, S_IXGRP,
    S_IROTH, S_IWOTH, S_IXOTH,
};

static void updatePermissionBit(char buffer[], int pos, char ch, bool flag) {
    if (!flag) return;
    buffer[pos] = ch;
}

static void printPermissions(mode_t m) {
    char buffer[11];  // 10 + 1 = 1 + 3 * 3 + 1
    memset(buffer, '-', 11);
    buffer[10] = '\0';
    updatePermissionBit(buffer, 0, 'd', S_ISDIR(m));
    updatePermissionBit(buffer, 0, 'l', S_ISLNK(m));
    for (size_t i = 0; i < 9; i++) {
        updatePermissionBit(buffer, i + 1, kFlags[i % 3], m & kMasks[i]);
    }
    printf("%s ", buffer);
}

static void printName(const char *name, const struct stat *st, bool link, const char *path) {
    printf("%s", name);
    if (!link) return;
    char target[st->st_size + 1];
    ssize_t len = readlink(path, target, sizeof(target) - 1);
    if (len == -1) return;
    target[len] = '\0';  // readlink doesn't put down '\0' char, drop it in ourselves
    printf(" -> %s", target);
}

static void listEntry(const char *name, const struct stat *st, bool link, const char *path) {
    printPermissions(st->st_mode);
    printName(name, st, link, path);
    printf("\n");
}

static void listDirectory(const char *name, size_t length) {
    char path[2048];
    if (length + 1 >= sizeof(path)) return;
    strcpy(path, name);
    DIR *dir = opendir(path);
    if (dir == NULL) return;
    strcpy(path + length, "/");
    while (true) {
        struct dirent *de = readdir(dir);
        if (de == NULL) break;
        if (de->d_name[0] == '.') continue;  // skip hidden entries, as ls does
        if (length + 1 + strlen(de->d_name) >= sizeof(path)) continue;
        strcpy(path + length + 1, de->d_name);
        struct stat entry;
        if (lstat(path, &entry) == -1) continue;
        listEntry(de->d_name, &entry, S_ISLNK(entry.st_mode), path);
    }
    closedir(dir);
}

int main(int argc, char *argv[]) {
    struct stat st;
    const char *dir = ".";
    if (lstat(dir, &st) == -1) { perror(dir); return 1; }
    if (S_ISREG(st.st_mode) || S_ISLNK(st.st_mode)) {
        listEntry(dir, &st, S_ISLNK(st.st_mode), dir);
    } else if (S_ISDIR(st.st_mode)) {
        listDirectory(dir, strlen(dir));
    }
    return 0;
}
Error handling
System calls can fail for a number of reasons; for example, open can fail if
the file you’re trying to open doesn’t exist, or if you don’t have the
necessary permissions, or for many other reasons. The failure modes are
generally documented in the “Errors” section of each syscall’s man page (e.g.
open).
If a system call fails, it typically returns -1 to indicate failure, and sets
a global errno variable with a constant number that identifies the cause of
failure. perror is a handy function that reads the value in errno and
prints out the cause of failure.
Every time you call a system call, you should handle potential failure:
int fd = open(...);
if (fd == -1) {
    perror("Failed to open file");
    exit(1);
}
Example output:
Failed to open file: No such file or directory
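When a program wants to react to a specific cause of failure rather than just print it, it can inspect errno directly. The sketch below shows one way to do that; the helper name tryOpen is invented for this example. It copies errno right away because later library calls may overwrite it, and uses strerror to get the same message text that perror would print.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: tries to open path for reading.
// Returns 0 on success, or the errno value describing why open failed.
int tryOpen(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;  // save it: later calls may overwrite errno
        fprintf(stderr, "Failed to open %s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

A caller could then compare the result against constants such as ENOENT (file does not exist) or EACCES (permission denied) and handle each case differently.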
Key takeaways for system calls
- System calls are how we interact with the operating system to do things that we can’t do ourselves.
- open (which returns a file descriptor, basically a pointer to a file session) is like malloc (which returns a pointer to memory), and close is like free. Every open call should be paired with a close call. Forgetting to close won’t necessarily be a big problem, but it’s considered a resource leak and can become problematic over time.
- Any time you call a syscall, check for errors!
Interlude: Games and anti-cheat software
A year ago, Riot Games (developer of League of Legends) published a controversial blog post announcing a new anti-cheat system.
Cheating is a big problem in competitive games, and over the past several years, people have developed cheat software that runs inside the kernel in order to feed game programs fake data. For example, the game needs to request the mouse position via system calls, and the cheat software can intercept those syscalls and return a fake mouse position. If the game asks, “what other programs are running on this computer right now?”, the cheat software can intercept that system call and return the list of programs minus itself, in order to conceal itself.
From the Riot blog post:
In the last few years, cheat developers have started to leverage vulnerabilities or corrupt Windows’ signing verification to run their applications (or portions of them) at the kernel level. The problem here arises from the fact that code executing in kernel-mode can hook the very system calls we would rely on to retrieve our data, modifying the results to appear legitimate in a way we might have difficulty detecting.
Game developers are not happy about this, and have decided to install their own component inside the kernel so that they can detect other kernel changes made for cheating. In order to install the game, you must also install an additional component inside the kernel that will watch out for any funny business tampering with syscalls.
This is quite controversial. On one hand, it seems necessary, because if the game is just living in a “box” (process) and the cheat software + OS are controlling all of its interactions with the outside world, then there is no possible way that the game can detect and ban all cheat software. On the other hand, the kernel is a sacred place, and we shouldn’t accept installation of code there “just because.” Kernel code must have an extremely high bar for quality, since it controls the entire system, and unfortunately companies do not usually produce such high quality code. Symantec, an antivirus company, uses a kernel module to implement antivirus scanning, but their kernel module was buggy and allowed a virus to hijack the kernel.