Assignment 1: Six Degrees of Kevin Bacon
This handout was adapted from Jerry Cain’s Spring 2021 offering.
Craving a little Oscar trivia? Try your hand at an Internet parlor game about Kevin Bacon’s acting career. He’s never been nominated for an Oscar, but he’s still immortal – based on the premise that he is the hub of the entertainment universe. Mike Ginelli, Craig Fass and Brian Turtle invented the game while students at Albright College in 1993, and their Bacon bit spread rapidly after convincing then TV talk-show host Jon Stewart to demonstrate the game to all those who tuned in. From these humble beginnings, a website was built, a book was published and a nationwide cult-fad was born.
When you think about Hollywood heavyweights, you don’t immediately think of Kevin Bacon. But his career spans over 40 years through films such as Flatliners, The Air Up There, Footloose, The River Wild, JFK and Animal House. To brush up on your Bacon lore and play online, visit http://oracleofbacon.org.
This assignment is first and foremost a systems programming assignment, but it’s also an opportunity to review your C++ while simultaneously exercising some software engineering and low-level memory manipulation skills. You’ll also get to see that low-level C coding and high-level C++ data structuring can coexist in the same application.
Due Date: Tuesday, June 29th, 2021 at 11:59pm, PDT.
Note: We’re requiring this assignment be turned in on time so that we can give you prompt feedback in preparation for the second assignment. If you need an extension, come talk to us in advance.
Overview of the functionality
The game takes the form of a trivia challenge: supply any two names, and your friend/opponent has to come up with a sequence of movies and mutual co-stars connecting the two. In this case, your opponent takes on the form of an executable, and that executable is infuriatingly good.
Jack Nicholson and Meryl Streep? That’s easy:
$ ./search "Meryl Streep" "Jack Nicholson (I)"
Meryl Streep was in "Close Up" (2012) with Jack Nicholson (I).
Mary Tyler Moore (loved her!) and Red Buttons? Not so obvious:
$ ./search "Mary Tyler Moore" "Red Buttons"
Mary Tyler Moore was in "Change of Habit" (1969) with Regis Toomey.
Regis Toomey was in "C.H.O.M.P.S" (1979) with Red Buttons.
Barry Manilow and Lou Rawls? Yes!
$ ./search "Barry Manilow" "Lou Rawls"
Barry Manilow was in "Bitter Jester" (2003) with Dom Irrera.
Dom Irrera was in "A Man Is Mostly Water" (2000) with Lou Rawls.
It’s the people you’ve never heard of that are far away from each other:
$ ./search "Danzel Muzingo" "Liseli Mutti"
Danzel Muzingo was in "My Day in the Barrel" (1998) with Chala Savino.
Chala Savino was in "Barbershop: The Next Cut" (2016) with Troy Garity.
Troy Garity was in "Sunshine" (2007) with Cliff Curtis (I).
Cliff Curtis (I) was in "Rapa Nui" (1994) with Liseli Mutti.
Optionally, the search
executable accepts an extra argument to limit the
search to something other than 6.
$ ./search "Danzel Muzingo" "Liseli Mutti" 4
Danzel Muzingo was in "My Day in the Barrel" (1998) with Chala Savino.
Chala Savino was in "Barbershop: The Next Cut" (2016) with Troy Garity.
Troy Garity was in "Sunshine" (2007) with Cliff Curtis (I).
Cliff Curtis (I) was in "Rapa Nui" (1994) with Liseli Mutti.
$ ./search "Danzel Muzingo" "Liseli Mutti" 3
No path between those two people could be found.
Getting set up
Getting the code
Go ahead and clone the starter repository that we’ve set up for you. Do so by typing this:
$ git clone /usr/class/cs110/repos/assign1/$USER assign1
This line places a copy of the starter files dedicated to you in your current
working directory. git
is the name of the source control framework we use to
keep track of your changes and minimize the chances you lose any of your work.
If you’re not enrolled in the course and you don’t have a repo, email me and
I’ll create one for you. In the meantime, start by cloning the read-only guest
repo (i.e. /usr/class/cs110/repos/assign1/guest
) and working with that until
your repo is available. Then you can clone your real repo and copy your work
over, file by file, from the guest copy to the one linked to your SUNet ID.
As you make progress toward a solution, you can invoke ./tools/sanitycheck
to
commit local changes and run a collection of tests that compare your solution
to my own. And be sure your solutions are free of memory leaks and errors,
since we’ll be running your code through valgrind
. Note: there’s a bug in
valgrind
that surfaces with virtually any C++ program. You can suppress this
error by copying the /usr/class/cs110/tools/config/.valgrindrc
file into your
home directory (or copying its contents into an existing ~/.valgrindrc
file):
$ more /usr/class/cs110/tools/config/.valgrindrc
--memcheck:leak-check=full
--show-reachable=yes
--suppressions=/usr/class/cs110/tools/config/cs110.supp
$ cp /usr/class/cs110/tools/config/.valgrindrc ~
Once you’re done and ready to turn everything in, you can type
./tools/submit
, which will offer the option to run sanitycheck
one last
time before pushing all of your changes back to your
/usr/class/cs110/repos/assign1/$USER
repo. You can run the submission script
as many times as you need to, and we’ll be sure to grade and code review your
most recent submission.
Setting up your environment
We have compiled a list of tips for various tools here. In particular, unless you are attached to your current editor, we highly recommend trying VSCode (we have a how-to video linked on that page). VSCode should provide a much better experience than vim or emacs if you’re living far from campus, since it is better able to manage latency, and it also has code analysis features that are extremely useful when working with many files as you will in this class.
If you’re interested in vim (which you can also use inside of VSCode!), we have a video and cheat sheet, and if you’re interested in emacs, there is a great guide on the CS 107 website.
Do you live very far from campus or have a spotty internet connection? We are prototyping a system that would let you work locally on your own machine, instead of needing to connect to myth. Let us know if you would be interested in helping to try this out.
Overview of the assignment
There are two major components to this assignment:
-
You need to implement the
imdb
class (imdb
is short for Internet Movie Database), which serves as a database that lets us query who starred in what. This class has two main methods, defined in theimdb.h
file:getCredits()
, which is used to get a list of films that an actor/actress has starred in, andgetCast()
, which is used to get a list of actors/actresses that starred in a given movie.You could implement this
imdb
class using twomaps
(one mapping people to movies, and another mapping movies to people), but there is so much data in this database that loading all the data from text files intomap
s would take several minutes, even on fast machines.Instead, you’ll tap your sophisticated understanding of memory and data representation and learn about something called memory mapping in order to look up movie and actor information from a prepared data structure that’s been saved to disk in binary form. This is the meatier part of the assignment, and it’s a throwback to the type of programming you got really good at in CS107 or its equivalent.
-
You need to implement a breadth-first search algorithm that enlists your
imdb
class to find the shortest path connecting any two actors or actresses. If the search goes on for so long that you can tell it’ll be of length 7 or more, then you can be reasonably confident (and pretend that you know for sure that) there’s no path connecting them. This part of the assignment is more CS106B-like, and it’s a chance to get some experience with the STL (usingvector
s,set
s, andlist
s) and to see a legitimate scenario where a complex program benefits from coding in two different paradigms: high-level, object-oriented C++ (with its STL template containers and template algorithms) and low-level, imperative C (with its exposed memory, courtesy of CS107,*
,&
,[]
, and->
).
Here is an overview of the files you will need for Task I:
imdb-utils.h
: The definition of thefilm
struct, and an inline function that finds the data files for you. You shouldn’t need to change this file.imdb.h
: The interface for theimdb
class. You shouldn’t change the public interface of this file, though you will likely change the private section in order to add helper methods or any other modifications you see fit.imdb.cc
: The implementation of theimdb
constructor, destructor, and methods. This is where your code forgetCast
andgetCredits
belongs.imdbtest.cc
: The unit test code we’ve provided to help you exercise yourimdb
. You don’t need to read, change, or understand this file, although you’re welcome to look at it if you want to understand howimdb
is being tested.
Everything from Task I (except imdbtest.cc
) contributes to the overall
search
application. You can run make
to compile your code. There’s a sample
executable at ./samples/search_soln
for you to play with. Understand that my
sample application and yours aren’t obligated to publish the same exact
shortest path, but you should be sure that the path lengths themselves actually
match.
In addition to the files used for Task I, there are these:
search.cc
: The file where most if not all of your Task II changes should be made.path.h
: The definition of thepath
class, which is a custom class useful for storing paths between two actors. You’re free to add methods if you think it’s sensible to do so.path.cc
: The implementation of thepath
class. Again, you can add stuff here if you think it makes sense to.
Task I: The imdb
class
First off, you should complete the implementation of the imdb
class, whose
interface looks like this:
struct film {
string title;
int year;
};
class imdb {
public:
imdb(const string& directory);
bool good() const;
bool getCredits(const string& player, vector<film>& films) const;
bool getCast(const film& movie, vector<string>& players) const;
~imdb();
private:
const void *actorFile;
const void *movieFile;
};
The constructor and destructor have already been implemented for you. All the
constructor does is initialize actorFile
and movieFile
fields to point to
on-disk data structures using the mmap
routine you’ll learn about later on in
the course. You don’t need to worry about how this works; all you need to know
is that when you’re implementing the rest of this class, you can actorFile
and movieFile
as if they were pointing to normal buffers like those you used
in CS 107, but under the hood, when you read something from those memory
regions, the operating system reads it from the actor/movie files on disk. This
enables our database to load much faster, as we only read the portions of the
actor/movie file that we need for our search, instead of reading the entire
(huge) files into memory.
You’ll need to implement the getCredits
and getCast
methods by manually
crawling over these binary images in order to produce vector
s of movies and
actor names. When properly implemented, they provide lightning-speed access to
a gargantuan amount of information, because the information is already
compactly formatted in a prepared data structure that lives on the myth
s.
Understand up front that you are implementing these two methods to crawl over two arrays of bytes in order to synthesize data structures for the client. What appears below is a description of how that memory is laid out. You aren’t responsible for creating the data files in any way, but you are responsible for understanding how everything is encoded so that you can rehydrate information from its byte-level representation.
The Raw Data Files
The private actorFile
and movieFile
fields each address large blocks of
memory. Each is configured to point to mutually referent database images, and
the format of each is described below. The imdb
constructor sets these
pointers up for you, so you can proceed as if everything is initialized for
getCast
and getCredits
to just work.
For the purposes of illustration, let’s assume that Hollywood has produced a mere three movies and that they’ve always rotated through the same three actors whenever the time came to cast their three films. Pretend those three films are these:
- Clerks, released in 1993, starring Cher and Liberace.
- Moonstruck, released in 1988, starring Cher, Liberace, and Madonna.
- Zoolander, released in 1999, starring Liberace and Madonna.
Remember, we’re pretending.
If an imdb
instance is configured to store the above information, you might
imagine its actorFile
and movieFile
fields being initialized—by the
constructor I already wrote for you—as follows:

Each of the records for the actors and movies will vary in size. Some movie
titles are longer than others; some films feature 75 actors, while others star
only one or two. Some actors have prolific careers, while others are one-hit
wonders. Defining a struct
or class
to overlay the blocks of data is a
fine idea, except that doing so would constrain all records to be the same
size. We don’t want that, because we’d be wasting a good chunk of memory when
storing information about actors who appeared in just one or two films and
about films that feature just a handful of actors.
However, by allowing the individual records to be of variable size, we lose our
ability to binary search (hint: via the STL lower_bound
algorithm) a sorted
array of records. The number of actors and actresses is 2.5 million, and the
number of movies is just shy of 700,000, so a linear search would be way too
slow. All actors and movies are sorted by name (and then by year if two movies
have the same name), so binary search is still within reach. The strong desire
to binary search quickly motivated my decision to format the data files like
this:

Spliced in between the number of records and the records themselves is an array
of integer offsets. They’re drawn as pointers, but they really aren’t stored
that way. We want the data images to be relocatable. Restated, we can’t
embed actual memory addresses in the data images, because the binary image may
be loaded into memory at different locations each time an imdb
is created.
By storing integer offsets, we can manually compute the location of Cher’s
record, Madonna’s record, or Clerk’s record, etc, by adding the corresponding
offsets to whatever actorFile
or movieFile
turns out to be. A more
accurate picture of what gets stored (and this is really what the file format
is) is this:

Because the numbers are what they are, we would expect Cher’s 16-byte record to
sit 16 bytes from the front of actorFile
, Liberace’s 24-byte record to sit 32
bytes within the actorFile
image, and so forth. Looking for Moonstruck? Its
28-byte record can be found 36 bytes ahead of whatever address is stored in
movieFile
. Note that the offsets tell me where records are relative to the
base address, and the differences between consecutive offsets tell me how
large the records are.
Because all of the offsets are stored as four-byte integers (and int
s are
four bytes, even on 64-bit systems like the myth
s), and because they are in a
sense sorted if the records they reference are sorted, we can use binary
search. Woo.
To summarize:
actorFile
points to a large mass of memory packing all of the information about all of the actors. The first four bytes store the number of actors (as anint
); the next four bytes store the offset to the zeroth actor, the next four bytes store the offset to the first actor, and so forth. The last offset is followed by the zeroth record, then the first record, and so forth. The records themselves are sorted by name.movieFile
also points to a large mass of memory, but this one packs the information about all films ever made. The first four bytes store the number of movies (again, as anint
); the next*(int *)movieFile * sizeof(int)
bytes store all of theint
offsets, and everything beyond is real movie data. The movies are sorted by title, and those sharing the same title are sorted by year.- The above description above generalizes to files with 2,500,000 actors and 700,000 movies.
The Actor Record
The actor record is a packed set of bytes collecting information about an actor
and the movies he or she’s appeared in. We don’t use a struct
or class
to
overlay the memory associated with an actor, because doing so would constrain
the record size to be the same for all actors. Instead, we lay out the
relevant information in a series of bytes, the number of which depends on the
length of the actor’s name and the number of films they’ve appeared in. Here’s
what gets manually placed within each entry:
- The name of the actor is laid out character by character, as a normal
null-terminated C-string. If the length of the actor’s name is even, then
the string is padded with an extra
'\0'
so that the total number of bytes dedicated to the name is always an even number. The information that follows the name is most easily interpreted as ashort
, and themyth
s often constrain addresses manipulated asshort *
s to be even. - The number of movies in which the actor has appeared, expressed as a
two-byte short. (Some people have been in more than 255 movies, so a single
byte isn’t always enough). If the number of bytes dedicated to the actor’s
name (always even) and the short (always 2) isn’t a multiple of four, then
two additional
'\0'
's appear after the two bytes storing the number of movies. This padding is conditionally done so that the four-byte integers that follow sit at addresses that are multiples of four (again, because the 64-bitmyth
's might be configured to require this). - An array of offsets into the
movieFile
image, where each offset identifies one of the actor’s films.
Here’s what Cher’s record would look like:

The Movie Record
The movie record is only slightly more complicated. The information is compressed as follows:
- The title of the movie, terminated by a
'\0'
, so the character array behaves as a normal C-string incidentally wedged into a larger binary data figure. - The year the film was released, expressed as a single byte. This byte
stores the year, minus 1900. Since Hollywood is less than 256 years old, it
was fine to just store the year as an offset from 1900. If the total number
of bytes used to encode the name and year of the movie is odd, then an extra
'\0'
sits in between the one-byte year and the data that follows. - A two-byte
short
storing the number of actors appearing in the film, padded with two additional bytes of zeroes if needed. - An array of four-byte integer offsets, where each integer offset identifies
one of the actors accessible via
actorFile
. The number of offsets here is, of course, equal to theshort
read during step 3.
One major gotcha: Some movies share the same title even though they are
different. (The Manchurian Candidate, for instance, was first released in 1962,
and then remade in 2004. They’re two different films with two different
casts.) If you look in the imdb-utils.h
file, you’ll see that the film
struct provides operator<
and operator==
methods. That means that two
film
s know how to compare themselves to each other using infix ==
and <
.
You can just rely on the <
and ==
to compare two film
records. In fact,
you have to, because the movies in the movieData
binary image are sorted
to respect film::operator<
.
It’s best to work on the implementation of the imdb
class in isolation, not
worrying about the details of the search algorithm you’ll eventually need to
write. I’ve provided a test harness to exercise the imdb
all by itself, and
that code sits in imdbtest.cc
. The make
system generates a test
application called imdbtest
which you can use to verify that your imdb
implementation is solid. I provide my own version in ./samples/imdbtest_soln
(samples
is a symbolic link in your repo to a shared directory with solution
executables) so you can run your version and my version side by side and make
sure they match character for character.
Note: Your implementation will be—and in fact is intended to be—an interesting mix of C and C++. You’ll be relying on your mad C skills to crawl over the binary images, and you’ll be leveraging your C++ mastery to lift that data up into C++ objects. As part of your implementation, you’ll need to binary search over the actor and movie offsets to find the actor or movie of interest.
You’re required to use the STL
lower_bound
algorithm to perform these binary searches, and you’re also required to use C++
lambdas (also known as anonymous functions with capture clauses) to provide
nameless comparison functions that
lower_bound
can use
to guide its search. See the Tips and Tidbits section below for more
information.
Task II: Implementing Search
You’re back in pure C++ mode. At this point, I’m assuming your imdb
class
works flawlessly, and the fact that there’s some clever pointer gymnastics
going on in the imdb.cc
file is fully disguised by a delightfully simple
imdb
interface. Building on top of your imdb
class, no pointer arithmetic
is required!
For this task, you should use the services of your imdb
and my path
class
(discussed below) to implement a breadth-first search for the shortest possible
path. Leverage the STL containers as much as possible to get this done. Here
are the STL classes I used in my solution:
vector
: there’s no escaping this one, because theimdb
requires we pullfilm
s andactor
s out of the binary images asvector
s.queue
: Thequeue
providespush
(enqueue),front
(gets the first element), andpop
(removes the first element) operations.set
: I used twoset
s to keep track of previously used actors and films. If you’re implementing a breadth-first search and you encounter a movie or actor that you’ve seen before, there’s no reason to use it/them a second time. You shouldn’t need to use anything other thanset::insert
andset::contains
. NOTE:set::contains
is from C++20, and the Makefile is still using C++11. To upgrade to C++20, you can edit the Makefile at line 12 and changec++0x
toc++20
.
You’re welcome to take any approach you want to, provided it generates some shortest path between the supplied actors. We don’t require you generate the same exact path as my solution does, but we do expect the length of your shortest path to match mine.
Tips and Tidbits
Assignment 1 has been the same for the last several years, and I’ve seen a lot of submissions. In prior quarters, I’ve let students struggle through the roadblocks they face as part of the C and C++ review process. However, those struggles were mitigated by the ability to hang out with others for help. Since there’s no hanging out for a while, I figured I’d itemize what the pain points have been in the past and provide some hints to press through them.
Understanding lower_bound
and Lambdas
Most students have a difficult time with the idea of an anonymous function and how it should be used with lower_bound
. Fundamentally, an anonymous function is a function we don’t bother naming. In the context of imdb::getCredits
, the relevant lower_bound
call should look like this:
`const int *countp = (const int *) actorFile;
const int *begin = (const int *) actorFile + 1;
const int *end = begin + *countp;
const int *found = lower_bound(begin, end, player, [this](int offset, const string& player) {
return compareActorAtOffset(offset, player);
});`
begin
and end
are the addresses bracketing the array of integer offsets, player
is the name we’re searching for, and that intimidating fourth argument is the anonymous comparison function. The comparison function is used by the implementation of lower_bound
to compare offsets (or really, the strings reachable from those offsets) to the player
key.
Even if it looks strange, you’ll probably agree that the fourth parameter resembles the parameter list and body of a traditional function. What’s different? The function doesn’t have a name (that’s what makes it anonymous), and it makes use of a capture clause, which is the [this]
portion at the beginning.
The assumption here is that compareActorAtOffset
is a const private
helper method of the imdb
class, and it would need to be implemented to crawl over the image of actor data according to the assignment specification. It needs to be a method so that it has access to the private actorFile
field needed to find the actor at the supplied offset. Its implementation needs to return true
if and only if the actor name at the provided offset
is lexicographically less than the supplied player
. It’s that true
or false
that helps lower_bound
internally execute its binary search to either discover where the actor string’s offset is or where its offset would need to be inserted if everything were to remain sorted.
Reasonable question: Why not just pass in compareActorAtOffset
itself as a fourth parameter, as with this:
const int *countp = (const int *) actorFile;
const int *begin = (const int *) actorFile + 1;
const int *end = begin + *countp;
const int *found = lower_bound(begin, end, player, compareActorAtOffset); // won't compile
Well, the fourth parameter needs to be a traditional function, and methods—blocks of code invoked on behalf of an object—aren’t traditional functions.
Another reasonable question: What’s the [this]
all about?
That’s called a capture clause, and it’s a list of variables in the surrounding scope that needs to be shared with the body of the anonymous function for it to do its job properly. In this case, we capture this
, which is a keyword variable that always stores the address of the object being accessed. If we omit this
from the capture clause and just pass in an empty one, as with this:
const int *countp = (const int *) actorFile;
const int *begin = (const int *) actorFile + 1;
const int *end = begin + *countp;
const int *found = lower_bound(begin, end, player, [](int offset, const string& player) {
return compareActorAtOffset(offset, player); // won't compile! no access to this!
});
then the compiler wouldn’t know what object the call to compareActorAtOffset
references.
Interpreting lower_bound
's return value
It’s common to assume lower_bound
returns something NULL
-like whenever the underlying binary search for the key—getCredits
's player
parameter, for example—fails. In fact, the return value means one of two slightly different things, depending on whether the key is present or not. Let’s limit our discussion to the use of lower_bound within imdb::getCredits
.
- When the key—or rather, when
player
—is present, the return value is an iterator linked to the matching element. Because we’re dealing with a sequence of integer offsets that lead to actor names, the return value is the address of the offset linked to the matching actor. - When the key is not present, the return value is the address of the insertion point where a new offset would need to be inserted on behalf of the missing key if it were inserted. Of course, we’re not actually inserting missing names, but the
lower_bound
call doesn’t know that. It’s implemented to handle a larger universe of use cases than IMDB.
The union of bullet points one and two means you can’t simply look at the lower_bound
return value and know whether the actor is present. You need to do a little more work to see whether the return value leads to the matching actor name. If it does, you carry on and look for his or her credits. If not, you bail early and return false
, because nonexistent actors aren’t allowed to be in movies.
Write utility functions
You don’t need to be pedantic about it, but you should unify the crunchy
pointer arithmetic needed to resuscitate an actor and a film from the data
images to helper functions. The (char *
), (short *
), and (int *
)-casting
gymnastics is by far the most error-prone form of coding we’ll require of you
all quarter, so it’s best to consolidate as much of it to just one or two
places, and to rely on those utilities to enhance the narrative of your
getCredits
and getCast
methods.
Making Search Fast
A lot of students are interested in making their breadth-first search for paths between two actors as fast as possible. When the search is slower than the sample executable by more than a factor of three or four, then it is most often caused by making a lot of unnecessary copies of offset arrays. In the past I’ve seen students pull all integer offsets into a dedicated vector and then call lower_bound on the vector endpoints. Don’t do that. The linear copies are expensive and totally unnecessary. The offset arrays directly reachable from actorFile and movieFile are perfectly valid arrays.
If you really want to investigate optimizations to make the search faster you can do one or both of the following (my solution actually does the first, but doesn’t do the second):
- Check to see which of the two actor endpoints has the smaller number of costars, and initiate the search from that actor. It’s a lot easier to find the haystack around a needle than it is to find the needle in a haystack. Restated, it’s easier to find a path from “Liseli Mutti” to “Julia Roberts” than it is to find a path from “Julia Roberts” to “Liseli Mutti”.
- The naive (but perfectly acceptable) solution maintains a queue of partial paths emanating away from one of the two actors we’re connecting. That implementation, however, should never allow a movie or actor to extend a partial path if it contributes to a previously generated partial. Stated differently, each movie/actor pair—whether or not it ultimately contributes to the answer—has a unique predecessor. Rather than maintaining a queue of partial paths, you can reduce the memory footprint of the search by maintaining a queue of movie/actor pairs, where each pair is the last pair of some partial path, provided you also build up a predecessor map along the way. Once you find the target actor, you can quickly build up the path object by navigating the predecessor map back to the source. Then you can print!
Additional Hints
- Our CS106B and CS106X classes rely on a collection of C++ libraries that
aren’t generally available outside of Stanford. I want you to code in
standard C++ instead of the Stanford dialect of it, so you’ll need to
teach yourself some of the C++ that CS106B/X circumvented. In particular, the
CS106
Vector
andMap
templates are officially verboten; you’ll instead need to teach yourself the standardvector
andmap
classes that ship with all C++ compilers. I’ve posted a slide deck from prior CS110 offerings that teach some of the C++ I speak of. And you should visit—even bookmark—the landing page of an online C++ documentation site that I think is pretty superb. In particular, pay attention to the documentation for vector, map, queue, set, and lower_bound. - Your implementation of the
imdb
class—in particular, itsgetCast
andgetCredits
methods—needs to crawl over memory to resuscitatefilm
s from the binary data images. Strive to unify common code to helper functions and methods so that you never repeat the same crunchy line of pointer arithmetic more than once. This is particularly important here, because you’re very likely to get the pointer arithmetic wrong the first few rounds, and it’s important coding errors be confined to one spot instead of being replicated verbatim across several locations. - Try to minimize the number of deep copies you make of large data structures
(e.g.
vector
s,map
s,set
s, etc.) by passing them by reference or byconst
reference. In general, I’m happy to pass primitives likeint
s andbool
s by value, but anything as large as astring
I generally pass by reference unless I have an exquisite reason for making deep copies. - In past quarters, some students have constructed working solutions without
using (or even recognizing the existence of) the supplied
path
class. However, your life will be made much easier if you use it, sincepath
s know how to print themselves out via a customoperator<<
, and it just so happens thatpath
s print themselves in a way that plays well with our autograder. - Speaking of the autograder, you should rely on our
sanitycheck
tool – again, invoked via./tools/sanitycheck
– to confirm that your solution matches my own for a small suite of test cases. The tests exposed viasanitycheck
are a subset of the tests we’ll use when grading. If you passsanitycheck
on Assignment 1, then we promise your functionality score will be at least 75%. (Note: functionality score was inserted to be clear that there’s a code review component to the overall grade as well. Your functionality score, however, counts five times as much as the code review score, and the code review score amounts to several bucket grades [exceptional, solid, minor-errors, major-errors, etc.] across several categories, and most bucket grades end up being solids and minor-errors. - If you need a tutorial on how
valgrind
works, you can leverage the documentation written for CS107. A full treatise on whatvalgrind
is and what it can do for you can be found right here. More generic documentation on how to test withvalgrind
can be found right here. - Cool fact: Mike Krieger, co-founder of Instagram, took CS107 from Jerry Cain about almost 20 years ago and did this assignment in a slightly different form. Fast forward ten years and Jerry finds out that Mike and Kevin Systrom used the same binary data image compression techniques used here to compactly store large amounts of data for their v1.