Mar 11 · 10 min read

Harvard's CS50x Week 2: Navigating the Depths of C

Week 2 picks up right where Week 1 left off – beneath the murky waters of C.

As we swim our way up to higher-level – that is to say, less syntactic – languages like Python, we will be taking a more nuanced look at what really goes on within a computer’s memory.

Remember that computer science is centered around the art of problem-solving — not just programming!

As such, we will continue to keep a focus on the bigger picture: the method to approaching computer science problems.

Yet again, this guide will be split into two parts: my personal, simplified notes and observations on the course content, and my experiences with the introductory problem set.

Lecture Notes

if (!hasVisitedPreviousBlogs())

{

pleaseVisitBlogs();

}

else

{

keepReading();

}

Jokes aside, let’s break down Week 2.

Compilers: From Hero to 0’s (and 1’s)

Recall that in Week 1, before running any of our programs, we had to pass them through a special piece of software called a compiler.

In VS Code, the compiler used is clang, short for C language, which converts human-written source code into machine code, or binary.

Funnily enough, the command ‘make’ is not a compiler in and of itself – it simply calls clang, creating an output file that can be run by the user via our ‘./’ syntax.

So, in theory, we could use clang and make interchangeably, right?

Almost.

With files that include only standard libraries (e.g. ‘#include <stdio.h>’), clang can be used directly, albeit with some extra steps.

Let’s take the case of our hello.c file:

Execute clang on the explicit C file ‘hello.c’ in the terminal by running ‘clang hello.c’
Run ls to find the file output by clang – this filename defaults to ‘a.out’
Run ‘./a.out’ as you would a normal file in order to print ‘hello, world’

To further simulate our magical ‘make’ terminal command, we can pass clang the ‘-o’ argument followed by the filename we want for the output (e.g. ‘clang -o hello hello.c’).

However, problems arise when clang is executed on a program with foreign libraries – here, clang is struggling to make sense of ‘get_string’, a function specific to the cs50.h library.

In order to solve this bug, we have to manually tell our compiler to link the cs50 library with the ‘undefined’ functions used within our code via the syntax below.

$ clang -o hello hello.c -lcs50

So, in short, it’s safer and easier to abstract away the chaos of clang and use make, at least within the scope of CS50!

To further understand the process of compiling better, let’s split it up into four simple steps:

Preprocessing – the step in which the code from the libraries used within your code (denoted by ‘#include <...>’) is copied and pasted into your program.
Visually, this would look like the functions you’ve used within your code being declared with their prototype, as shown below.

...

string get_string(string prompt);

int printf(string format, ...);

...

int main(void)

{

string name = get_string("What's your name? ");

printf("hello, %s\n", name);

}

Compiling – not to be confused with the overarching concept of compiling, this substep involves the conversion of your source code to assembly code (which is a language just a bit higher-level than machine code).

Assembling – within this step, the compiler converts assembly code to binary.

Linking – finally, the already-compiled machine code of the included libraries is combined with your own binary-converted code into a single file. You can imagine the final output as a long series of 0’s and 1’s.

Memory, Arrays, and Strings

In the previous week, we were briefly introduced to data types like bool, int, char, string, etc. Each of these stores different types of information, ranging from integers and decimals, to characters and full-fledged words.

Like anything we can use to program, these data types require certain amounts of memory and storage on our system.

bool – 1 byte
char – 1 byte
int – 4 bytes
float – 4 bytes
double – 8 bytes
long – 8 bytes
string – ? bytes
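These sizes are typical for the kind of system CS50 runs on, but C itself only guarantees that a char is 1 byte. Here is a minimal sketch for checking them on your own machine with the built-in sizeof operator (the helper name print_type_sizes is my own, not something from the lecture):

```c
#include <stdbool.h>
#include <stdio.h>

// Print how many bytes each type occupies on the machine running this code.
// sizeof yields a size_t, which printf prints with the %zu specifier.
void print_type_sizes(void)
{
    printf("bool:   %zu\n", sizeof(bool));
    printf("char:   %zu\n", sizeof(char));
    printf("int:    %zu\n", sizeof(int));
    printf("float:  %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
    printf("long:   %zu\n", sizeof(long));
}
```

Calling print_type_sizes() from main prints one line per type; the numbers may differ from the list above on other systems.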

In terms of our device’s hardware, these data are stored in binary within chips called RAM – random-access memory.

Of course, these chips have only a finite amount of memory, and can thus be represented as an array of spaces:

We can assume that the one square shaded in yellow stores a bool or char, which each occupy 1 byte in memory.

So each time a variable is declared, for instance, it occupies a certain number of bytes, or ‘squares’, within memory depending on its data type.

Professor Malan further explains that arrays are more of the same – a collection of data stored back-to-back in memory so as to make that data easily accessible.

Let’s see how arrays work using a simple program to print the average of three scores.

With our understanding of C thus far, we would probably tackle the problem with something like this:

#include <stdio.h>

int main(void)

{

// Scores

int score1 = 72;

int score2 = 73;

int score3 = 33;

// Print average

printf("Average: %f\n", (score1 + score2 + score3) / 3.0);

}

Bear in mind that each of our three variables shown would take up four bytes of memory.

Now let’s tackle this very problem, but with an array of ints instead.

#include <cs50.h>

#include <stdio.h>

int main(void)

{

// Scores

int scores[3];

scores[0] = 72;

scores[1] = 73;

scores[2] = 33;

// Print average

printf("Average: %f\n", (scores[0] + scores[1] + scores[2]) / 3.0);

}

Note that we’ve declared an array of integers of size three, essentially telling our compiler to reserve four bytes of memory for each integer. We’ve then populated the array by indexing into the scores array, starting from 0. Additionally, the ‘%f’ format specifier expects a float, or a decimal value…

However, our scores array is of type int. It turns out that as long as one operand within an arithmetic expression is a floating-point value (3.0 in this case, which C treats as a double), the ints are converted and the result of the expression is floating-point as well.
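To see the difference that one floating-point operand makes, here is a small sketch comparing the two kinds of division. Both function names are hypothetical helpers of mine, used purely for illustration:

```c
// Integer division: both operands are ints, so the fractional part is discarded.
int truncated_average(int a, int b, int c)
{
    return (a + b + c) / 3;
}

// Dividing by 3.0 (a double) converts the int sum to a double first,
// so the fractional part survives.
double precise_average(int a, int b, int c)
{
    return (a + b + c) / 3.0;
}
```

With the scores 72, 73, and 33, the first version gives 59, while the second gives roughly 59.33.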

Not only do arrays bunch together data side-by-side in memory, but they can also be used to improve program design – with a single array, we can store large amounts of data without having to create individual variables for each new element.

In order to populate an array more dynamically, we can make use of a for loop, where i is the index.

int scores[3];

for (int i = 0; i < 3; i++)

{

scores[i] = get_int("Score: ");

}
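Once the scores live in an array, a loop can read them back just as easily as it filled them in. Below is a rough sketch of a helper that averages an array of any length; the name average and its parameters are my own choice here, and it assumes the length is greater than zero:

```c
// Return the average of the first `length` values in `array`.
// Assumes length > 0.
double average(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }

    // Cast one operand so the division is carried out in floating point
    return sum / (double) length;
}
```

From main, this could be called as average(3, scores) once the array has been populated.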

Circling back to datatypes, recall that we assigned an unknown number of bytes to data of type string. It turns out that a string is, in fact, not a primitive datatype in C – just an array of characters!

So what’s the difference between using multiple characters and a string to store data?

Let’s visualize this. Say you’ve initialized a string that contains the message “HI!”, and similarly, three variables of type char, containing ‘H’, ‘I’, and ‘!’, respectively.

Within memory, our three chars would be stored in some manner similar to this, as expected.

Our string, however, would contain an extra special character called a NUL character, denoted as ‘\0’ in C.

Note that a string will behave in the same way that an array does (being an array itself) – that is to say it, too, can be indexed into.

Thus, the number of bytes within a string is equal to however many characters there are plus that extra one character (assuming that each char occupies 1 byte).
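As a quick sanity check of that arithmetic, here is a minimal sketch that uses strlen() from string.h and adds one for the terminator. The helper name bytes_in_string is hypothetical, and a plain char array stands in for CS50's string so the snippet doesn't depend on cs50.h:

```c
#include <string.h>

// Number of bytes a string occupies in memory:
// one per character, plus one for the terminating '\0'.
size_t bytes_in_string(const char s[])
{
    return strlen(s) + 1;
}
```

For “HI!”, strlen() reports 3 characters, so the string takes up 4 bytes in total.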

The reason strings feature the NUL character, or terminator, is so the program can determine exactly where the string stops.

Because strings don’t have a set number of bytes, the computer couldn’t possibly tell where one string ends and the next piece of data begins without that terminator.

Conversely, ints – which have exactly four bytes – don’t require that NUL character because the computer can expect each one to occupy a fixed number of bytes in memory.

The string terminator can be used outside the abstract scope of memory, and within our code itself. Let’s use ‘\0’ to determine the length of a string.

We can replicate the ‘strlen()’ function found in the string.h library by looping over a string until we reach the NUL character, as seen here:

#include <cs50.h>

#include <stdio.h>

int main(void)

{

// Prompt for user's name

string name = get_string("Name: ");

// Count number of characters up until '\0'

int n = 0;

while (name[n] != '\0')

{

n++;

}

printf("%i\n", n);

}

Similarly, we can use other libraries, such as ctype.h, to manipulate characters within a string, or character array.

Professor Malan walked us through the implementation for a program that capitalizes all inputted text, without the ‘toupper()’ function, as shown below.

#include <cs50.h>

#include <stdio.h>

#include <string.h>

int main(void)

{

string s = get_string("Before: ");

printf("After: ");

for (int i = 0, n = strlen(s); i < n; i++)

{

if (s[i] >= 'a' && s[i] <= 'z')

{

printf("%c", s[i] - 32);

}

else

{

printf("%c", s[i]);

}

}

printf("\n");

}

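Note that, here, we can compare specific indexed values of a string, or chars, to other chars numerically thanks to ASCII. Subtracting 32 works because each lowercase letter sits exactly 32 places after its uppercase counterpart in the ASCII chart (‘a’ is 97, ‘A’ is 65).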
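For comparison, here is a rough sketch of the same idea using ‘toupper()’ from ctype.h, which takes care of the range check for us. The function name uppercase is my own, and it modifies a plain char array in place rather than printing, so this is a variation on the lecture's program rather than a copy of it:

```c
#include <ctype.h>

// Uppercase every character of s in place, stopping at the '\0' terminator.
void uppercase(char s[])
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        // toupper leaves anything that isn't a lowercase letter unchanged;
        // the cast keeps its argument within the range it expects
        s[i] = (char) toupper((unsigned char) s[i]);
    }
}
```

Given an array holding “hi, Bob!”, the array holds “HI, BOB!” afterwards.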

The final kicker is, the ‘string’ type only exists within the CS50 library! The string counterpart in traditional C is, quite literally, a char array.

Command-Line Arguments and Exit Status

Although you’re likely not yet familiar with the term ‘command-line arguments’, we’ve used them numerous times throughout CS50 already. Command-line arguments are simply arguments that are passed to your program at the terminal window.

$ clang -o hello hello.c -lcs50

Here, every word following clang is considered a command-line argument.

So how can we apply this to our own programs?

Recall that our main function has, as of yet, taken no parameters, or void.

int main(void)

{

...

}

However, we can modify this code so that main takes two parameters – argc, an integer that stores the number of arguments (argument count), and argv[], an array of strings that holds all arguments (argument vector).

int main(int argc, string argv[])

{

...

}

Let’s use these arguments to improve our code. Assume we wanted to create a program that says hello to the user. With our knowledge thus far, our solution would probably look something like this:

But wouldn’t it be more convenient, in some cases, to allow the user to enter their name even before the program runs?

Let’s test this code again, but this time, using arguments.

The code here seems to be running without any flashing red errors, which seems to be a good sign… however, there is yet a bug within our program – instead of ‘hello, Bob’, the code returns ‘hello, ./hello’. It turns out, ‘./hello’ itself is a command-line argument and the zeroth element stored in the argv[] array.
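To see how the arguments line up with their indices, here is a small sketch that prints every element of argv. The helper name print_arguments is hypothetical, and plain C's ‘char *argv[]’ stands in for CS50's ‘string argv[]’ so the snippet doesn't need cs50.h:

```c
#include <stdio.h>

// Print each command-line argument next to its index in argv.
// Returns how many arguments were printed.
int print_arguments(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++)
    {
        printf("argv[%i] = %s\n", i, argv[i]);
    }
    return argc;
}
```

If main forwarded its own argc and argv to this helper, running ‘./hello Bob’ would list ‘./hello’ at index 0 and ‘Bob’ at index 1.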

Upon accessing the first element within the argv[], our program runs as expected! But what happens if our program tries to access a command-line argument that isn’t provided by the user?

$ make hello

$ ./hello

hello, (null)

$ ./hello Bob

hello, Bob

In the example above, the program tried to access argv[1], the second element of argv[], when no such argument was provided. In cases like this, ‘(null)’ is printed to the terminal, as the value of argv[1] is technically NULL or null (not to be confused with the NUL character, which is the string terminator that holds the value of 0 in ASCII).

Keeping to the ‘main()’ function, you’ve likely noticed that it claims to return an integer, signified by the int at the beginning of the function signature – yet, there are no return statements in sight!

It turns out that in C, when a program terminates, an exit code is generated by default, whether you manually return that code in main or not; if main reaches its end without a return statement, that code is 0.

A status code of 0 indicates that the program has run without error, while a non-zero code such as 1 signifies the presence of an error, resulting in the termination of that program. We can modify our code in hello.c to illustrate this.

Here, if the number of arguments was anything other than two, the ‘main()’ function would return 1 as the exit status.

Otherwise, the program would greet the user as intended, and then return 0.
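The logic described above could be sketched roughly as follows. I've written it as a helper called greet (a hypothetical name) that takes the same parameters as main, with ‘char *argv[]’ standing in for CS50's ‘string argv[]’; in a real program this body would sit directly inside main. The error message wording is also just an assumption on my part:

```c
#include <stdio.h>

// Check the argument count before touching argv[1].
// Returns the exit status: 1 on misuse, 0 on success.
int greet(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Missing command-line argument\n");
        return 1;
    }

    printf("hello, %s\n", argv[1]);
    return 0;
}
```

After running a program in the terminal, ‘echo $?’ shows the exit status of the last command, which is a handy way to confirm which branch was taken.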

Problem Set 2: Readability

For this problem set, while there are five potential challenges, I’ll be giving a rundown on all of the functions used within Readability, the first of these challenges.

In Readability, the objective is simple: implement a program that displays the estimated grade level needed for a human to comprehend some text.

For instance, running readability and inputting some text should result in the following:

$ ./readability

Text: Congratulations! Today is your day. You're off to Great Places! You're off and away!

Grade 3

Provided is an equation used to calculate the grade level required to understand a text.

index = 0.0588 * L - 0.296 * S - 15.8

Here, L is the average number of letters per 100 words, and S is the average number of sentences per 100 words.

Let’s begin with the ‘count_letters()’ function, which takes in a string argument and outputs an int: the number of letters within the provided text.

Firstly, I set a counter variable to 0, which is responsible for storing the number of letters within the text. Next, I implemented a for loop starting from index zero, up until, but not including, the length of the string ‘text’.

Within it is an if statement that checks whether the character at each index of the string falls within the range for either lowercase or uppercase letters, as determined by the ASCII chart.

Remember that chars can be treated as integers!

Of course, if this condition is met, the counter increases, and the total number of letters is finally returned.
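A minimal sketch of that approach might look like the following. It is my reconstruction of the steps described above, not necessarily the exact code I submitted, and it takes a plain char array in place of CS50's string so that it doesn't rely on cs50.h:

```c
#include <string.h>

// Count the letters in text by checking each char against the ASCII
// ranges for uppercase ('A' to 'Z') and lowercase ('a' to 'z') letters.
int count_letters(const char text[])
{
    int letters = 0;
    for (int i = 0, n = strlen(text); i < n; i++)
    {
        if ((text[i] >= 'A' && text[i] <= 'Z') || (text[i] >= 'a' && text[i] <= 'z'))
        {
            letters++;
        }
    }
    return letters;
}
```

For the sample text from the problem set shown earlier, this counts 65 letters.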

An alternate approach you could take to completing this function is through use of the ‘toupper()’ and ‘tolower()’ functions defined within the ctype.h library.

We can convert each individual char to its lower-or-uppercase counterpart to then check whether it lies within one range of numbers instead of two.
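Here is a rough sketch of that alternative, assuming we convert to uppercase first. The name count_letters_upper is hypothetical, chosen only to tell it apart from the original version:

```c
#include <ctype.h>
#include <string.h>

// Count letters by uppercasing each char first,
// so only the 'A' to 'Z' range needs checking.
int count_letters_upper(const char text[])
{
    int letters = 0;
    for (int i = 0, n = strlen(text); i < n; i++)
    {
        // The cast keeps toupper's argument within the range it expects
        int c = toupper((unsigned char) text[i]);
        if (c >= 'A' && c <= 'Z')
        {
            letters++;
        }
    }
    return letters;
}
```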

Using similar logic, I implemented the ‘count_words()’ and ‘count_sentences()’ functions. With the former, word_counter begins at one instead of zero because we’re counting the spaces within the text – in doing so, I accounted for the final word in our string, which is not followed by a space.

In ‘count_sentences()’, we’ve defined the criteria for the end of a sentence as the presence of a period, exclamation point, or a question mark.

Note that because the final sentence will also have terminating punctuation, we can start our counter at zero in this situation.
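Sketched out, the two functions might look something like this. As before, this is a reconstruction under a few assumptions: plain char arrays instead of CS50's string, words separated by single spaces, and no leading or trailing spaces in the text:

```c
#include <string.h>

// Count words by counting spaces; start at 1 to include the final word,
// which has no space after it. Assumes single spaces between words.
int count_words(const char text[])
{
    int word_counter = 1;
    for (int i = 0, n = strlen(text); i < n; i++)
    {
        if (text[i] == ' ')
        {
            word_counter++;
        }
    }
    return word_counter;
}

// Count sentences by counting '.', '!' and '?' characters.
int count_sentences(const char text[])
{
    int sentence_counter = 0;
    for (int i = 0, n = strlen(text); i < n; i++)
    {
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            sentence_counter++;
        }
    }
    return sentence_counter;
}
```

For the sample text shown earlier, these give 14 words and 4 sentences.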

Finally, our ‘main()’ function ties all of our prior functions together neatly, first retrieving text from the user, and then calculating the average numbers of letters and sentences, which are stored in variables L and S, respectively.

To calculate the averages, I passed text into each of the three functions, divided the number of letters and sentences by the number of words, and multiplied each result by 100, as was instructed in the problem set background.

It’s worth noting that variables L and S are of type double, while all three of our counting functions return ints. In C, dividing one int by another will, again, result in an int.

In order to avoid this integer-division truncation-trap, I cast one of the two values being divided, ensuring that the end result would, indeed, be a double.

Plugging these values into the variable index, and casting back to an int, I checked which grade level each text belonged to via a sequence of conditional statements.
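The calculation itself can be sketched as a small function. The name coleman_liau_index and its three int parameters are my own framing for illustration; it returns the raw double and leaves the conversion to a whole grade level to the caller:

```c
// Compute the readability index from the three counts.
// Assumes words > 0.
double coleman_liau_index(int letters, int words, int sentences)
{
    // Cast one operand to double so the division isn't truncated
    double L = (double) letters / words * 100;
    double S = (double) sentences / words * 100;

    return 0.0588 * L - 0.296 * S - 15.8;
}
```

For the sample text (65 letters, 14 words, 4 sentences), this comes out to roughly 3.04, which lines up with the ‘Grade 3’ output shown earlier.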

Always be sure to include all necessary libraries and declare your functions before they’re called within your code!

Final Thoughts

To conclude with some advice from the professor: If you’re feeling stuck on a problem set, or you’re having trouble with your code, consider speaking out loud to a rubber duck…

It’s, allegedly, an age-old debugging method, but I think I’ll just take his word for it.

Anyhow, that’s a wrap for Week 2!

See you soon! Meanwhile, stay tuned for updates by following my blog and LinkedIn page.

Note: this article has images and code belonging to CS50. You are free to remix, transform, or build upon materials obtained from this article. For further information, please check CS50x’s license.
