CS50 Week 1 Continued: Lecture – Learn | Think

Rant

CS50 is more often structured in week long units. I can get through that much learning material in about 10 days, including writing a series of articles. The class starts at week 0, and I can only guess that is because programmers count from 0. Week 1 is two units long (week 1 and week 1 continued). Since I’ve been working on completing this class for over a month, it’s frustrating to still be at week 1 and not week 3. Had to get that off my chest.

Lecture

David gives a quick recap of the CS50 IDE and gets right into compilation. Compiling is shorthand for the steps a compiler goes through to turn source code into binary (machine code). The compiler program used in class is called “clang.” By default clang writes its output to a file called “a.out”; the -o flag lets you specify a different name. For example, clang myfile.c -o myfile will output myfile.

Clarity

It’s important to name variables in a way that helps us understand what is going on in our program.

Example:

string s = get_string("Name: ");

versus:

string name = get_string("Name: ");

When the variable s is scattered through a big program, it’s hard to remember it’s the name of the user. But the variable name is much easier to remember.

help50

It is helpful to understand error messages that appear in clang. If the message is unclear, you can use the command help50 before the compiling command to sometimes get a more human readable message.

help50 clang hello.c

Compilation

Typing make <program name> in CS50 IDE is a command that runs 4 main processes to turn our source code into a machine readable program. The 4 processes that make up compiling are:

Preprocessing

~~This will take any #include directive it finds and replace it with the actual file. So #include <cs50.h> will find the cs50.h file and inject it into our program.~~ Although it isn’t mentioned in the class notes, preprocessing handles other directives too, including a super handy pair called #if and #else. You can write conditional blocks that are kept or left out based on the value of a #define directive. Example:

If I recompile this program with #define DEBUG set to 1 instead of 0, I’ll get all the debug messages to show up. I know I should learn more about the debugger so I don’t need this kind of thing!
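A minimal sketch of the kind of program this describes, assuming a DEBUG macro that gets flipped between 0 and 1. The sum_to helper is a made-up name for illustration and is not from the lecture:

```c
#include <stdio.h>

// Set this to 1 and recompile to switch the debug messages on.
#define DEBUG 0

// Hypothetical helper for illustration: adds up the integers 1 through n.
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
    {
        total += i;
#if DEBUG
        // The preprocessor only keeps this line when DEBUG is nonzero.
        printf("debug: i = %i, total = %i\n", i, total);
#else
        // With DEBUG set to 0 nothing is printed inside the loop.
#endif
    }
    return total;
}
```

Calling sum_to(4) returns 10 either way. The only difference is whether the printf line survives preprocessing and ends up in the compiled program.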

Compiling

Although the entire process is called compiling or compilation, there is also a single step called compiling. This is something that has always frustrated me, and always will, about the programming world in general: people do the best they can to classify things, but there are always discrepancies that make things more confusing than necessary. Anyways, this step turns the source code we write into another, lower level language called assembly. Assembly is more difficult to read than C, from what I’ve seen, and deals with the CPU at a much more intimate level. Unlike C, you can’t write one assembly program and run it on many types of CPU chips, because the assembly instructions are specific to each type of chip. Maybe one day I’ll learn some assembly and write about it.

Assembling

~~Assembling takes the assembly code from the compiling phase and turns that into binary zeros and ones.~~

Linking

It turns out functions from libraries like stdio.h need to be “linked” with our program. This is because compilation can find function definitions from within our file but not outside it. So a linker finds all the function definitions throughout the files being used for our program (usually object code files or .o) and links everything together into our final executable.

But wait didn’t I say preprocessing takes #includes and replaces it with the actual code?

yep. I have no idea why the linker is needed now. Going to get on IRC chat and see if someone on ##C can help…

Here is a stack overflow post as well. So now I feel much more confident to explain what is going on:

Let’s try this again…

Preprocessing is a glorified find and replace. If you have any preprocessor directives (those #include, #define etc.) this step replaces the directive with the actual values they represent. Any #include directive is replaced with the header information which is not the implementation of any function, but rather a prototype or function declaration.

Compiling takes preprocessed code and converts it into assembly code, which is a human readable form of the CPU’s instructions (opcodes). This type of code is specific to the CPU your computer is using. Different chips, different instructions; the C compiler handles this for you.

Assembling takes the assembly code and turns it into machine readable object code. But the object code won’t work as an executable yet.

Linking takes our object code and connects the calls to functions declared in headers like stdio.h (those prototypes we mentioned earlier) to their actual implementations, which live in library files. Linking makes sure functions can be called successfully in our program.
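The split between a prototype and its implementation can be sketched in a single file. This is only an illustration with made-up function names (triple and triple_plus_one). Within one file the compiler matches the two up by itself, but it is the same split a header and a library file use, with the linker doing the matching:

```c
// Declaration (prototype): this is all a header file would give us.
// It tells the compiler the name, the parameter and the return type.
int triple(int n);

// The compiler accepts this call because it has seen the prototype,
// even though the body of triple has not appeared yet.
int triple_plus_one(int n)
{
    return triple(n) + 1;
}

// Definition (implementation): for a library function this part lives
// in a separately compiled file and is connected by the linker.
int triple(int n)
{
    return n * 3;
}
```

If the definition at the bottom were deleted, the file would still compile into object code, but the build would fail at the linking step because nothing provides triple.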

Honestly I probably still don’t have it 100% but this is my improved attempt so I’ll leave it for now.

Tools

The CS50 staff have created terminal commands to help us while coding in the CS50 IDE. Using the command check50, your code will be run several times using different scenarios to determine if it is passing code.

At the time of writing this I was unable to get check50 to work by going into the directory my file lived in, and running the command followed by the filename.

Example:

check50 caesar

Didn’t work for me but using the path cs50/2018/x/caesar seemed to work while in the directory.

style50 is much easier to get working. Just type style50 yourcode.c.

This is my code being printed out on the terminal with some suggestion coloring. Green spaces are added by the program to suggest they should be there. So if I write if(something) it will put a green space after the if.

Red spaces mean I have put those spaces in my code and style50 wants me to get rid of them. I personally don’t like this, because sometimes I add space so I can better understand what I’m writing, and having all the code tightly bunched together is difficult for me to read. In Sublime Text you can change line padding so the lines are further apart and adding line breaks isn’t necessary, but I haven’t found a way to do this in the CS50 IDE yet.

Once you’ve made all the changes style50 will print out Looks good! I’ll try to follow the style guide moving forward but sometimes I feel like it misses out on the slick code you can write in C. For example if statements and for loops don’t require curly braces if the block of code to execute is only 1 line. Unfortunately style50 is gonna tell you to add the curly braces.

if (something)
    printf("doing something");
else
    printf("not doing something");

instead style50 wants this

if (something)
{
    printf("doing something");
}
else
{
    printf("not doing something");
}

Printing, Debugging

The printf function is like a typewriter to the terminal window. Using loops we can print a series of characters and even special characters which are instructions that can break lines, create spaces or tabs and even make beep noises to name a few.
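As a small sketch of printing in a loop with special characters (print_row is a hypothetical name, not something from the lecture):

```c
#include <stdio.h>

// Hypothetical helper: prints `width` hash marks on one indented line
// and returns how many hash marks were printed.
int print_row(int width)
{
    int printed = 0;
    printf("\t");              // \t moves to the next tab stop
    for (int i = 0; i < width; i++)
    {
        printf("#");
        printed++;
    }
    printf("\n");              // \n breaks the line
    return printed;
}
```

Calling print_row(3) prints a tab, three # characters and a line break. The beep mentioned above is written \a, though whether you actually hear it depends on the terminal.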

Sometimes we get unexpected output from our program, and it is hard to just know where the error has occurred. We expect our programs to work, so this can be frustrating. The CS50 debugger is very helpful for tracking down issues, and I admit I have a hard time letting go of good old printf statements everywhere, but I think I need to start getting into the habit of using a debugger.

Let’s create an example of a program that compiles but outputs an unintended result:

This program runs a loop and within that loop multiplies the counter using a multiply function. Because the implementation is somewhere else it is not so easy to see what is happening. The goal is to print how many “bonus” points have been given on each iteration, but for some reason on the last iteration there is an error. How can this be? We start with i = 900, it loops 5 times and we’re multiplying i and a positive number. We can see within the multiply function implementation that the number is 2376000. So let’s use debug to see what is happening. Let’s start by setting a breakpoint. A breakpoint is a line number somewhere in the code where we want to “pause” our program so it doesn’t run all the way through. This gives us a snapshot of the state of all variables at that point.

Click to the left of the line number and a red dot appears. If you accidentally click and see a github logo icon in white, just click on your source code text to clear it out and try again.

Now that we have our breakpoint debug50 can run. In the terminal we type:

debug50 myfile

In my case, the source code is test.c so I’m going to type debug50 test, which will start the debugger program. A panel should slide open on the right side and start to output some information. Using test.c it looks like this:

Looking at the right panel you can see a section called local variables. This is helpful because we can see when the loop runs the first time it calls multiply, and at our breakpoint it is just about to return the result of num (900) multiplied by 2376000. We can see what the result variable is equal to just before it is returned back to the loop to be sent to the if/else statement.

Note that I don’t put the breakpoint right at the initialization of the variable result, but right after, when I’m about to return it back to the caller. This is because the debugger pauses before the line it is on has run, so result hasn’t been assigned yet. If you were to put the breakpoint on the same line as int result = num * 2376000, the first loop around would give you a weird nonsensical number, and each one after that would give you the previous iteration’s value.

Now let’s say we don’t know when the error hits. We do know that we end up in the else block of our code so let’s toggle off the breakpoint on line 21 and put the breakpoint instead on line 14, within the else block. This saves us time because we don’t have to loop over and over through each iteration.

Now when we run debug50 test we get this:

When we hit the else block mynumber is -2147063296, a negative number. Going back to the error message we saw in the terminal when we originally ran our test program, signed integer overflow, it seems we’ve exceeded the bounds of what an int data type can hold. On the last iteration 904 * 2376000 is 2147904000, just past the int maximum of 2147483647, so the value wraps around to a negative number. To solve this issue we change all our relevant int types to long long types.

Now it works!
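The arithmetic behind the fix can be checked with a short sketch. The original test.c isn’t shown here, so the shape of multiply below is an assumption based on the description, and exceeds_int is a made-up helper. It also assumes the usual 32-bit int:

```c
#include <limits.h>

// Assumed shape of the multiply function, with int swapped for long long.
long long multiply(long long num)
{
    long long result = num * 2376000;
    return result;
}

// Hypothetical helper: returns 1 if value is too big to fit in an int.
int exceeds_int(long long value)
{
    return value > INT_MAX;
}
```

With these, multiply(903) still fits in an int, while multiply(904) gives 2147904000, which does not. That is why only the last of the five iterations went wrong.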

Strings, Arrays

The cs50 c library has several functions to get user input. These are:

get_char

get_double

get_float

get_int

get_long_long

get_string

Each function returns the data type named after the “get_” part of its name. The function get_long_long will return a data type of long long, and so on.

It’s also important to know the format codes for each data type so when we want to print results to the terminal we can get the right output.

%c - characters

%f - floats

%i - integer (%d also works)

%lld - long long

%s - string
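Here is a rough sketch that uses each of these format codes once. To make the result easy to check it uses snprintf, which takes the same format codes as printf but writes into a buffer instead of the terminal. The describe function and its values are made up for illustration, and a plain char * stands in for the cs50 string type:

```c
#include <stdio.h>

// Hypothetical helper: formats one value of each type into buf and
// returns the number of characters written (not counting the '\0').
int describe(char *buf, size_t size)
{
    char initial = 'J';
    float average = 3.5f;
    int age = 42;
    long long big = 2147904000LL;
    const char *name = "John";

    // %.2f is %f limited to two digits after the decimal point.
    return snprintf(buf, size, "%c %.2f %i %lld %s",
                    initial, average, age, big, name);
}
```

With a 64 character buffer, describe fills it with J 3.50 42 2147904000 John.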

If you want more detailed info about any cs50 function you can type man and the function name to get a fairly technical document.

If the info in the man pages isn’t helpful, and is more confusing than anything else, try the cs50 reference website.

It’s worth mentioning C doesn’t have a native string data type. The cs50 library has created this abstraction which represents a series of characters in memory.

Here is a code example to demonstrate:

The output to the terminal:

Some things to note: strlen is a function provided in the string.h library. It returns the number of characters in a string. So if my name is “John” it will return 4. There is a special character at the end of these sequences of characters, or strings, called a null character, which strlen does not count. It is represented like this: ‘\0’.

So the string “John” is an array of 4 chars (characters) and a null character. As a whole this would be called a Null-terminated string.
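A minimal sketch of walking through a string one character at a time, assuming a loop over name[i]. It uses const char * rather than the cs50 string type (which is a typedef for char *) so it compiles without the CS50 library, and print_chars is a made-up name:

```c
#include <stdio.h>
#include <string.h>

// Prints each character of name on its own line, next to its integer
// value, and returns how many characters were printed.
int print_chars(const char *name)
{
    int length = (int) strlen(name);   // does not count the '\0'
    for (int i = 0; i < length; i++)
    {
        // The same char printed twice: as a character, then as a number.
        printf("%c %i\n", name[i], name[i]);
    }
    return length;
}
```

For “John” this prints four lines and returns 4. The fifth element, name[4], is the null character that marks the end.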

In the code you can also see we have the user’s name held in a string data type, but printf outputs one char at a time by indexing into it with name[i]. We can go further and produce the integer value for each character by casting the char to an int, which is known as type casting. That integer is the value that maps to the character in ascii.

Since we have this fine control on how each character is printed to the screen, let’s try a small challenge. If a character is given that isn’t a capital letter let’s capitalize it before it is printed. How can we check for this?

If I use the expression

char x[] = "J";

I end up with a null terminated string, which means there is a hidden ‘\0’ you don’t see. Also keep in mind this is an array of type char. Now it is very subtle but if I use single quotes with a variable of type char…

char x = 'J';

Now I get just the ‘J’ character by itself. Under the hood it is the integer 74, which is the value that maps to that character in ascii. I can actually get K by just adding 1 to ‘J’.

printf("%c", 'J' + 1); // will output K

So back to the problem at hand. Here is the ascii map webpage I use to figure this kind of stuff out. We need to say:

given name[i] character in our loop
test if lower case (what is the range of lowercase #'s?)
if it is make it uppercase (integer distance from a to A?)

First let’s solve our conditional test. The ascii table shows us that lower case ‘a’ starts at integer value 97 and ends with ‘z’ at integer value 122.

char current_value = name[i];
bool is_lower_case = (current_value >= 'a') && (current_value <= 'z');
// bool comes from <stdbool.h>, which <cs50.h> includes. An int
// would also work but is less descriptive.

Between the upper case and lower case ascii letters are some other characters, like ‘[’ for brackets, so that can make it a little daunting at first, but it’s not so bad when you convert everything to number values.

For example,

‘a’ is 97

‘A’ is 65

If I said to you, “hey if I give you 97, return back 65,” you would just subtract some number from 97. So hopefully you know where this is going.

int magic_number = 97 - 65; // 32

Not the answer to life the universe and everything but now we know, subtract 32 from a lower case character to make it upper, and add 32 to an upper case character to make it lower. So let’s write this program up:
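A minimal sketch of how that could look, using const char * in place of the cs50 string type so it compiles without the CS50 library. The names capitalize and print_capitalized are made up for illustration:

```c
#include <stdio.h>

// Returns the upper case version of c if it is a lower case letter,
// otherwise returns c unchanged.
char capitalize(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        return c - ('a' - 'A');   // 'a' - 'A' is 32 in ascii
    }
    return c;
}

// Prints name with every lower case letter capitalized.
void print_capitalized(const char *name)
{
    for (int i = 0; name[i] != '\0'; i++)
    {
        printf("%c", capitalize(name[i]));
    }
    printf("\n");
}
```

Characters outside the a to z range, like ‘J’ or ‘[’, pass through untouched, so print_capitalized("John") prints JOHN.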

Cryptography

Although the implementations of cryptography can get complex, the process is simple to understand. If I put a password into some system, it shouldn’t save the actual password. It should encrypt the password and save the output of the encryption. Let’s make a super simple encryption. Take the above example where I just add 1 to ‘J.’

If I add 1 to the ‘J’ character I get ‘K.’ So if I had a terrible password like “John,” the program wouldn’t save “John.” It would save the output of adding 1 to every character in my name. So what good is the system storing “Kpio?”

Well, next time I go to type in my password, it’s going to run the same program, adding 1 to each character I’ve typed in. Since it is storing “Kpio,” it checks whether what I’ve written matches after running the encryption algorithm. Hope that made sense.
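The add 1 scheme can be sketched in a few lines. This is a toy for illustration only, not how real systems protect passwords, and the names add_one and matches are made up:

```c
#include <stdbool.h>
#include <string.h>

// Writes plain into out with 1 added to every character.
// out must have room for strlen(plain) + 1 chars.
void add_one(const char *plain, char *out)
{
    size_t i;
    for (i = 0; plain[i] != '\0'; i++)
    {
        out[i] = plain[i] + 1;
    }
    out[i] = '\0';   // keep the result null-terminated
}

// Runs the attempt through the same algorithm and compares the result
// with what the system stored.
bool matches(const char *attempt, const char *stored)
{
    char scrambled[64];
    if (strlen(attempt) >= sizeof(scrambled))
    {
        return false;   // too long for this toy buffer
    }
    add_one(attempt, scrambled);
    return strcmp(scrambled, stored) == 0;
}
```

Here add_one turns “John” into “Kpio”, and matches("John", "Kpio") is true while any other attempt is false. The system never needs to keep “John” itself.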

In professionally built systems the encryption is very hard to reverse engineer, so passwords are very hard to crack.

That is it for the week 1 lecture. I’ve been pretty behind because of all the training for the triathlon. I’ll write about that sometime too.

Until next time!