Rant
CS50 is mostly structured in week-long units. I can get through that much material in about 10 days, including writing a series of articles. The class starts at week 0, and I can only guess that is because programmers count from 0. Week 1 is two units long (week 1 and week 1 continued). Since I’ve been working on this class for over a month, it’s frustrating to still be at week 1 and not week 3. Had to get that off my chest.
Lecture
David gives a quick recap of the CS50 IDE and gets right into compilation. “Compiling” is shorthand for all the steps that take source code to binary (machine code). The compiler used in class is called “clang.” By default clang writes its output to a file called “a.out”; the -o flag lets you specify a different name. For example, clang myfile.c -o myfile will output an executable called myfile.
Clarity
It’s important to name variables in a way that helps us understand what is going on in our program.
Example:
string s = get_string("Name: ");
versus:
string name = get_string("Name: ");
When the variable s is scattered through a big program, it’s hard to remember it’s the name of the user. But the variable name is much easier to remember.
help50
It is helpful to understand the error messages clang produces. If a message is unclear, you can put help50 in front of the compile command, which will sometimes give you a more human-readable message.
help50 clang hello.c
Compilation
Typing make <program name> in CS50 IDE runs 4 main processes that turn our source code into a machine-readable program. The 4 processes that make up compiling are:
Preprocessing
This takes each #include directive it finds and replaces it with the contents of the named file. So #include <cs50.h> will find the cs50.h file and inject it into our program. Although it isn’t mentioned in the class notes, preprocessing handles other directives too, including a super handy pair of conditional directives, #if and #else. With them you can include or leave out code based on a value set with a #define directive.
Compiling
Although the entire process is called compiling or compilation, one of its steps is also called compiling. This kind of thing has always frustrated me about the programming world: people do the best they can to classify things, but there are always discrepancies that make things more confusing than necessary. Anyway, this step turns the source code we write into a lower-level language called assembly. Assembly is more difficult to read than C, from what I’ve seen, and deals with the CPU at a much more intimate level. Unlike a C program, an assembly program can’t be written once and built for many types of CPU, because assembly instructions are specific to one type of chip. Maybe one day I’ll learn some assembly and write about it.
Assembling
Assembling takes the assembly code from the compiling phase and turns that into binary zeros and ones.
Linking
It turns out functions declared in headers like stdio.h need to be “linked” with our program. This is because the compiler can find function definitions within our file but not outside it. So a linker finds all the function definitions across the files used by our program (usually object code files, ending in .o) and links everything together into our final executable.
But wait, didn’t I say preprocessing takes #includes and replaces them with the actual code?
Yep. I have no idea why the linker is needed now. Going to get on IRC and see if someone on ##C can help…
Here is a stack overflow post as well. So now I feel much more confident to explain what is going on:
Let’s try this again…
Preprocessing is a glorified find and replace. If you have any preprocessor directives (#include, #define, etc.), this step replaces each directive with what it represents. Any #include directive is replaced with the contents of the header, which holds not the implementation of any function but rather its prototype (function declaration).
Compiling takes preprocessed code and converts it into assembly, a human-readable spelling of the CPU’s opcodes (the two are related but not the same thing). This code is specific to the CPU your computer is using. Different chips, different opcodes; the compiler handles this for you.
Assembling takes the assembly and turns it into machine-readable object code. But the object code won’t work as an executable yet.
Linking connects the prototypes that headers like stdio.h gave us to their actual implementations, which live in library files. Linking makes sure functions can be called successfully in our program.
Honestly I probably still don’t have it 100% but this is my improved attempt so I’ll leave it for now.
Tools
check50 caesar
style50 prints my code out on the terminal with some suggestion coloring. Green spaces are added by the program to suggest they should be there. So if I write if(something) it will put a green space after the if.
Red spaces mean I have put those spaces in my code and style50 wants me to get rid of them. I personally don’t like this, because sometimes I add space so I can better understand what I’m writing, and code that is tightly bunched together is difficult for me to read. In Sublime Text you can change line padding so the lines are further apart and extra line breaks aren’t necessary, but I haven’t found a way to do this in the CS50 IDE yet.
Once you’ve made all the changes, style50 will print out Looks good! I’ll try to follow the style guide moving forward, but sometimes I feel like it misses out on the slick code you can write in C. For example, if statements and for loops don’t require curly braces when the body is a single statement. Unfortunately style50 is gonna tell you to add the curly braces.
if (something)
    printf("doing something");
else
    printf("not doing something");
Instead, style50 wants this:
if (something)
{
    printf("doing something");
}
else
{
    printf("not doing something");
}
Printing, Debugging
The printf function is like a typewriter to the terminal window. Using loops we can print a series of characters and even special characters which are instructions that can break lines, create spaces or tabs and even make beep noises to name a few.
Sometimes we get unexpected output from our program, and it is hard to just know where the error has occurred. We expect our programs to work, so this can be frustrating. The CS50 debugger is very helpful for tracking down issues, and I admit I have a hard time letting go of good old printf statements everywhere, but I think I need to start getting into the habit of using a debugger.
Let’s create an example of a program that compiles but outputs an unintended result:
Clicking to the left of a line number sets a breakpoint, which shows up as a red dot. If you accidentally click and see a white GitHub logo icon instead, just click on your source code text to clear it out and try again.
Now that we have our breakpoint debug50 can run. In the terminal we type:
debug50 myfile
In my case, the source code is test.c so I’m going to type debug50 test, which will start the debugger program. A panel should slide open on the right side and start to output some information. Using test.c it looks like this:
Looking at the right panel you can see a section called local variables. This is helpful because we can see when the loop runs the first time it calls multiply, and at our breakpoint it is just about to return the result of num (900) multiplied by 2376000. We can see what the result variable is equal to just before it is returned back to the loop to be sent to the if/else statement.
Note that I don’t put the breakpoint right at the initialization of the variable result, but right after, when I’m about to return it to the caller. This is because the debugger pauses before the line at the breakpoint has run, so it won’t know what result equals until one step after the initialization. If you were to put the breakpoint on the same line as int result = num * 2376000, the first time around the loop would give you a weird nonsensical number, and every pass after that would show the previous iteration’s value.
Now let’s say we don’t know when the error hits. We do know that we end up in the else block of our code so let’s toggle off the breakpoint on line 21 and put the breakpoint instead on line 14, within the else block. This saves us time because we don’t have to loop over and over through each iteration.
Now when we run debug50 test we get this:
When we hit the else block, mynumber is -2147063296, a negative number. Going back to the terminal error message we got when we originally ran our test program, signed integer overflow, it seems we’ve exceeded the bounds of what an int data type can hold. Since it can’t go any higher, in practice it wraps around to a negative value and counts up from there. To solve this issue we change all our relevant int types to long long types.
Now it works!
Strings, Arrays
The cs50 c library has several functions to get user input. These are:
get_char
get_double
get_float
get_int
get_long_long
get_string
The matching printf format codes are:
%c - characters
%f - floats
%i - integer (%d also works)
%lld - long long
%s - string
If you want more detailed info about any cs50 function, you can type man followed by the function name to get a fairly technical document.
If the info in the man pages isn’t helpful and is more confusing than anything else, try the cs50 reference website.
It’s worth mentioning C doesn’t have a native string data type. The cs50 library has created this abstraction which represents a series of characters in memory.
Here is a code example to demonstrate:
The output to the terminal:
Some things to note: strlen is a function declared in the string.h header. It returns the number of characters in a string. So if my name is “John” it will return 4. There is a special character at the end of these sequences of characters, or strings, called a null character. It is represented like this: '\0'.
So the string “John” is an array of 4 chars (characters) and a null character. As a whole this would be called a null-terminated string.
In the code you can also see that the user’s name is held in a string data type, but printf outputs one char at a time: indexing into the string with name[i] gives us a single character. We can go further and produce the integer value of each character as well, either by printing it with %i or by type casting it to an int. That integer is the ascii code that maps to the character.
Since we have this fine control on how each character is printed to the screen, let’s try a small challenge. If a character is given that isn’t a capital letter let’s capitalize it before it is printed. How can we check for this?
If I use the expression
char x[] = "J";
I end up with a null terminated string, which means there is a hidden ‘\0’ you don’t see. Also keep in mind this is an array of type char. Now it is very subtle but if I use single quotes with a variable of type char…
char x = 'J';
Now I get just the ‘J’ character by itself, and also this is the integer 74 which is the integer that maps to that character. I can actually get K by just adding 1 to ‘J’.
printf("%c", 'J' + 1); // will output K
So back to the problem at hand. Here is the ascii map webpage I use to figure this kind of stuff out. We need to say:
given name[i] character in our loop
test if lower case (what is the range of lowercase #'s?)
if it is make it uppercase (integer distance from a to A?)
First let’s solve our conditional test. The ascii table shows us that lower case ‘a’ starts at integer value 97 and ends with ‘z’ at integer value 122.
char current_value = name[i];
bool is_lower_case = (current_value >= 'a') && (current_value <= 'z');
// bool comes from <stdbool.h>, which <cs50.h> includes for us.
// An int would also work but is less descriptive.
Between the upper case and lower case ascii chars there are some other characters, like '[' for example. That can make it a little daunting at first, but it’s not so bad when you convert everything to number values.
For example,
‘a’ is 97
‘A’ is 65
If I said to you, “hey if I give you 97, return back 65,” you would just subtract some number from 97. So hopefully you know where this is going.
int magic_number = 97 - 65; // 32
Not the answer to life, the universe and everything, but now we know: subtract 32 from a lower case character to make it upper, and add 32 to an upper case character to make it lower. So let’s write this program up: