CS50 Week 3

Note: I’ve added a lot of code examples in this article towards the end that deal with the resize less comfortable problem that I found difficult to understand. Use a C repl online to copy and paste the github gists (code examples) i’ve included and by understanding how they work you’ll be one step closer to making sense of resize.c!

New format…

Please comment if you have any questions and i’ll do my best to respond with a detailed answer to help clear anything up. I’m noticing these problems went from wow that was difficult, to wow what is the meaning of life how could anyone solve these problems in a week?!

Padding question

The readme had me stumped on this one:

What value does line 65 of `copy.c` assign to `padding` 
if `bi.biWidth` is `3`?

Here is the code in question:

// determine padding for scanlines
int padding = (4 - (bi.biWidth * sizeof(RGBTRIPLE)) % 4) % 4;

So broken down:

We're given 3 for the width and a rgb triple is 3 as well:

int padding = (4 - (3 * 3) % 4) % 4;

// 3 * 3 = 9

int padding = (4 - (9) % 4) % 4;

// 9 % 4 = 1

int padding = (4 - 1) % 4;

// 4 - 1 = 3

int padding = 3 % 4;

// 3 % 4 = 3

int padding = 3;

Does modulo still confuse you? Whenever I step away for a couple of days I lose my “modulo mojo.”

Why does 3 % 4 = 3

because mod gives us the remainder of a division operation.

Look at the expression 8 % 4 for example.

Imagine the 4 hits the 8 and gives 4 damage on the 8. So that 4 can hit 8 twice before there is nothing left.

If the first number was 9 there would be 1 left. That’s what mod gives us, whatever is left after the second number hits the first number.

So what about 3 % 4?

Well 4 can’t even fight 3 because it can’t get at least 4 damage out of it. So the fight never happens and modulo gives us what is left in the first number which is the same as what we started with which is 3!

Whodunit, struct and typedef explained

Although it can be uncomfortable at first to wade through someone else’s file of code, this was a little easier to solve than last weeks helper functions. Remember the lecture speaks about file pointers and modeling our data using structs.

Here is a simple program to help demystify things:

I’ll show my answer and explain what is going on (starts on line 80):

In our loop we’re streaming in one pixel of RGB at a time. To access the value of the current pixel we can dive into our struct’s properties. To access a property we take the struct, which in this care is triple, add a dot and then our property:

triple.rgbtRed

So we can print this value out or run a test. I wanted to remove all red pixels so I created a branch within the loop between the fread and fwrite functions. This is the perfect place to make modifications because it’s right before we write into the new file!

I decided my test would be if a pixel had any full red value. In hex thats 0x0000ff.

If it did I would take that pixel and black it out on all channels. We can do that by taking each property and assigning it the value of 0x000000, which is black.

GDB to the rescue

Working through resize more less comfortable right now. Figured I would give the more challenging problem a try. After watching the video i’m faced with resize.c and bmp.h feeling very confused as to where I should start.

Try this:

  1. make your copy of the file copy.c (I named mine resize.c) and use make
  2. run the command gdb resize
  3. set a break point after the header variables are created and read into (this was around line 45) using the command b 45
  4. run the command r <infile.bmp> <outfile.bmp> (using your own bmp files)
  5. now take a look at where you are in the code by using the command list
  6. finally print out either the bf or bi structs using the command print bf or print bi

Now you can see the insides of those variables and what they are made up of when read. Try printing bf.bfSize etc. It makes a huge difference in my comfort level and I didn’t have to write a single print statement!

Understanding File I/O in C

File I/O operations that occur in resize less comfortable. We have structs being created from a header file, some pointers getting thrown around and what the heck is that nested for loop monster doing at the bottom? Weren’t we support to write neat little functions for everything?

Once you understand what is going on in each of these simple programs, write it from scratch yourself. Change the text, add a conditional if you want to be adventurous. Maybe if somewhere there is the character ‘s’ switch it to ‘z’. Make it your own to really learn what is going on. Each time you run make and there is an error, solve it and be amazed at your growth as a programmer!

Creating a new file:

Reading in a file:

Visualize file I/O in C

FILE * fp = fopen("stuff.txt", "r");

The FILE data type is known as an opaque data type so the details are hidden. You should also know FILE is a struct which means this creates an object. For now we need to know this is the data type when dealing with files. Simple enough for now…

Next we have the * fp. So we know that fp is a pointer which holds the address where our file lives. But what if we don’t have a file yet? Well we don’t quite have access to an address space but a “stream.” The fp becomes a “file handler.”

We take for granted how simple it is to save files on our computer. We don’t think for a second about it. Back in the day this was an issue. Programs store data to memory (RAM) while they’re running, and if there is enough memory, but once stopped (or shut off from power) programs lose everything stored on the computers memory. In order to save data after programs stopped they needed a completely different way of storing data. Files became the way, and needed a way to be HANDLED.

Memory (RAM) is usually a long chip with squares. File storage devices are not all alike. DVD’s, SD cards,HDD’s and SSD’s are all storage devices that you find in modern computers today. But it did not start out that way at all!

In the past, data was stored on giant spinning disk machines. There were programmers that had to implement code that could save data to these.

Let’s create a pseudo function for that:

saveDataOnSpinningDisk(){
  // lots of programming
}

It’s 1965 and your manager comes in and tells you “They have these new tape machines that store 100 times the data in the same amount of space! we have to write code to save data to these new data storage machines!”

All our programs (think thousands of lines of code) have a function called saveDataOnSpinningDisk().

So what do we do? Maybe we have an if statement like this:

if(using old data storage method){
   saveDataOnSpinningDisk()
}else{
   saveDataOnNewTapeMachine()
}

A few years later the HDD (Hard Drive Disk) is invented, and a basement full of tape machines can be stored in the palm of your hand. Management is telling you to write more code to HANDLE FILES using this new method of storage. Soon CD’s DVD’s Blue-ray discs, SD cards…you get the picture.

Eventually people figured out data storage could be abstracted out to their basic operations. All of these storage devices can be written, read, appended to, deleted, and created. By keeping the specific implementation behind the scenes, our programs only have to deal with an abstraction called a FILE HANDLER, that handles these I/O operations for us.

The C programming language is taking care of the file I/O specific to our system for us. It knows we’re on linux ubuntu, on the CS50 cloud server and does the file I/O needed for the system. Many digital cameras use C, which means the same functions you write to manipulate file I/O can be used to store data on an SD card. This keeps you focused on the program at hand, not the distracting details of file I/O.

I finally got through week 3

It really be like that though

Honestly,  it didn’t take 8 weeks. It took probably a week of really focused time to complete this problem. Specifically “less comfortable.”

I’ve created a few simple programs that helped me understand each part of the program resize.c, so it wouldn’t feel so overwhelming when trying to tackle in one pass.

Not sure how to create a file or make a copy? This little program might help:

Not sure what the original copy.c is doing in the problem set? Try going through my version and see if it makes more sense. I added a lot of helpful comments.

Things in less comfortable that sucked

I made many attempts but ultimately went on GitHub to find other programers solutions. My end goal was to understand the code. Reading it line by line I was able to finally understand what had to be done to complete the challenge. I’ll put my solution below and we’ll go through each section:

Validating inputs

Up to line 59 we’re just assigning the user inputs to clearly named variables and making sure our file streams are valid and working. On line 22 we have to use the enlargement factor argument as an int data type so using atoi().

The part on line 51 about making sure it’s a 24-bit uncompressed BMP 4.0 is a detail we don’t have to worry about and are just copying from copy.c.

Update the headers

This covers all the way to line 87. We’ve got 2 headers, the BITMAPFILEHEADER and the BITMAPINFOHEADER. Read and write in the correct order! I kept bmp.h open in another tab in the cs50 ide to use as a reference on how the structs are ordered and what data each is holding.

It’s best to keep the infile data and the outfile data separate so I created variables for the infile headers (bf and bi) and variables for the outfile headers (obf and obi).

So the video reading less comfortable tells us all the things we have to update which are:

  • width & height
  • biSizeImage
  • bfSize

Getting the values in this order is key because each is dependent on the former being updated.

Once all the updates are complete we use fwrite() to push that information into our outfile.

After each read and write operation, the infile pointer and outfile pointer is moving forward. Imagine a youtube player and when we read or write the little playhead ball is moving ahead.

The difficult part

Starting from line 91 we get caught up in a lot of nested loops.

First we have to think (whiteboards are very helpful here) about what we’re doing to the RGBTRIPLE in the loop. Let’s draw it out:

before a resize with a resize factor of 2:

[] []~

[] []~

after a resize:

[] [] [] []~

[] [] [] []~

[] [] [] []~

[] [] [] []~

([] represents one RGBTRIPLE)

(~ is padding and the values can be different)

When we read in a single RGBTRIPLE, we have to write it out n times. Whatever factor the user gives us. The tough part comes into play when we try to duplicate the rows.

Notice how the after drawing has 4 rows instead of 2?

I’ve seen some solutions using fseek by factoring out the 54 bytes of the header and looping through, but it was confusing for me so I choose to use the malloc() solution.

We store the entire current row of RGBTRIPLE’s (pixels) into scanline AFTER we duplicate the number of pixels using a nested loop.

Then push them n times into our outfile.

So the inner loop collects what our new row looks like and the outer loop actually writes to the outfile an entire row at a time.

before:

[][]

after:

[][][][]

scanline takes in an entire row [][][][]

then pushes the row into the outfile in the nested loop plus padding factor  times

Back up more to line 91. We have a data type RGBTRIPLE but it’s a pointer type and we’re using malloc(). I was hesitant to use this for about 2 weeks.

Best way to learn is to practice with little programs one without malloc() and one with malloc():

So the above program is without malloc. We declare an array of chars size 3. We know the file is 3Max, 3 being our version of a “header” telling us the “width” is 3, or 3 chars. If we change the file 3Max to 3JohnnyYzaguirre we aren’t going to get the result we want because our variable holds a static number of chars.

We need something dynamic, which means something that can size itself  based on what is required.

Malloc() allocates memory by supplying it with a number of bytes. We assign this to a pointer of whatever data type we want to store.

By having the pointer of a data type (int, char, RGBTRIPLE etc), we can use pointer arithmetic, which allows us to move around the blocks of <datatype> memory very easily that we have allocated. *WARNING THIS IS NOT LIKE A FILE STREAM! THE POINTER DOES NOT MOVE ON TO THE NEXT AVAILABLE PIECE OF MEMORY WHEN YOU READ OR WRITE FROM IT!

Try out this program to see what I mean. Lots of helpful comments here:

resize less comfortable throws a lot at you so i’ll give you a few hints that I don’t feel I completely covered in the article.

  • learn to use fseek() by looking up the function arguments. It’s like a functional way of taking the playhead backwards or forwards. You only need to push it forwards in this challenge. Once you have the correct outpadding you are going to manually add it after you’ve pushed a scanline into the outfile.
  • More on the loop section: The main loop represents the rows of the infile and has 2 inner for loops. the first inner loop represents the individual pixels in a row. Just like in our example with Max, we take that temp pixel variable and write it over and over factor times using a nested for loop. the second inner loop is where we actual write the scanline to the outptr followed by another nested loop where the padding is manually added.
  • The scanline pointer_counter lives inside of the main loop. This is because we actually need it to reset to zero since we’re re-using the scanline for the next row from the infile.

Happy coding everyone! Until next time =)

2
Leave a Reply

avatar
1 Comment threads
1 Thread replies
2 Followers
 
Most reacted comment
Hottest comment thread
2 Comment authors
JohnnyVladimir Recent comment authors
  Subscribe  
newest oldest most voted
Notify of
Vladimir
Guest
Vladimir

Any hints on the “recover” problem? I’m solving it for over 3 days now…