Homework 1

(This file may be out of date, so go to Deepnote for the most current version.)

Author: BLANK

Collaborators: BLANK

References: BLANK

Question 1

  • Find a dataset on Kaggle that interests you and upload it to this project. (Choose a file in csv format, and make sure it is under 100MB. You will need to sign up for Kaggle to download the file. Some datasets are cleaner and hence easier to work with than others, so don’t be afraid to switch datasets if you’re having trouble.)

  • Replace the “References: BLANK” markdown link above with a link to the dataset you found. You should replace both the word “BLANK” and the url.

  • Change the name “Homework 1” above to a more descriptive title, related to your dataset. Leave the pound symbol #, which says that this is a top-level heading in markdown. (The headings like Question 1 are preceded by ##, which says they are second-level headings.)

  • Replace “Author: BLANK” with your name.

  • Replace “Collaborators: BLANK” with the names of any students you worked with. (It’s fine to share code in a group of up to 3 students, as long as you list everyone here. You can even turn in the exact same answers, as long as the author/collaborator names are correct.)

Question 2

  • Ask a question about your dataset, and use the describe() or info() method to answer it. (Your question and answer should be in markdown cells, while the code to answer the question should be in code cells.)
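As a rough illustration of what these two methods report, here is a minimal sketch. The DataFrame and its column names (`species`, `weight_kg`) are made up for this example and stand in for whatever your own csv file contains.

```python
import pandas as pd

# Hypothetical toy data; replace with pd.read_csv(...) on your own file
df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "bird"],
    "weight_kg": [4.0, 20.0, None, 0.5],
})

# describe() summarizes the numeric columns: count, mean, std, min, quartiles, max
summary = df.describe()
max_weight = summary.loc["max", "weight_kg"]
print(f"The heaviest animal weighs {max_weight} kg")

# info() prints each column's dtype and its number of non-null entries
df.info()
```

A question such as "What is the largest weight in the data?" could then be answered from the `max` row of the summary.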

Question 3

  • Ask a question about your dataset, and use the value_counts() method to answer it.
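The sketch below shows the kind of output value_counts() gives, using a small made-up Series (the name `species` and its values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical column of categorical values
species = pd.Series(["cat", "dog", "dog", "bird", "dog", "cat"], name="species")

# value_counts() returns how often each distinct value occurs,
# sorted from most frequent to least frequent
counts = species.value_counts()
print(counts)

# The first index entry is therefore the most common value
most_common = counts.index[0]
```

With this toy data, a question like "Which species appears most often?" is answered by the first entry of `counts`.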

Question 4

  • Ask a question about your dataset, and use Boolean indexing to answer it.
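One possible shape for a Boolean indexing answer is sketched here. The data, the column names, and the threshold of 10 are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "bird"],
    "weight_kg": [4.0, 20.0, 12.5, 0.5],
})

# Comparing a column to a value gives a Series of True/False values
is_heavy = df["weight_kg"] > 10

# Indexing with that Boolean Series keeps only the rows where it is True
heavy = df[is_heavy]
print(heavy)

# True counts as 1, so summing the Boolean Series counts the matching rows
num_heavy = is_heavy.sum()
```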

Question 5

  • Ask a question about your dataset, and use list comprehension to answer it.
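A list comprehension builds a list by looping over something and optionally filtering. The following sketch, with made-up column names, collects the columns whose names end in a particular suffix.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "species": ["cat", "dog"],
    "weight_kg": [4.0, 20.0],
    "height_cm": [25.0, 60.0],
})

# Keep each column name that ends with "_kg"
kg_columns = [col for col in df.columns if col.endswith("_kg")]
print(kg_columns)
```

The same pattern works over any iterable, for example the values of a single column.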

Question 6

  • How many rows are there in your dataset? Give your answer in two different formats: in a markdown cell, where you write out the answer, and in a code cell, where you use an f-string. (For the f-string part, don’t just type something like num_rows = 10000; instead, use something like num_rows = df.shape[?].)
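For reference, an f-string inserts the value of an expression wherever curly braces appear in the string. The sketch below uses a made-up Series and reports a different quantity (the number of distinct values), leaving the row count itself for you to work out.

```python
import pandas as pd

# Hypothetical column of values
species = pd.Series(["cat", "dog", "dog", "bird"])

# Compute the number from the data rather than typing it in by hand
num_species = species.nunique()

# The expression inside {} is evaluated and placed into the string
message = f"There are {num_species} distinct species in the data."
print(message)
```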

Question 7

Rewrite the following code so it follows the DRY (Don’t Repeat Yourself) principle and so that it uses f-strings. Also move the code from this markdown cell into a code cell, so you can execute it.

col0 = df.iloc[:,0]
print("There are ")
print(col0.isna().sum())
print("missing values in column 0")

col1 = df.iloc[:,1]
print("There are ")
print(col1.isna().sum())
print("missing values in column 1")

col2 = df.iloc[:,2]
print("There are ")
print(col2.isna().sum())
print("missing values in column 2")

Submission

To submit this homework, use the Share option at the top right to create a link to the project, then submit that link on Canvas.