Homework 3

Remark: This might not reflect updates, so check the Deepnote file for the official version.

List your name and the names of any collaborators at the top of this notebook.

(Reminder: Working together is encouraged; you can even submit the exact same homework as one or two other students, but you must list each other’s names at the top.)

Practice with the Cars dataset

These exercises refer to the cars dataset.

  1. Load the data from the cars.csv file using pd.read_csv.

  2. Use the pandas DataFrame.rename method and a Python dictionary to rename the Miles_per_Gallon column as mpg and the Weight_in_lbs column as Weight.

  3. Which Year appears in the dataset least often?

  4. How many distinct values occur in the Name column?

  5. Using the pandas DataFrame.corr method, create a correlation matrix for the (numeric) columns in the cars dataset. Which two columns are most negatively correlated?

  6. Make a dictionary whose keys are USA, Japan, and Europe, and whose values are the average weight in pounds of the cars in the dataset from that origin. (So, for example, the value corresponding to USA would be the average weight of all cars in the dataset whose origin is USA.)

  7. Make the same thing (weights for USA, Japan, and Europe) as a pandas Series. (One option is to convert the dictionary you made into a pandas Series.)
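The techniques in the exercises above can be sketched on a tiny made-up table. This is only an illustration, not a solution: the DataFrame toy and all of its values are invented, and the column name Origin is an assumption about how the real cars.csv is laid out.

```python
import pandas as pd

# Invented stand-in for cars.csv; none of these values come from the real file.
toy = pd.DataFrame({
    "Name": ["a", "b", "a", "c", "d", "e"],
    "Miles_per_Gallon": [18.0, 30.0, 18.0, 25.0, 15.0, 32.0],
    "Weight_in_lbs": [3500, 2100, 3500, 2600, 4000, 2000],
    "Year": [1970, 1970, 1971, 1971, 1971, 1972],
    "Origin": ["USA", "Japan", "USA", "Europe", "USA", "Japan"],  # assumed column name
})

# Rename columns with a dictionary mapping old names to new names.
toy = toy.rename(columns={"Miles_per_Gallon": "mpg", "Weight_in_lbs": "Weight"})

# Count how often each year occurs, then take the label with the smallest count.
rarest_year = toy["Year"].value_counts().idxmin()

# Number of distinct values in a column.
n_names = toy["Name"].nunique()

# Correlation matrix of the numeric columns only.
corr = toy.select_dtypes("number").corr()

# Flatten the matrix to a Series indexed by (column, row) pairs;
# idxmin then returns the pair with the most negative correlation.
most_negative_pair = corr.unstack().idxmin()

# Dictionary of average weight per origin, built with Boolean indexing.
origins = ["USA", "Japan", "Europe"]
avg_weight = {o: toy.loc[toy["Origin"] == o, "Weight"].mean() for o in origins}

# The same information as a pandas Series (keys become the index).
weight_series = pd.Series(avg_weight)
```

On the real data, the same calls would be applied to the DataFrame returned by pd.read_csv. A groupby on the origin column followed by mean is another way to get the per-origin averages directly as a Series.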

In-class quiz practice

(This week’s in-class quiz may involve instructions like this.)

Define a pandas DataFrame df using the following code. (Be sure to put all of these lines in the same cell; otherwise, you might get different answers. The important thing is that X and A get created immediately after rng.)

import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=12)
X = np.concatenate(([np.nan],np.arange(100)))
A = rng.choice(X,size=(10**5,5))
df = pd.DataFrame(A,columns=list("abcde"))

  1. How many null values are there in the c column of df?

  2. How many of the values in the b column are equal to 49 or to 50?

  3. Consider the sub-DataFrame consisting of all the rows in which the b column is strictly greater than 95 and the c column is less than 5. What is the average value in the a column for the resulting DataFrame? Enter your answer correct to two decimal places.

  4. How many of the rows of df contain the number 0? (Warning: This is not the same as asking how many 0s occur in the DataFrame. Use sum, any, axis, and a Boolean DataFrame.)
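The kinds of operations these questions call for can be tried out on a small hand-made DataFrame first. The sketch below uses an invented DataFrame named small (not the df defined above), so its results say nothing about the quiz answers; the thresholds and values are likewise chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hand-made example data, unrelated to the seeded df in the assignment.
small = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [0.0, 5.0, 6.0, 0.0],
    "c": [np.nan, 0.0, 3.0, 0.0],
})

# Null values in one column: isna gives a Boolean Series, sum counts the Trues.
n_null_c = small["c"].isna().sum()

# Values equal to one of several candidates.
n_5_or_6 = small["b"].isin([5, 6]).sum()

# Rows satisfying two conditions at once; each condition needs its own parentheses.
sub = small[(small["b"] > 4) & (small["c"] < 5)]

# mean skips NaN by default, so the missing value in column a is ignored.
mean_a = sub["a"].mean()

# Boolean DataFrame marking every cell equal to 0.
is_zero = small == 0

# any(axis=1) asks, for each row, whether at least one cell is True.
rows_with_zero = is_zero.any(axis=1).sum()

# Total number of zeros: sum down the columns, then sum those counts.
total_zeros = is_zero.sum().sum()

print(n_null_c, n_5_or_6, mean_a, rows_with_zero, total_zeros)
```

Here three rows contain a 0 but the table holds four zeros in total, which is the distinction the warning in the last question is about. Comparisons involving NaN evaluate to False, so rows with a missing b or c value drop out of the Boolean mask automatically.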

Submission

Download the .ipynb file for this notebook (click on the folder icon to the left, then the … next to the file name) and upload the file on Canvas.
