Homework 3
Remark: This page might not reflect later updates, so check the Deepnote file for the official version.
List your name and the names of any collaborators at the top of this notebook.
(Reminder: Working together is encouraged. You may even submit exactly the same homework as one or two other students, but you must list each other’s names at the top.)
Practice with the Cars dataset
These exercises refer to the cars dataset.
Load the data from the cars.csv file using pd.read_csv.
Use the rename method and a Python dictionary to rename the Miles_per_Gallon column as mpg and the Weight_in_lbs column as Weight.
Which Year appears in the dataset least often?
How many distinct values occur in the Name column?
Using the corr method, create a correlation matrix for the (numeric) columns in the cars dataset. Which two columns are most negatively correlated?
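The pandas calls behind the questions above can be sketched on a small, hypothetical stand-in for cars.csv. Only the column names follow the exercise. The values are made up, so the printed results say nothing about the real dataset. Breaking ties with idxmin and pairing columns with a dictionary comprehension are illustrative choices, not the required approach.

```python
import pandas as pd

# Hypothetical toy data; with the real file you would instead use
# cars = pd.read_csv("cars.csv")  (assumes the file is in the working directory)
cars = pd.DataFrame({
    "Name": ["car a", "car b", "car a", "car c", "car d"],
    "Miles_per_Gallon": [18, 20, 25, 30, 32],
    "Weight_in_lbs": [3500, 3300, 2600, 2000, 1900],
    "Year": [1972, 1970, 1971, 1970, 1972],
    "Origin": ["USA", "USA", "Europe", "Japan", "Japan"],
})

# Rename columns by passing a dictionary {old name: new name}.
cars = cars.rename(columns={"Miles_per_Gallon": "mpg", "Weight_in_lbs": "Weight"})

# value_counts tallies each Year; idxmin returns the label with the smallest count
# (if several years tie, only one of them is returned).
least_common_year = cars["Year"].value_counts().idxmin()

# nunique counts distinct values (missing values are excluded by default).
n_names = cars["Name"].nunique()

# Keep only the numeric columns, then compute the correlation matrix.
corr = cars.select_dtypes("number").corr()

# Check each pair above the diagonal once (the diagonal compares a column with itself).
cols = list(corr.columns)
pairs = {(a, b): corr.loc[a, b] for i, a in enumerate(cols) for b in cols[i + 1:]}
most_negative_pair = min(pairs, key=pairs.get)

print(least_common_year, n_names, most_negative_pair)
```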
Make a dictionary whose keys are USA, Japan, and Europe, and whose values are the average weight in pounds of the cars in the dataset from that origin. (For example, the value corresponding to USA would be the average weight of all cars in the dataset whose origin is USA.)
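One way to build such a dictionary is a dictionary comprehension that applies a Boolean mask for each origin. This is a minimal sketch on hypothetical toy data that uses the renamed Weight column. It is not the real cars dataset, so the numbers are only illustrative.

```python
import pandas as pd

# Hypothetical toy data with the renamed column names.
cars = pd.DataFrame({
    "Weight": [3500, 3300, 2600, 2000, 1900],
    "Origin": ["USA", "USA", "Europe", "Japan", "Japan"],
})

# For each origin, select the matching rows' Weight values and average them.
avg_weight = {
    origin: cars.loc[cars["Origin"] == origin, "Weight"].mean()
    for origin in ["USA", "Japan", "Europe"]
}
print(avg_weight)
```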
Create the same data (average weights for USA, Japan, and Europe) as a pandas Series. (One option is to convert the dictionary you made into a pandas Series.)
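The sketch below shows the suggested dictionary-to-Series conversion next to a groupby alternative, which the exercise does not mention. It again uses hypothetical toy data. Note that groupby sorts its result by origin name, which is why the sketch reindexes it.

```python
import pandas as pd

# Hypothetical toy data with the renamed column names.
cars = pd.DataFrame({
    "Weight": [3500, 3300, 2600, 2000, 1900],
    "Origin": ["USA", "USA", "Europe", "Japan", "Japan"],
})
origins = ["USA", "Japan", "Europe"]

# Option 1: build the dictionary, then convert it; its keys become the index.
avg_dict = {o: cars.loc[cars["Origin"] == o, "Weight"].mean() for o in origins}
avg_series = pd.Series(avg_dict)

# Option 2: groupby computes the mean for every origin in one step.
# Its result is sorted by origin name, so reindex to match the order above.
avg_grouped = cars.groupby("Origin")["Weight"].mean().reindex(origins)

print(avg_series)
print(avg_grouped)
```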
In-class quiz practice
(This week’s in-class quiz may involve instructions like this.)
Define a pandas DataFrame df using the following code. (Be sure to put all of these lines in the same cell; otherwise, you might get different answers. The important thing is that X and A get created immediately after rng.)
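import numpy as np
import pandas as pd
# Seeded generator, so everyone gets the same "random" values
rng = np.random.default_rng(seed=12)
# The 101 possible values: NaN followed by 0, 1, ..., 99
X = np.concatenate(([np.nan], np.arange(100)))
# A 100,000 x 5 array of values drawn (with replacement) from X
A = rng.choice(X, size=(10**5, 5))
df = pd.DataFrame(A, columns=list("abcde"))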
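The advice to keep these lines in one cell comes down to how a seeded Generator works: every random call advances its internal state. In this minimal sketch, rng3.random() is a hypothetical stand-in for any extra random call made between creating the generator and drawing the values. The sizes are reduced for illustration.

```python
import numpy as np

X = np.concatenate(([np.nan], np.arange(100)))

# Two generators created with the same seed produce identical draws.
rng1 = np.random.default_rng(seed=12)
rng2 = np.random.default_rng(seed=12)
draws1 = rng1.choice(X, size=1000)
draws2 = rng2.choice(X, size=1000)

# A stray call in between advances the state, so the later draws differ.
rng3 = np.random.default_rng(seed=12)
rng3.random()  # hypothetical extra call, e.g. from another cell run in between
draws3 = rng3.choice(X, size=1000)

# equal_nan=True treats NaN entries in matching positions as equal.
print(np.array_equal(draws1, draws2, equal_nan=True))  # True
print(np.array_equal(draws1, draws3, equal_nan=True))
```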
How many null values are there in the c column of df?
How many of the values in the b column are equal to 49 or to 50?
Consider the sub-DataFrame consisting of all the rows in which the b column is strictly greater than 95 and the c column is less than 5. What is the average value in the a column for the resulting DataFrame? Enter your answer correct to two decimal places.
How many of the rows of df contain the number 0? (Warning. This is not the same as asking how many 0s occur in the DataFrame. Use sum, any, axis, and a Boolean DataFrame.)
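Each question above maps onto a short Boolean-indexing idiom. This sketch uses a small hypothetical DataFrame with the same column names, not the quiz's df, so its numbers do not answer the quiz. Row 3 deliberately holds two zeros. That shows why counting rows that contain 0 differs from counting the 0s themselves.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame (not the quiz's df).
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 0.0, 5.0],
    "b": [96.0, 49.0, 99.0, 50.0, 98.0],
    "c": [np.nan, 3.0, 2.0, 0.0, 4.0],
})

# isna gives a Boolean Series; sum counts the True values.
n_null_c = df["c"].isna().sum()

# isin tests membership in a list of values.
n_49_or_50 = df["b"].isin([49, 50]).sum()

# Combine masks with & (each condition in parentheses); NaN < 5 evaluates to False.
sub = df[(df["b"] > 95) & (df["c"] < 5)]
mean_a = round(sub["a"].mean(), 2)

# df == 0 is a Boolean DataFrame; any(axis=1) checks each row for at least
# one True, and sum counts those rows.
n_rows_with_0 = (df == 0).any(axis=1).sum()
n_zeros_total = (df == 0).sum().sum()  # counts every 0, not rows

print(n_null_c, n_49_or_50, mean_a, n_rows_with_0, n_zeros_total)
```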
Submission
Download the .ipynb file for this notebook (click on the folder icon to the left, then the … next to the file name) and upload it to Canvas.