Homework 3
Remark: This might not reflect updates, so check the Deepnote file for the official version.
List your name and the names of any collaborators at the top of this notebook.
(Reminder: It’s encouraged to work together; you can even submit the exact same homework as another student or two students, but you must list each other’s names at the top.)
Practice with the Cars dataset
These exercises refer to the cars dataset.
- Load the data from the `cars.csv` file using `pd.read_csv`.
- Use this method and a Python dictionary to rename the `Miles_per_Gallon` column as `mpg` and the `Weight_in_lbs` column as `Weight`.
- Which `Year` appears in the dataset least often?
- How many distinct values occur in the `Name` column?
- Using this method, create a correlation matrix for the (numeric) columns in the cars dataset. Which two columns are most negatively correlated?
- Make a dictionary whose keys are USA, Japan, and Europe, and whose values are the average weight in pounds of the cars in the dataset from that origin. (So, for example, the value corresponding to USA would be the average weight of all cars in the dataset whose origin is USA.)
- Make the same thing (weights for USA, Japan, and Europe) as a pandas Series. (One option is to convert the dictionary you made into a pandas Series.)
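The operations these exercises call for can be tried out on a small made-up table before turning to the real data. The sketch below is only an illustration: the DataFrame `toy` and all of its values are invented, and the choice of `rename`, `value_counts`, `nunique`, `corr`, and `groupby` is an assumption about which pandas tools fit, not necessarily the methods the exercises link to.

```python
import pandas as pd

# Invented stand-in for the cars data (NOT the real dataset).
toy = pd.DataFrame({
    "Name": ["alpha", "beta", "alpha", "gamma", "alpha", "beta"],
    "Miles_per_Gallon": [18.0, 30.0, 17.0, 25.0, 16.0, 28.0],
    "Weight_in_lbs": [3500, 2100, 3600, 2500, 3700, 2200],
    "Origin": ["USA", "Japan", "USA", "Europe", "USA", "Japan"],
})

# Rename columns by passing a dictionary of the form {old_name: new_name}.
toy = toy.rename(columns={"Miles_per_Gallon": "mpg", "Weight_in_lbs": "Weight"})

# How often each value occurs, and how many distinct values there are.
name_counts = toy["Name"].value_counts()
least_common = name_counts.idxmin()
n_distinct = toy["Name"].nunique()

# Correlation matrix, restricted to the numeric columns.
corr = toy.select_dtypes(include="number").corr()

# Average weight per origin as a dictionary, built with Boolean indexing.
origins = ["USA", "Japan", "Europe"]
weight_dict = {o: toy.loc[toy["Origin"] == o, "Weight"].mean() for o in origins}

# The same information as a pandas Series, two ways.
weight_series = pd.Series(weight_dict)
weight_grouped = toy.groupby("Origin")["Weight"].mean()
```

Converting the dictionary with `pd.Series` keeps the key order you chose, whereas `groupby` sorts the origins alphabetically by default.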
In-class quiz practice
(This week’s in-class quiz may involve instructions like this.)
Define a pandas DataFrame `df` using the following code. (Be sure to put all of these lines in the same cell; otherwise, you might get different answers. The important thing is that `X` and `A` get created immediately after `rng`.)
import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=12)
X = np.concatenate(([np.nan],np.arange(100)))
A = rng.choice(X,size=(10**5,5))
df = pd.DataFrame(A,columns=list("abcde"))
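Why the order of these lines matters can be seen in a short experiment. The sketch below is illustrative only; the names `rng1`, `rng2`, `first`, `second`, and `third` are made up for the example. A generator created from a fixed seed always starts in the same state, but every draw advances that state, so anything drawn from `rng` before `A` is created would change `A`.

```python
import numpy as np

values = np.arange(100)

# Two generators built from the same seed start in the same state,
# so their first draws agree.
rng1 = np.random.default_rng(seed=12)
rng2 = np.random.default_rng(seed=12)
first = rng1.choice(values, size=1000)
second = rng2.choice(values, size=1000)
same_draws = np.array_equal(first, second)

# A second draw from rng1 uses the advanced state, so it is a new sample
# rather than a repeat of the first one.
third = rng1.choice(values, size=1000)
```

Here `same_draws` is `True` because both generators were used in exactly the same way after seeding.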
- How many null values are there in the `c` column of `df`?
- How many of the values in the `b` column are equal to 49 or to 50?
- Consider the sub-DataFrame consisting of all the rows in which the `b` column is strictly greater than 95 and the `c` column is less than 5. What is the average value in the `a` column for the resulting DataFrame? Enter your answer correct to two decimal places.
- How many of the rows of `df` contain the number 0? (Warning: this is not the same as asking how many 0s occur in the DataFrame. Use `sum`, `any`, `axis`, and a Boolean DataFrame.)
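The questions above all rely on Boolean Series and Boolean DataFrames. As a rough sketch of the pattern, here it is on a tiny invented DataFrame called `small` (not the `df` defined above; the values and the thresholds are made up for illustration).

```python
import numpy as np
import pandas as pd

# Invented example data.
small = pd.DataFrame({
    "a": [0.0, np.nan, 4.0, 3.0, 7.0],
    "b": [0.0, 2.0, 5.0, 0.0, 9.0],
    "c": [4.0, np.nan, 1.0, 2.0, 0.0],
})

# Count null values in one column: True counts as 1 when summed.
nulls_in_c = small["c"].isna().sum()

# Count values equal to either of two numbers.
two_or_five = small["b"].isin([2, 5]).sum()

# Keep rows meeting two conditions; each condition needs its own parentheses.
sub = small[(small["b"] > 1) & (small["c"] < 2)]
mean_a = sub["a"].mean()

# Rows containing at least one 0: any(axis=1) gives one Boolean per row.
rows_with_zero = (small == 0).any(axis=1).sum()

# Total number of 0 entries, which is a different quantity.
total_zeros = (small == 0).sum().sum()
```

In this toy example, three rows contain a 0 but there are four 0 entries in total, because one row contains two of them.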
Submission
Download the .ipynb file for this notebook (click on the folder icon to the left, then the … next to the file name) and upload the file on Canvas.