Quiz 2 Practice Exercises
These questions relate to the learning objectives for Quiz 2.
For each of these topics, we state the learning objective and then give one or more practice exercises.
loc and iloc
Access entries (and assign values to entries) within pandas using loc, iloc.
Exercise
Define a DataFrame df using the following code. (Make sure to put all of these commands in the same cell.)
import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=30)
A = rng.integers(-1,10,size=(10,5))
df = pd.DataFrame(A,columns=list("abcde"))
Write code to replace all the even-indexed (like 0, 2, 4, …) entries in the b column with NumPy's not a number, nan.
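As a rough illustration of the learning objective, the sketch below assigns values with iloc (position-based) and loc (label-based). It uses a small hypothetical DataFrame named toy, with made-up columns p and q, rather than the df from the exercise, and it targets the odd-positioned rows so the exercise itself is left open.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, unrelated to the exercise's df
toy = pd.DataFrame({"p": [1.0, 2.0, 3.0, 4.0], "q": [10, 20, 30, 40]})

# iloc is position-based: every second row starting at position 1, column position 0
toy.iloc[1::2, 0] = np.nan

# loc is label-based: row label 0, column label "q"
toy.loc[0, "q"] = -1

# Count the missing values now present in column "p"
nan_count = int(toy["p"].isna().sum())
print(toy)
print(nan_count)
```

The slice 1::2 follows the usual Python start:stop:step convention, so changing the start changes which rows are selected.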
Distribution of data
Use the describe, info, and value_counts functions in pandas to learn about the distribution of data within a DataFrame or Series.
Exercise
Which column in the above DataFrame has the smallest mean value? (Can you answer this question two different ways, first using one of the methods describe, info, or value_counts, and second using the mean method together with an axis argument?)
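The two routes mentioned above can be sketched on a small hypothetical DataFrame (toy, with made-up columns p, q, r). The use of idxmin to pick out the label of the smallest entry is an extra convenience assumed here, not something the exercise requires.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise's df
toy = pd.DataFrame({"p": [1, 2, 3], "q": [0, 0, 3], "r": [5, 5, 5]})

# describe() returns a DataFrame of summary statistics;
# its "mean" row holds the mean of each column
means_from_describe = toy.describe().loc["mean"]

# mean(axis=0) averages down the rows, giving one value per column
means_direct = toy.mean(axis=0)

# idxmin returns the label of the smallest entry in a Series
smallest = means_direct.idxmin()
print(means_from_describe)
print(smallest)
```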
Count occurrences
Count occurrences in rows or columns in a NumPy array or pandas DataFrame using sum and axis.
Exercise
In the DataFrame df defined above, how often does 5 occur in each row?
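A minimal sketch of counting with sum and axis, using a hypothetical 3-by-3 array (arr) and a DataFrame built from it (toy). The idea is that a comparison produces Booleans, and True counts as 1 when summed.

```python
import numpy as np
import pandas as pd

# Hypothetical data, unrelated to the exercise's df
arr = np.array([[5, 1, 5],
                [2, 5, 3],
                [0, 0, 0]])

# arr == 5 is a Boolean array; summing it counts the True entries
per_row = (arr == 5).sum(axis=1)  # one count per row
per_col = (arr == 5).sum(axis=0)  # one count per column

# The same pattern works on a DataFrame and returns a Series
toy = pd.DataFrame(arr, columns=["p", "q", "r"])
per_row_df = (toy == 5).sum(axis=1)
print(per_row, per_col)
```

With axis=1 the sum runs across the columns, so the result has one entry per row; with axis=0 it runs down the rows, giving one entry per column.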
Select rows
Select the rows in a pandas DataFrame where a certain condition is True, possibly using some combination of any, all, and axis.
Exercise
Create a new pandas DataFrame using the below code. (Be sure to run all of this code, especially the rng part, even if you already created a random number generator above.)
import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=10)
A = 10*rng.random(size=(10**4,3))
df = pd.DataFrame(A,columns=["x","y","z"])
Create a new DataFrame df2 consisting of only the rows of df where the x-value is strictly greater than the y-value. What is the maximum value in the y column of df2? Enter your answer correct to two decimal places.
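Row selection with a Boolean mask, optionally combined with any or all, might look like the sketch below. The DataFrame toy and its columns p and q are hypothetical and much smaller than the df in the exercise.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise's df
toy = pd.DataFrame({"p": [3, 1, 7, 4], "q": [2, 5, 6, 4]})

# Boolean Series: True where p is strictly greater than q
mask = toy["p"] > toy["q"]
sub = toy[mask]  # keeps only the rows where mask is True

# all(axis=1): rows where every entry satisfies the condition
all_big = toy[(toy > 2).all(axis=1)]

# any(axis=1): rows where at least one entry satisfies the condition
any_big = toy[(toy > 4).any(axis=1)]

max_q = sub["q"].max()
print(sub)
print(max_q)
```

Note that the selected rows keep their original index labels, so sub here has index labels 0 and 2.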
Count rows
Count the number of rows in a pandas DataFrame satisfying a condition.
Exercise
For the DataFrame df2 that you created above, in how many rows is the y-value strictly greater than the z-value? If you pick a random row from df2, what is the probability that its y-value is strictly greater than its z-value?
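Counting rows and turning the count into a probability can be sketched as follows, again on a hypothetical DataFrame toy with made-up columns p and q. Summing a Boolean Series counts its True entries, and taking its mean gives the fraction of True entries.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the exercise's df2
toy = pd.DataFrame({"p": [3, 1, 7, 4], "q": [2, 5, 6, 4]})

mask = toy["p"] > toy["q"]

# Number of rows satisfying the condition
count = int(mask.sum())

# Fraction of rows satisfying the condition, i.e. the probability
# that a uniformly chosen row satisfies it
proportion = float(mask.mean())

# Equivalent count via the length of the filtered DataFrame
count_via_len = len(toy[mask])
print(count, proportion)
```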
Missing values
Locate the missing values within a pandas DataFrame.
Exercise
Write a function that takes a pandas DataFrame as input and returns the sub-DataFrame obtained by removing every row that contains a null value. Possible approach: use the isna method and logical negation.
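One possible reading of the suggested approach is sketched below. The function name rows_without_nulls and the DataFrame toy are hypothetical, and this is only one of several ways to write it.

```python
import numpy as np
import pandas as pd

def rows_without_nulls(frame):
    """Return the rows of frame that contain no missing values."""
    # isna() marks each missing entry; any(axis=1) flags rows with at least one
    has_null = frame.isna().any(axis=1)
    # ~ negates the Boolean Series, keeping only the complete rows
    return frame[~has_null]

# Hypothetical toy data with one missing value in each of two rows
toy = pd.DataFrame({"p": [1.0, np.nan, 3.0], "q": ["a", "b", None]})
clean = rows_without_nulls(toy)
print(clean)
```

On this example the result agrees with the built-in toy.dropna(), which can serve as a check on a hand-written version.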
Data types
Identify the different data types present within a pandas DataFrame.
Exercise
Read in the cars dataset using read_csv. What are the different data types present? What does object represent? If you look at the Horsepower column, all the values seem to be integers, yet pandas represents the data type as float64. Do you have a guess why that is? (Hint: it's not that there is secretly some fractional horsepower somewhere in it. It relates to the previous learning objective.)
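As a small illustration of inspecting data types, the sketch below uses a hypothetical DataFrame toy rather than the cars dataset, whose file path is not given here. The dtype reported for the string column depends on the pandas version, so the code only prints it.

```python
import pandas as pd

# Hypothetical toy data, not the cars dataset
toy = pd.DataFrame({
    "name": ["a", "b", "c"],   # strings
    "whole": [1, 2, 3],        # integers
    "frac": [0.5, 1.5, 2.5],   # floats
})

# dtypes is a Series giving the data type of each column
dtypes = toy.dtypes
print(dtypes)

# select_dtypes keeps only the columns of the requested kind
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

The info method prints similar information together with the number of non-null entries in each column.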
Logic
Find content satisfying conditions using logical statements (not, and, or) in Python and in pandas/NumPy (~, &, |).
Exercise
What is the average weight of cars in the dataset that are from Europe or Japan? (Your answer should be a single number.)
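The difference between Python's keywords and the elementwise pandas/NumPy operators can be sketched as below. The DataFrame toy and its columns kind and mass are hypothetical stand-ins, not the actual column names of the cars dataset.

```python
import pandas as pd

# Hypothetical toy data, not the cars dataset
toy = pd.DataFrame({
    "kind": ["red", "blue", "green", "red"],
    "mass": [10.0, 20.0, 30.0, 60.0],
})

# Plain Python: not/and/or act on single Boolean values
single = (3 > 2) and not (1 > 2)

# pandas/NumPy: ~, &, | act elementwise on Boolean Series.
# Each comparison is parenthesized because these operators
# bind more tightly than == and >.
red_or_blue = (toy["kind"] == "red") | (toy["kind"] == "blue")
heavy_red = (toy["kind"] == "red") & (toy["mass"] > 20)
not_red = ~(toy["kind"] == "red")

# Average of one column over the rows selected by a mask
avg_mass = toy.loc[red_or_blue, "mass"].mean()
print(avg_mass)
```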