Week 8, Tuesday Discussion
Reminders:
Midterm 2 this Thursday during discussion
I have extra notecards if you need one – just ask me!
Announcements
Video quizzes will be locked tonight at 11:59 pm so that solutions can be released. Be sure to complete any quizzes that you are missing, even if they are from Week 1!
Today we will get through as much of the practice midterm as we can. Full solutions will be posted after discussion today.
Midterm 2 Review
# These will be useful for multiple problems, so let's import them now
import seaborn as sns
import pandas as pd
import altair as alt
import numpy as np
Problem 1
# Set up for the problem; pretend you don't know what mylist is!
mylist = [13.2, -10, -4.5, 1.2]
c0,c1,c2,c3 = mylist
f = lambda x: c3*x**3 + c2*x**2 + c1*x + c0
df = pd.DataFrame({"x": np.arange(-4,4,0.2)})
df["y"] = df["x"].map(f)
alt.Chart(df).mark_circle().encode(
    x="x",
    y="y"
)
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
cols = []
for deg in range(1,4):
    c = f"d{deg}"
    cols.append(c)
    df[c] = df["x"]**deg
reg.fit(df[cols],df["y"])
LinearRegression()
ans2 = reg.coef_[1]
ans2
-4.499999999999999
ans0 = reg.intercept_
ans0
13.199999999999996
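As a cross-check on the two answers above, here is a minimal sketch that rebuilds the same data and recovers all four entries of mylist at once. It uses np.polyfit, which is a different routine from the LinearRegression approach in the solution, and the name recovered is introduced here only for illustration. In the solution's notation, reg.intercept_ plays the role of c0 and reg.coef_[i] plays the role of the coefficient on x**(i+1).

```python
import numpy as np

# Same hidden coefficients as in the problem setup (constant term first).
mylist = [13.2, -10, -4.5, 1.2]
c0, c1, c2, c3 = mylist

# Rebuild the x and y values used in the problem.
x = np.arange(-4, 4, 0.2)
y = c3 * x**3 + c2 * x**2 + c1 * x + c0

# np.polyfit returns coefficients from the highest degree down to the
# constant term, so reverse them to match the order of mylist.
recovered = np.polyfit(x, y, deg=3)[::-1]

print(np.round(recovered, 6))
```

Up to floating-point error, recovered[0] should agree with ans0 and recovered[2] with ans2.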
Problem 2
Problem 3
from sklearn.datasets import fetch_openml
from sklearn.cluster import KMeans
mnist = fetch_openml('mnist_784', version = 1)
df = pd.DataFrame(mnist.data)
kmeans = KMeans(n_clusters=14)
kmeans.fit(df)
KMeans(n_clusters=14)
df["cluster"] = kmeans.predict(df)
(c)
# Create a column called "label" that stores the true value of each image
df["label"] = mnist.target
# The labels are strings, so compare against "5" rather than the integer 5
for cluster, df_sub in df.groupby("cluster"):
    num5s = (df_sub["label"] == "5").sum()
    print(f"The number of 5s in cluster {cluster} is {num5s}")
The number of 5s in cluster 0 is 3
The number of 5s in cluster 1 is 128
The number of 5s in cluster 2 is 1203
The number of 5s in cluster 3 is 45
The number of 5s in cluster 4 is 36
The number of 5s in cluster 5 is 145
The number of 5s in cluster 6 is 804
The number of 5s in cluster 7 is 318
The number of 5s in cluster 8 is 185
The number of 5s in cluster 9 is 70
The number of 5s in cluster 10 is 1499
The number of 5s in cluster 11 is 180
The number of 5s in cluster 12 is 7
The number of 5s in cluster 13 is 1690
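The same counts can be computed without an explicit loop. The sketch below shows one possible way on a tiny made-up DataFrame (toy and its values are hypothetical, not MNIST data) so that it runs instantly. The same two calls should work on a DataFrame that has "cluster" and "label" columns like the one above.

```python
import pandas as pd

# Toy stand-in for the MNIST DataFrame: made-up cluster assignments and labels.
# Labels are strings, matching the comparison with "5" used in the loop.
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1, 2],
    "label": ["5", "3", "5", "5", "8", "3"],
})

# Count the 5s in each cluster: sum a Boolean Series within each cluster.
is_five = toy["label"] == "5"
fives_per_cluster = is_five.groupby(toy["cluster"]).sum()
print(fives_per_cluster)

# Full table of counts for every (cluster, label) pair.
counts = pd.crosstab(toy["cluster"], toy["label"])
print(counts)
```

The crosstab version is handy when you want to see every digit at once, for example to check which digit dominates each cluster.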
(d)/(e)
import matplotlib.pyplot as plt
kmeans.cluster_centers_.shape
A = kmeans.cluster_centers_[0]
A = A.reshape(28,28)
fig,ax = plt.subplots()
ax.imshow(A)
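To look at every cluster center rather than only the first, one option is to draw them in a grid. The sketch below is an illustration under stated assumptions: it uses the small 8x8 digits dataset bundled with scikit-learn (load_digits) instead of MNIST so that it runs quickly and without a download, and the names X, kmeans_small, and centers are introduced here. For the MNIST model above, the reshape would be (28, 28) instead of (8, 8).

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Stand-in data (assumption): 8x8 digit images, flattened to 64 pixels per row.
X = load_digits().data

# Same number of clusters as in the problem; random_state fixed for repeatability.
kmeans_small = KMeans(n_clusters=14, n_init=10, random_state=0)
kmeans_small.fit(X)

# One row per cluster center, one column per pixel.
centers = kmeans_small.cluster_centers_
print(centers.shape)

# Draw each center as an image in a 2-by-7 grid.
fig, axes = plt.subplots(2, 7, figsize=(14, 4))
for ax, center in zip(axes.ravel(), centers):
    ax.imshow(center.reshape(8, 8), cmap="gray")
    ax.axis("off")
```

Each center is the average of the images assigned to that cluster, so the pictures tend to look like blurry digits.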