Week 8, Tuesday Discussion

Reminders:

  • Midterm 2 this Thursday during discussion

  • I have extra notecards if you need one – just ask me!

Announcements

  • Video quizzes will be locked tonight at 11:59 pm so that solutions can be released. Be sure to complete any quizzes that you are missing, even if they are from Week 1!

  • Today we will get through as much of the practice midterm as we can. Full solutions will be posted after discussion today.

Midterm 2 Review

# These will be useful for multiple problems, so let's import them now
import seaborn as sns
import pandas as pd
import altair as alt
import numpy as np

Problem 1

# Set up for the problem; pretend you don't know what mylist is!
mylist = [13.2, -10, -4.5, 1.2]
c0,c1,c2,c3 = mylist
f = lambda x: c3*x**3 + c2*x**2 + c1*x + c0
df = pd.DataFrame({"x": np.arange(-4,4,0.2)})
df["y"] = df["x"].map(f)

alt.Chart(df).mark_circle().encode(
     x="x",
     y="y"
)
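from sklearn.linear_model import LinearRegression

# Fit y as a linear combination of x, x**2 and x**3 (degree-3 polynomial regression)
reg = LinearRegression()
cols = []
for deg in range(1,4):
    c = f"d{deg}"          # column names d1, d2, d3
    cols.append(c)
    df[c] = df["x"]**deg   # d1 = x, d2 = x**2, d3 = x**3
reg.fit(df[cols],df["y"])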
LinearRegression()
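# reg.coef_ follows the order of cols, so index 1 is the coefficient of x**2 (that is, c2)
ans2 = reg.coef_[1]
ans2
-4.499999999999999
# The intercept is the constant term c0
ans0 = reg.intercept_
ans0
13.199999999999996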
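
As a cross-check on the answers above, here is a minimal sketch that recovers the same coefficients with np.polyfit instead of LinearRegression. This is a swapped-in technique, not the method used in the problem; the names x, y, coeffs and recovered are introduced here only for illustration.

```python
import numpy as np

# Same setup as Problem 1; mylist holds c0, c1, c2, c3
mylist = [13.2, -10, -4.5, 1.2]
c0, c1, c2, c3 = mylist
x = np.arange(-4, 4, 0.2)
y = c3*x**3 + c2*x**2 + c1*x + c0

# np.polyfit returns coefficients from the highest degree down to the constant term
coeffs = np.polyfit(x, y, 3)
recovered = coeffs[::-1]   # reorder as c0, c1, c2, c3
print(np.round(recovered, 6))
```

Up to floating-point error, recovered[2] should agree with ans2 and recovered[0] with ans0.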

Problem 2

Problem 3

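from sklearn.datasets import fetch_openml
from sklearn.cluster import KMeans

# Load the 70,000 MNIST images; each row holds the 784 pixel values of one 28x28 image
mnist = fetch_openml('mnist_784', version = 1)
df = pd.DataFrame(mnist.data)

# Group the images into 14 clusters using only the pixel values
kmeans = KMeans(n_clusters=14)
kmeans.fit(df)
KMeans(n_clusters=14)
df["cluster"] = kmeans.predict(df)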

(c)

# Create a column called "label" that stores the true value of each image
# (the labels are strings such as "5", not integers)
df["label"] = mnist.target
for cluster, df_sub in df.groupby("cluster"):
    num5s = (df_sub["label"] == "5").sum()
    print(f"The number of 5s in cluster {cluster} is {num5s}")
The number of 5s in cluster 0 is 3
The number of 5s in cluster 1 is 128
The number of 5s in cluster 2 is 1203
The number of 5s in cluster 3 is 45
The number of 5s in cluster 4 is 36
The number of 5s in cluster 5 is 145
The number of 5s in cluster 6 is 804
The number of 5s in cluster 7 is 318
The number of 5s in cluster 8 is 185
The number of 5s in cluster 9 is 70
The number of 5s in cluster 10 is 1499
The number of 5s in cluster 11 is 180
The number of 5s in cluster 12 is 7
The number of 5s in cluster 13 is 1690
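
The groupby loop above counts one digit at a time. A minimal sketch of an alternative is pd.crosstab, which counts every label in every cluster in one call. The toy DataFrame below is hypothetical data standing in for the MNIST DataFrame, so its counts have nothing to do with the output above.

```python
import pandas as pd

# Hypothetical toy data standing in for the MNIST DataFrame
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1, 2],
    "label":   ["5", "3", "5", "5", "0", "3"],
})

# Rows are clusters, columns are true labels, entries are counts
counts = pd.crosstab(toy["cluster"], toy["label"])

# Number of 5s in each cluster (one entry per cluster)
num5s = counts["5"]
print(num5s)
```

With the real data, the same call would be pd.crosstab(df["cluster"], df["label"]), assuming df has the "cluster" and "label" columns created above.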

(d)/(e)

import matplotlib.pyplot as plt
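# cluster_centers_ has shape (14, 784): one row of 784 pixel values per cluster
kmeans.cluster_centers_.shape
# Take the center of cluster 0 and reshape it back into a 28x28 image
A = kmeans.cluster_centers_[0]
A = A.reshape(28,28)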

fig,ax = plt.subplots()
ax.imshow(A)
[Figure: the center of cluster 0, displayed as a 28x28 image]
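
To look at every cluster center rather than only cluster 0, the reshape step can be applied to all rows at once. This is a minimal sketch using random numbers as a stand-in for kmeans.cluster_centers_; the names centers and images are hypothetical.

```python
import numpy as np

# Stand-in for kmeans.cluster_centers_ (random values, not real cluster centers)
rng = np.random.default_rng(0)
centers = rng.random((14, 784))

# -1 lets NumPy infer the number of images (14 here)
images = centers.reshape(-1, 28, 28)
print(images.shape)
```

Each images[i] is then the same array as centers[i].reshape(28,28), so it can be passed to ax.imshow in the same way as A.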

Problem 4