Week 8, Tuesday Discussion


  • Midterm 2 this Thursday during discussion

  • I have extra notecards if you need one – just ask me!


  • Video quizzes will be locked tonight at 11:59 pm so that solutions can be released. Be sure to complete any quizzes that you are missing, even if they are from Week 1!

  • Today we will get through as much of the practice midterm as we can. Full solutions will be posted after discussion today.

Midterm 2 Review

# These will be useful for multiple problems, so let's import them now
import seaborn as sns
import pandas as pd
import altair as alt
import numpy as np

Problem 1

# Set up for the problem; pretend you don't know what mylist is!
mylist = [13.2, -10, -4.5, 1.2]
c0,c1,c2,c3 = mylist
f = lambda x: c3*x**3 + c2*x**2 + c1*x + c0
df = pd.DataFrame({"x": np.arange(-4,4,0.2)})
df["y"] = df["x"].map(f)

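from sklearn.linear_model import LinearRegression
reg = LinearRegression()
cols = []
# Build one column per power of x: d1 = x, d2 = x**2, d3 = x**3
for deg in range(1,4):
    c = f"d{deg}"
    cols.append(c)
    df[c] = df["x"]**deg
# Fit y as a linear combination of the power columns
reg.fit(df[cols], df["y"])
# coef_ is ordered like cols, so index 1 is the x**2 coefficient
ans2 = reg.coef_[1]
ans0 = reg.intercept_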
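
The idea behind Problem 1 can be sanity-checked on a polynomial whose coefficients we do know. The sketch below is only an illustration: the list `coefs` is made up (it is not `mylist`), and `np.polyfit` is used purely as an independent cross-check. If the fit works, the intercept should match the constant term and `coef_` should match the remaining coefficients in order of increasing degree.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical coefficients for illustration, constant term first
coefs = [2.0, -1.0, 0.5, 3.0]
x = np.arange(-4, 4, 0.2)
y = coefs[0] + coefs[1]*x + coefs[2]*x**2 + coefs[3]*x**3

# One column per power of x, named like the columns in Problem 1
X = pd.DataFrame({f"d{deg}": x**deg for deg in range(1, 4)})
reg = LinearRegression().fit(X, y)

# Intercept is the degree-0 coefficient; coef_ holds degrees 1, 2, 3
recovered = [reg.intercept_, *reg.coef_]

# np.polyfit returns the highest degree first, so reverse it to compare
check = np.polyfit(x, y, 3)[::-1]
```

Here `recovered[2]` plays the role of `ans2` and `recovered[0]` plays the role of `ans0`.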

Problem 2

Problem 3

from sklearn.datasets import fetch_openml
from sklearn.cluster import KMeans
mnist = fetch_openml('mnist_784', version = 1)
df = pd.DataFrame(mnist.data)
kmeans = KMeans(n_clusters=14)
# Fit before predicting; predict raises an error on an unfitted model
kmeans.fit(df)
df["cluster"] = kmeans.predict(df)


# Create a column called "label" that stores the true value of each image
df["label"] = pd.DataFrame(mnist.target)
# Count how many images labeled "5" landed in each cluster
for cluster, df_sub in df.groupby("cluster"):
    # The comparison uses the string "5", not the integer 5
    num5s = (df_sub["label"] == "5").sum()
    print(f"The number of 5s in cluster {cluster} is {num5s}")
The number of 5s in cluster 0 is 3
The number of 5s in cluster 1 is 128
The number of 5s in cluster 2 is 1203
The number of 5s in cluster 3 is 45
The number of 5s in cluster 4 is 36
The number of 5s in cluster 5 is 145
The number of 5s in cluster 6 is 804
The number of 5s in cluster 7 is 318
The number of 5s in cluster 8 is 185
The number of 5s in cluster 9 is 70
The number of 5s in cluster 10 is 1499
The number of 5s in cluster 11 is 180
The number of 5s in cluster 12 is 7
The number of 5s in cluster 13 is 1690
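
As a possible alternative to the loop, the same kind of count can be produced for every label at once. The sketch below uses a tiny made-up DataFrame (`toy` is hypothetical, not the MNIST data) so that it runs instantly; `pd.crosstab` builds a table with one row per cluster and one column per label.

```python
import pandas as pd

# Hypothetical toy data standing in for the cluster and label columns
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1, 2],
    "label":   ["5", "3", "5", "5", "8", "3"],
})

# Number of 5s per cluster, without an explicit for loop
counts = (toy["label"] == "5").groupby(toy["cluster"]).sum()

# Counts for every (cluster, label) pair at once
table = pd.crosstab(toy["cluster"], toy["label"])
```

With the real data, the column `table["5"]` would hold the same numbers that the loop prints.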


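import matplotlib.pyplot as plt
# Each cluster center has 784 entries; reshape it to the 28x28 image grid
A = kmeans.cluster_centers_[0]
A = A.reshape(28,28)

fig,ax = plt.subplots()
# Display the cluster center as an image
ax.imshow(A)

Problem 4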