Tentative schedule

Class 1

Introduction to the class, to Python, and to Deepnote. Reading data from a CSV file. Introduction to pandas, especially pandas DataFrames. Introduction to indexing in pandas using loc and iloc.
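A minimal sketch of the loc/iloc distinction, using a small made-up DataFrame (the names and scores are illustrative, not course data):

```python
import pandas as pd

# A small, made-up DataFrame to illustrate the two indexers.
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Carla"], "score": [85, 92, 78]},
    index=["a", "b", "c"],
)

# loc indexes by label: the row labeled "b", column "score".
by_label = df.loc["b", "score"]

# iloc indexes by integer position: second row, second column.
by_position = df.iloc[1, 1]
```

Here both expressions return the same value, 92, but for different reasons: loc looked up the label "b", while iloc counted positions.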

Class 2

Introduction to range, lists, for loops, list comprehension, and f-strings. Making a for-loop more Pythonic. Making repetitive code more DRY ("Don't Repeat Yourself").
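A quick sketch of the three ingredients together: a plain for-loop, its more Pythonic list-comprehension equivalent, and an f-string:

```python
# A for-loop version and its list-comprehension equivalent.
squares_loop = []
for n in range(5):
    squares_loop.append(n ** 2)

squares_comp = [n ** 2 for n in range(5)]  # same result, one line

# f-strings interpolate values directly into a string.
message = f"The first five squares are {squares_comp}"
```

Both versions produce [0, 1, 4, 9, 16]; the comprehension says what the list *is* rather than how to build it step by step.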

Class 3

Performing basic Exploratory Data Analysis (EDA) on a dataset using Python (and pandas in particular). Slicing.
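A small sketch of first-look EDA steps, on a made-up dataset standing in for a real csv file:

```python
import pandas as pd

# A made-up dataset standing in for a csv file.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog"],
    "weight": [4.0, 20.5, 3.5, 18.0],
})

n_rows, n_cols = df.shape   # dimensions of the DataFrame
summary = df.describe()     # summary statistics for the numeric columns
first_two = df[:2]          # slicing: the first two rows
```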

Class 4

Different options for using the syntax df[???] in pandas. A closer look at Boolean indexing. Working with missing data. pandas Series and Python dictionaries. Very brief introduction to NumPy. Logic in pandas. Using the axis keyword argument with any and sum.
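A sketch combining Boolean indexing, missing-value detection, and the axis keyword, on a tiny made-up DataFrame with some NaN entries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan], "b": [4, np.nan, 6]})

# Boolean indexing: keep only rows where column "a" is greater than 1.
big_a = df[df["a"] > 1]

# isna marks missing values; any(axis=1) asks "any missing in this row?"
rows_with_missing = df.isna().any(axis=1)

# sum with axis=0 (the default) sums down each column, skipping NaN.
col_sums = df.sum(axis=0)
```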

Class 5

Introduction to plotting in Python. Matplotlib as the most famous plotting library in Python. More practice with NumPy. Altair, Seaborn, and Plotly as three plotting libraries which are similar to each other but quite different from Matplotlib (Seaborn is built on top of Matplotlib). For Math 10, Altair will be the most important of these libraries.
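A minimal Matplotlib sketch (using the non-interactive Agg backend so it runs headlessly); the function being plotted is just an illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: nothing tries to open a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_title("A first Matplotlib plot")
ax.legend()
```

Altair, Seaborn, and Plotly build charts from a DataFrame at a higher level; Matplotlib, as here, works closer to the individual figure and axes objects.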

Class 6

Practice with Altair and pandas. Using isin, value_counts, index, and slicing to find the most frequently occurring values in a pandas Series.
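The most-frequent-values recipe can be sketched as follows, on a made-up Series of color names:

```python
import pandas as pd

s = pd.Series(["red", "blue", "red", "green", "red", "blue"])

counts = s.value_counts()   # sorted from most to least frequent
top_two = counts.index[:2]  # slicing the index: the two most common values
mask = s.isin(top_two)      # Boolean Series: is each entry in the top two?
```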

Class 7

Functions and lambda functions. Slicing. map in pandas. Basic logic in Python: and, or, not; and in pandas and NumPy: &, |, ~.
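A sketch of map with a lambda function, and of the elementwise operators &, |, ~ that replace Python's and/or/not when working with a Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# map applies a function (here a lambda) to each entry of the Series.
doubled = s.map(lambda x: 2 * x)

# Elementwise logic uses &, |, ~ (not Python's and/or/not),
# and each comparison needs its own parentheses.
mask = (s > 1) & (s < 5)
outside = ~mask
```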

Class 8

Working with dates in pandas. How to locate missing values. try and except for handling errors. Practice with if, elif, else. Using count with a list or a string. Using slicing with a string. An example of feature engineering.
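A sketch touching several of these topics at once; the date strings and the word being sliced are made up for illustration:

```python
import pandas as pd

# pd.to_datetime parses strings into Timestamp objects.
dates = pd.to_datetime(pd.Series(["2023-01-15", "2023-06-01"]))
months = dates.dt.month  # extracting a new "month" feature

# try/except handles an error instead of crashing.
try:
    value = int("not a number")
except ValueError:
    value = -1

# count and slicing work on plain strings too.
word = "banana"
n_a = word.count("a")    # how many times "a" occurs
first_three = word[:3]   # slicing a string
```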

Class 9

No new material; time to work on the homework and sample midterm.

Class 10

Timing various ways of counting, including NumPy’s count_nonzero. Sorting a pandas Series or DataFrame using sort_values.
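A sketch of the two themes, using the standard-library timeit for timing and a small made-up array:

```python
import numpy as np
import pandas as pd
from timeit import timeit

arr = np.array([0, 1, 2, 0, 3, 0])

# Two ways to count the nonzero entries; count_nonzero is typically fastest.
n_fast = np.count_nonzero(arr)
n_slow = sum(1 for x in arr if x != 0)

# timeit measures how long a snippet takes (here, 1000 repetitions).
t_fast = timeit(lambda: np.count_nonzero(arr), number=1000)

# sort_values sorts a Series (or DataFrame) by value rather than by index.
s = pd.Series([3, 1, 2]).sort_values()
```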

Class 11

Review for the midterm, and some new topics in Altair.

Thursday, Week 4

Midterm 1 during discussion section.

Class 12

Different ways to rescale data: using a for loop, using apply, and using StandardScaler from scikit-learn. This is our first time using anything from scikit-learn, and the approach can feel unusual at first (scikit-learn is very “object oriented”). This StandardScaler preprocessing tool will have many similarities to scikit-learn Machine Learning tools we will cover later.
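The two rescaling approaches can be sketched side by side on a made-up DataFrame; note the scikit-learn pattern of instantiating an object, then fitting and transforming:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

# By hand with apply: subtract the mean, divide by the standard deviation.
rescaled_apply = df.apply(lambda col: (col - col.mean()) / col.std())

# The scikit-learn pattern: instantiate, fit, transform.
scaler = StandardScaler()
scaled = scaler.fit_transform(df)  # a NumPy array, one column per feature
```

The two results differ slightly because pandas' std divides by n-1 while StandardScaler divides by n, but both center each column at 0.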

Class 13

K-Means clustering.
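A minimal K-Means sketch on made-up data: two well-separated blobs of points, which the algorithm should recover as the two clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of made-up points.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])

# The usual scikit-learn pattern: instantiate, fit, read off the labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(pts)
```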

Leftovers

Dictionary comprehension. Feature engineering. Introduction to NumPy. How the choice of data type influences the speed of various operations. More practice with NumPy and pandas. NumPy where, and pandas DataFrame styling. More advanced/leftover topics from NumPy, pandas, and Altair. pandas DataFrame method applymap. Using an Altair selection object in a condition. Introduction to Machine Learning and scikit-learn. K-Nearest Neighbors classification and K-Nearest Neighbors regression. Loss functions (also called cost functions). More on K-Nearest Neighbors and implementing it using scikit-learn.

Class 14

Detecting overfitting using a test set. The frequently occurring U-shaped test error curve. The bias-variance tradeoff. The notion of a decision boundary.
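A sketch of detecting overfitting with a test set, using made-up noisy data and a deliberately over-flexible model (K-Nearest Neighbors with one neighbor):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Noisy made-up data: y is roughly 2x.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (200, 1))
y = 2 * X.ravel() + rng.normal(0, 1, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An extremely flexible model (1 neighbor) memorizes the training noise:
# near-perfect training score, noticeably worse test score.
model = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
```

The gap between the two scores is the signature of overfitting; sweeping the number of neighbors traces out the U-shaped test error curve.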

Class 15

Feature engineering using pandas. Datetime objects in pandas.

Class 16

Linear regression using scikit-learn.
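A minimal LinearRegression sketch; the data is made up to lie exactly on the line y = 3x + 1, so the fitted coefficients are easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 3x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3 * X.ravel() + 1

reg = LinearRegression().fit(X, y)
slope = reg.coef_[0]        # estimated coefficient
intercept = reg.intercept_  # estimated intercept
pred = reg.predict([[10.0]])[0]
```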

Class 17

Polynomial regression using scikit-learn.
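A sketch of the usual scikit-learn recipe: PolynomialFeatures to add powers of x, then LinearRegression on the expanded features. The data is made up to lie on the parabola y = x²:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Data on the parabola y = x^2; a plain line cannot fit it,
# but degree-2 features can.
X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = X.ravel() ** 2

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # columns: x, x^2

reg = LinearRegression().fit(X_poly, y)
pred = reg.predict(poly.transform([[3.0]]))[0]
```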

Class 18

More on overfitting and the bias-variance tradeoff.

Class 19

Logistic regression using scikit-learn.
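A minimal LogisticRegression sketch on a made-up binary classification problem (class 1 when x is large):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A made-up binary classification problem: class 1 iff x is large.
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
pred = clf.predict([[0.0], [10.0]])
probs = clf.predict_proba([[10.0]])[0]  # probabilities for classes 0 and 1
```

Unlike LinearRegression, the model outputs probabilities, and predict thresholds them at 0.5.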

Class 20

Why is logistic regression considered a linear model?

Class 21

Extended example: MNIST handwritten digit dataset using logistic regression. Brief introduction to Matplotlib.

Class 22

More on MNIST.

Class 23

Review.

Thursday, Week 8

Midterm 2 during discussion section.

Class 24

Introduction to the Final Project.

Class 25

Introduction to tree-based models.

Class 26

Tree-based models: random forests.
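A minimal random forest sketch on made-up data whose true rule (class 1 when both features are large) is easy for trees to learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up data: class is 1 when both features are large.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 2))
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)

# A random forest is an ensemble of decision trees, each trained on a
# bootstrap sample; predictions are made by majority vote over the trees.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pred = forest.predict([[0.9, 0.9], [0.1, 0.1]])
```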

Class 27

Extended example using random forests.

Class 28

Continuation of the example.

Class 29

Time to work on the final project.