Tentative schedule¶
Class 1¶
Introduction to the class, to Python, and to Deepnote. Reading data from a CSV file. Introduction to pandas, especially pandas DataFrames. Introduction to indexing in pandas using loc and iloc.
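As a rough illustration of the difference between loc and iloc, here is a minimal sketch. The DataFrame and its column names are made up for this example and are not taken from the course materials.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "score": [90, 85, 77]},
    index=["a", "b", "c"],
)

# loc indexes by label: the row labeled "b", the column labeled "score"
by_label = df.loc["b", "score"]

# iloc indexes by integer position: row 1, column 1
by_position = df.iloc[1, 1]

# Slicing differs: loc includes the right endpoint, iloc excludes it
loc_rows = df.loc["a":"b"]   # rows "a" and "b"
iloc_rows = df.iloc[0:1]     # only the row at position 0
```

Both lookups return the same entry here, because the row labeled "b" is also the row at position 1.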
Class 2¶
Introduction to range, lists, for loops, list comprehension, and f-strings. Making a for loop more Pythonic. Making repetitive code more DRY (Don't Repeat Yourself).
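One way to picture "more Pythonic" is the sketch below, which builds the same list twice. The list of topics is a hypothetical example, not something from the class.

```python
# Hypothetical list of topics, for illustration only
topics = ["pandas", "NumPy", "Altair"]

# Less Pythonic: loop over index positions with range and len
labels_loop = []
for i in range(len(topics)):
    labels_loop.append(f"Topic {i + 1}: {topics[i]}")

# More Pythonic: enumerate inside a list comprehension, with an f-string
labels_comp = [f"Topic {i}: {topic}" for i, topic in enumerate(topics, start=1)]
```

The two lists are identical; the second version avoids manual indexing and the repeated append call.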
Class 3¶
Performing basic Exploratory Data Analysis (EDA) on a dataset using Python (and pandas in particular). Slicing.
Class 4¶
Different options for using the syntax df[???] in pandas. A closer look at Boolean indexing. Working with missing data. pandas Series and Python dictionaries. Very brief introduction to NumPy. Logic in pandas. Using the axis keyword argument with any and sum.
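The sketch below shows one possible use of the axis keyword argument with sum and any, combined with Boolean indexing, to find missing data. The small DataFrame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries
df = pd.DataFrame(
    {"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.nan], "z": [7.0, 8.0, 9.0]}
)

missing = df.isna()  # DataFrame of True/False values

# axis=0 collapses the rows: one count of missing values per column
missing_per_column = missing.sum(axis=0)

# axis=1 collapses the columns: one True/False value per row
row_has_missing = missing.any(axis=1)

# Boolean indexing: keep only the rows with no missing values
complete_rows = df[~row_has_missing]
```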
Class 5¶
Introduction to plotting in Python. Matplotlib as the most famous plotting library in Python. More practice with NumPy. Altair, Seaborn, and Plotly as three plotting libraries which are similar to each other but quite different from Matplotlib (Seaborn is built on top of Matplotlib). For Math 10, Altair will be the most important of these libraries.
Class 6¶
Practice with Altair and pandas. Using isin, value_counts, index, and slicing to find the most frequently occurring values in a pandas Series.
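A minimal sketch of how these four tools could fit together is shown below. The Series of colors is a made-up example, and this is only one of several ways to approach the task.

```python
import pandas as pd

# Hypothetical Series, for illustration only
colors = pd.Series(["red", "blue", "red", "green", "red", "blue", "pink"])

# value_counts sorts by frequency, most common first
counts = colors.value_counts()

# The index of the result holds the values; slice off the top two
top_two = counts.index[:2]

# isin gives a Boolean Series, which we use to keep only those values
subset = colors[colors.isin(top_two)]
```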
Class 7¶
Functions and lambda functions. Slicing. Using map in pandas. Basic logic in Python (and, or, not) and in pandas and NumPy (&, |, ~).
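To illustrate the contrast between the two kinds of logic, here is a short sketch with a hypothetical Series. The keywords and, or, not act on single True/False values, while &, |, ~ act element-wise.

```python
import pandas as pd

# Hypothetical Series, for illustration only
s = pd.Series([1, 4, 7, 10])

# map applies a function (here a lambda function) to each entry
squared = s.map(lambda x: x**2)

# Python keywords work on single Boolean values
single_check = (4 > 2) and not (4 % 2 == 1)

# Element-wise versions for a Series; the parentheses are needed
# because & and ~ bind more tightly than comparisons
mask = (s > 2) & ~(s % 2 == 0)   # greater than 2 and odd
```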
Class 8¶
Working with dates in pandas. How to locate missing values. Using try and except to handle errors. Practice with if, elif, and else. Using count with a list or a string. Using slicing with a string. An example of feature engineering.
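Several of these ideas can be combined in a small sketch. The helper functions parse_year and describe are hypothetical names invented for this example, and the date format "YYYY-MM-DD" is an assumption.

```python
def parse_year(text):
    """Return the year from a string like 'YYYY-MM-DD', or None on failure."""
    try:
        # String slicing: the first four characters
        return int(text[:4])
    except ValueError:
        # int raises ValueError when the characters are not digits
        return None


def describe(year):
    """Classify a year using if, elif, else."""
    if year is None:
        return "missing"
    elif year < 2000:
        return "before 2000"
    else:
        return "2000 or later"


# count works on both strings and lists
letter_count = "banana".count("a")
item_count = [1, 2, 1, 3].count(1)
```

Extracting a year from a date string in this way is a very simple instance of feature engineering: a new value is derived from an existing one.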
Class 9¶
No new material; time to work on the homework and sample midterm.
Class 10¶
Timing different ways of counting objects, including NumPy’s count_nonzero. Sorting a pandas Series or DataFrame using sort_values.
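A timing comparison along these lines could be sketched as follows, using the standard library's timeit module. The array, the target value 7, and the function names are assumptions made for illustration, and actual timings will vary by machine.

```python
import timeit

import numpy as np

# Hypothetical data: 10,000 random integers between 0 and 9
rng = np.random.default_rng(seed=0)
arr = rng.integers(0, 10, size=10_000)


def count_with_loop():
    # Pure Python: visit each entry one at a time
    return sum(1 for x in arr if x == 7)


def count_with_numpy():
    # Vectorized: build a Boolean array, then count the True entries
    return np.count_nonzero(arr == 7)


# Total time in seconds for 3 calls of each function
loop_time = timeit.timeit(count_with_loop, number=3)
numpy_time = timeit.timeit(count_with_numpy, number=3)
```

Both functions return the same count; comparing loop_time with numpy_time shows how the two approaches differ in speed on a given machine.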
Class 11¶
Review for the midterm, and some new topics in Altair.
Thursday, Week 4¶
Midterm 1 during discussion section.
Class 12¶
Different ways to rescale data: using a for loop, using apply, and using StandardScaler from scikit-learn. This is our first time using anything from scikit-learn, and the approach can feel unusual at first (scikit-learn is very “object oriented”). The StandardScaler preprocessing tool has many similarities to the scikit-learn Machine Learning tools we will cover later.
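The sketch below compares two of these approaches, apply and StandardScaler, on a made-up column of heights. It assumes that "rescale" here means subtracting the mean and dividing by the standard deviation, which is what StandardScaler does.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data, for illustration only
df = pd.DataFrame({"height": [150.0, 160.0, 170.0, 180.0]})

# Using apply: rescale each column by hand.
# ddof=0 (population standard deviation) matches StandardScaler;
# the pandas default for std is ddof=1.
manual = df.apply(lambda col: (col - col.mean()) / col.std(ddof=0))

# Using scikit-learn: create the object, then fit and transform
scaler = StandardScaler()
scaled = scaler.fit_transform(df)   # returns a NumPy array

same_result = np.allclose(manual.to_numpy(), scaled)
```

The create-then-fit pattern used with scaler is the "object oriented" style mentioned above.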
Class 13¶
K-Means clustering.
Leftovers¶
Dictionary comprehension. Feature engineering. Introduction to NumPy. How the choice of data type influences the speed of various operations.
More practice with NumPy and pandas. NumPy’s where function and pandas DataFrame styling. More advanced/leftover topics from NumPy, pandas, and Altair. The pandas DataFrame method applymap. Using an Altair selection object in a condition. Introduction to Machine Learning and scikit-learn. K-Nearest Neighbors classification and K-Nearest Neighbors regression. Loss functions (also called cost functions). More on K-Nearest Neighbors and implementing it using scikit-learn.
Class 14¶
Detecting overfitting using a test set. The frequently occurring U-shaped test error curve. The bias-variance tradeoff. The notion of a decision boundary.
Class 15¶
Feature engineering using pandas. Datetime objects in pandas.
Class 16¶
Linear regression using scikit-learn.
Class 17¶
Polynomial regression using scikit-learn.
Class 18¶
More on overfitting and the bias-variance tradeoff.
Class 19¶
Logistic regression using scikit-learn.
Class 20¶
Why is logistic regression considered a linear model?
Class 21¶
Extended example: MNIST handwritten digit dataset using logistic regression. Brief introduction to Matplotlib.
Class 22¶
More on MNIST.
Class 23¶
Review.
Thursday, Week 8¶
Midterm 2 during discussion section.
Class 24¶
Introduction to the Final Project.
Class 25¶
Introduction to tree-based models.
Class 26¶
Tree-based models: random forests.
Class 27¶
Extended example using random forests.
Class 28¶
Continuation of the example.
Class 29¶
Time to work on the final project.