UC Irvine, Math 10, Winter 2022

Introduction to Programming for Data Science

Use the Navigation menu on the left to find the course content. More material will be posted throughout the course.

Course-level Learning Outcomes

The goal of this course is to introduce programming in Python, with an emphasis on some of the tools that are most relevant to data science. The primary learning outcomes for Math 10 are that students will be able to:

  • select appropriate data types (both built-in Python types and types defined in external libraries) when performing computations;

  • apply tools from unfamiliar Python libraries using some combination of documentation, error messages, and code examples written by experienced programmers;

  • write code that is Pythonic (for example, avoiding unnecessary for loops) and adheres to the DRY (Don’t Repeat Yourself) principle;

  • given an unfamiliar dataset, apply techniques of Exploratory Data Analysis (EDA) to gain a rapid first impression of the dataset’s contents;

  • manipulate structured data using NumPy and pandas;

  • produce interactive visualizations conveying significant aspects of datasets using Altair;

  • illustrate the structure of a neural network and the various components involved in the training of a neural network, and implement the procedure using PyTorch;

  • improve the performance of various machine learning algorithms by adjusting parameters within scikit-learn and PyTorch;

  • recognize the potential for overfitting when applying a machine learning algorithm;

  • create an educational, data-focused Jupyter notebook or Deepnote notebook using a combination of code cells and explanatory markdown cells.
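
As a rough illustration of the "Pythonic" and DRY outcome listed above, here is a minimal sketch in plain Python. The temperature values and the helper name `fahrenheit_to_celsius` are hypothetical and chosen only for this example; they do not come from the course materials.

```python
# Hypothetical data, used only for illustration.
temperatures_f = [68.0, 71.5, 59.0, 80.2]


def fahrenheit_to_celsius(temp_f):
    """Convert a single Fahrenheit temperature to Celsius."""
    return (temp_f - 32) * 5 / 9


# Less Pythonic: manual index bookkeeping, with the formula written inline.
celsius_loop = []
for i in range(len(temperatures_f)):
    celsius_loop.append((temperatures_f[i] - 32) * 5 / 9)

# More Pythonic and DRY: the formula lives in one place (the helper),
# and a comprehension replaces the index-based loop.
celsius = [fahrenheit_to_celsius(t) for t in temperatures_f]

# Built-in functions such as sum() and len() avoid another explicit loop.
mean_celsius = sum(celsius) / len(celsius)
```

Both versions produce the same list. The second keeps the conversion formula in a single function, so a later correction only needs to be made once. With NumPy or pandas, the same idea is usually expressed as an operation on a whole array or column.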

References

I originally duplicated the University of Washington’s Visualization Curriculum. I thank the authors for making this material available not just to use, but to edit via their GitHub source code.

Datasets

Here is information about the cars dataset. I took this dataset from the Vega Datasets Python library.

The unemployment dataset was adapted from Vega Datasets. Some columns were removed and some other columns were renamed.

The Spotify dataset was originally taken from (a possibly earlier version of) this Kaggle dataset that was uploaded by Sashank Pillai.