Week 1 Monday#

Welcome to Math 10!

Canvas homepage

This class is an introduction to using Python for data science. I think of this class as having two main parts:

  • Part 1. Exploratory Data Analysis. (Weeks 1-5)

  • Part 2. Introduction to Machine Learning. (Weeks 5-10)

There are two in-class midterms, on Monday of Week 5 and Friday of Week 9 (see the Canvas homepage for dates). They’re closed book and closed computer, but you will be allowed to use a notecard with handwritten notes on it.

There’s no final exam; instead there is a class project.

Announcements#

  • The first worksheet will be distributed today, and we’ll have time to work on it in class. The second will be distributed on Wednesday. Both are due Monday night of Week 2. In general, there will be two worksheets per week.

  • Today’s worksheet is meant to help you get acquainted with the Deepnote programming environment, and to review or introduce some Python topics that you may have seen in Math 9.

  • One of our three LAs, Maya, is here to help with the worksheet and in general with getting set up in Deepnote.

  • If you’re on the waitlist, please submit worksheets/take quizzes on the same schedule as the regular class. (Assignments won’t be excused, but I do drop the two lowest worksheet scores.)

Warm-up with Deepnote and some Python concepts#

Suggestion. You can duplicate this project on the left side and then type along with me.

The notes from this notebook will be posted sometime later today in the course notes.

a = 10
print(a)
10

You execute a cell in Deepnote (or in a Jupyter notebook) by holding down shift and hitting return. What matters is the order in which you execute the cells, not the order in which they appear on the page.

  • Using NumPy, make a length-1000 array of random real numbers between 0 and 10.

NumPy is one of the most important libraries for Python, and it shares many similarities with Matlab. Because NumPy is not automatically included in Python, we need to import it. The convention is to use the abbreviation np, and we will always follow that convention.

import numpy as np

Many aspects of Python follow what is called “object-oriented” programming. As an example of this style, we create a “random number generator” object which will be used to generate random numbers. The rng on the left is a variable name; we could have used any other variable name. The part on the right side must be written exactly (including the lack of capital letters and the parentheses).

rng = np.random.default_rng()

Here we generate 5 random numbers between 0 and 1. If you run this code (or I run it again), different numbers will show up.

rng.random(5)
array([0.0636005 , 0.85960075, 0.68093027, 0.87971363, 0.5778486 ])
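If you want the same numbers to show up every time (for example, so that your output matches a classmate’s), one option is to pass a seed when creating the generator. The sketch below is not part of what we did in class; the seed value 10 and the names rng1 and rng2 are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed (10 is an arbitrary choice)
rng1 = np.random.default_rng(seed=10)
rng2 = np.random.default_rng(seed=10)

x = rng1.random(5)
y = rng2.random(5)

# Same seed, same sequence of random numbers
print(np.array_equal(x, y))
```

Without a seed, as in the code above this sketch, each new generator produces a different sequence.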

The random that came after the period is what’s known as a method of the random number generator object rng. Here is another method, integers, that produces random integers.

rng.integers(0,20,size=5)
array([ 2, 14,  9,  8,  1])
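In rng.integers(0,20,size=5), the lower bound 0 can occur but the upper bound 20 cannot. As a quick check (a sketch, not from the lecture; draws and draws_incl are made-up variable names), we can generate many integers and look at the smallest and largest values. The endpoint=True keyword argument makes the upper bound inclusive.

```python
import numpy as np

rng = np.random.default_rng()

# By default the upper bound is excluded: values lie in 0, 1, ..., 19
draws = rng.integers(0, 20, size=10_000)
print(draws.min(), draws.max())

# With endpoint=True the upper bound is included: values lie in 0, 1, ..., 20
draws_incl = rng.integers(0, 20, size=10_000, endpoint=True)
print(draws_incl.min(), draws_incl.max())
```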

We wanted 1000 numbers (not 5), and we wanted them to be between 0 and 10 (not between 0 and 1). We multiply by 10 to get the scale correct.

arr = 10*rng.random(1000)
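Another way to get the same kind of array is the uniform method, which takes the lower and upper bounds directly. This is only a sketch of an alternative, not what we used in class; arr2 is a made-up variable name.

```python
import numpy as np

rng = np.random.default_rng()

# 1000 random real numbers drawn uniformly between 0 and 10
arr2 = rng.uniform(0, 10, size=1000)

print(arr2.shape)
print(arr2.min(), arr2.max())
```

Multiplying rng.random(1000) by 10 and calling rng.uniform(0, 10, size=1000) should produce arrays with the same distribution, so either approach works here.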

Python does not have as many built-in functions as, for example, Matlab or Mathematica. (Most Python functions we use will come from an external library, like NumPy.) But len and max are two examples of built-in Python functions.

len(arr)
1000

max(arr)
9.996904603689662

  • What proportion of elements in this array are strictly greater than 9?

Here we look at the first 10 elements.

arr[:10]
array([3.11851835, 3.90914999, 2.61168215, 7.47896289, 9.45087702,
       9.91738752, 8.55094172, 1.35599358, 4.39273886, 2.4713273 ])

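This may look confusing at first, but notice how we can check which elements are strictly greater than 9: the comparison is applied to each element separately and produces an array of True and False values. (This same syntax works in Matlab.)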

arr[:10] > 9
array([False, False, False, False,  True,  True, False, False, False,
       False])

True is treated like 1 and False is treated like 0, so if we sum up these values, it is the same as counting how many times True occurs.

sum(arr[:10] > 9)
2
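The built-in sum works, but NumPy also has its own tools for counting True values, and they are typically faster on large arrays. The following sketch uses a small hand-made array (small and mask are hypothetical names, not from the lecture) so the results are easy to check by eye.

```python
import numpy as np

small = np.array([3.1, 9.4, 2.6, 9.9, 8.5])
mask = small > 9   # True in positions 1 and 3, False elsewhere

print(mask.sum())              # count of True values: 2
print(np.count_nonzero(mask))  # same count: 2
print(mask.mean())             # proportion of True values: 0.4
print(small[mask])             # the elements that are greater than 9
```

Taking the mean of a Boolean array gives the proportion of True values directly, because it is the sum of the 1s divided by the length.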

Because the numbers are spread uniformly between 0 and 10, we would expect about 10% of them to be greater than 9, and indeed, we are pretty close in this case.

sum(arr > 9)/len(arr)
0.102

  • Convert the array to a list and answer the same question.

Here is how we convert to a list. Be careful not to name the variable list (instead, use a different name, like mylist); otherwise the name list would refer to your variable and no longer to the built-in Python data type list.

mylist = list(arr)
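With a list, the comparison mylist > 9 raises a TypeError, because a list cannot be compared to a number element by element the way a NumPy array can. One possible approach (a sketch, not necessarily the one intended for the worksheet) is to check the elements one at a time with a generator expression. The code regenerates arr so that it runs on its own; count and proportion are made-up variable names.

```python
import numpy as np

rng = np.random.default_rng()
arr = 10 * rng.random(1000)
mylist = list(arr)

# Count the elements strictly greater than 9, one element at a time
count = sum(x > 9 for x in mylist)
proportion = count / len(mylist)
print(proportion)

# The array-based computation gives the same answer
print(proportion == sum(arr > 9) / len(arr))
```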

Time to work on Worksheet 1#

  • For this worksheet only, to get full points, you need to work in a group of 2-3 students.

  • One of Deepnote’s biggest strengths is that it works great for multiple people editing a file simultaneously. That’s why I want you to try out working in a group for this first worksheet, and I hope you will keep working in groups throughout the course. (The course project at the end is an individual project.)

  • Maya (one of the three course LAs) and I are here to help.

  • I have another class at 2pm, so if you have questions, ask me now rather than after class!