Week 1 Wednesday

Announcements

  • I’ll have office hours on Fridays 12-1pm in here (ALP 3610). I hope that will be a convenient time to come by and work before class, even if you don’t have specific questions!

  • I had some trouble with the Deepnote hardware resetting on Monday. I contacted Deepnote and they don’t expect that to continue. If we keep having problems, we will switch to working locally (not in the cloud) and using Jupyter notebooks directly.

  • The Monday file is posted in the course notes and some explanations were added.

  • Worksheet 2 is distributed today.

  • Hanson (Songhan), one of our three LAs, is here to help.

  • If you’re stuck on something and not able to ask in person, try asking on Ed Discussion (linked from Canvas).

Boolean arrays and Boolean indexing

import numpy as np
# Instantiate a random number generator object
rng = np.random.default_rng()

Here we use rng to make a length 10 NumPy array of random integers between 0 (inclusive) and 5 (exclusive).

arr = rng.integers(0, 5, size=10)
arr
array([1, 1, 1, 3, 2, 3, 3, 3, 0, 4])
  • How can we guarantee consistent (or reproducible) random integers?

rng = np.random.default_rng(seed=452023)

If you run the following cell immediately after the code above (in particular, with the same seed value), you should see the same array of random integers.

arr = rng.integers(0, 5, size=10)
arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])

If we run the same code again without re-creating rng, we will get new integers, because each call advances the generator’s internal state.

arr = rng.integers(0, 5, size=10)
arr
array([4, 0, 4, 0, 4, 4, 2, 0, 3, 4])

To get consistent results, it helps to put all of these lines into the same cell (using enter instead of shift+enter to create a new line), so that the generator is re-created from the seed every time the cell runs. Notice that this is the same array that was produced first above using the seed=452023 keyword argument.

rng = np.random.default_rng(seed=452023)
arr = rng.integers(0, 5, size=10)
arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])
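
As a quick check of the reproducibility idea, here is a minimal sketch that builds two generators from the same seed and compares their output. The names rng_a, rng_b, first_a, first_b and second_a are only for illustration and are not used elsewhere in these notes.

```python
import numpy as np

# Two separate generators created from the same seed
rng_a = np.random.default_rng(seed=452023)
rng_b = np.random.default_rng(seed=452023)

# The first draw from each generator
first_a = rng_a.integers(0, 5, size=10)
first_b = rng_b.integers(0, 5, size=10)

# Same seed and same sequence of calls, so the arrays match
print(np.array_equal(first_a, first_b))  # True

# A second draw from rng_a continues the stream rather than restarting it
second_a = rng_a.integers(0, 5, size=10)
print(second_a)
```

The comparison uses np.array_equal because == on two arrays would produce a Boolean array rather than a single True or False.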
  • Make a Boolean array indicating where the array is equal to 2.

Be sure you understand how these Boolean values correspond to the values in the array. Also, notice that we are using two equals signs, not one, to compare for elementwise equality. (One equals sign is for assignment.)

arr == 2
array([False, False,  True, False, False, False, False, False, False,
        True])
  • Count how many of these entries are equal to 2.

Because True is treated like 1 and False is treated like 0, we can count the number of True values (in this case, that is the number of 2 values in the original array) by using sum.

Here we use the built-in Python function sum.

sum(arr == 2)
2

Here we use the NumPy array method sum. (Methods and attributes come after the object, and are accessed using a period, as in object.method().)

(arr == 2).sum()
2
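
To make the "True is treated like 1" idea concrete, here is a small sketch that rebuilds arr by hand (values copied from the output above) and shows the Boolean array written as 0s and 1s. It also uses np.count_nonzero, which was not used in class but is another way to count the True values.

```python
import numpy as np

# arr copied from the seeded output above
arr = np.array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])
ba = (arr == 2)

print(ba.dtype)         # bool
print(ba.astype(int))   # the same array written as 0s and 1s

count_a = ba.sum()               # NumPy array method
count_b = np.count_nonzero(ba)   # counts the True entries
print(count_a, count_b)          # 2 2
```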

Sometimes it is more elegant to save the intermediate values along the way, rather than copy-pasting. Here we save the Boolean array with the variable name ba. The parentheses are not required here (ba = arr == 2 gives the same result), but they make the line more readable.

ba = (arr == 2)
ba
array([False, False,  True, False, False, False, False, False, False,
        True])

Notice that ba really is a NumPy array.

type(ba)
numpy.ndarray

Here is another example of how seed works. When we do the following without specifying a seed, we get different results every time.

rng2 = np.random.default_rng()
rng2.random(4)
array([0.10394648, 0.72812454, 0.53265183, 0.77371897])

Notice how the exact same code produces new results.

rng2 = np.random.default_rng()
rng2.random(4)
array([0.73731568, 0.53727985, 0.21285365, 0.04879829])

On the other hand, when we use a fixed seed keyword argument, we get the same output every time.

rng2 = np.random.default_rng(seed=40)
rng2.random(4)
array([0.7298985 , 0.69341496, 0.94192102, 0.05965206])

Here is that same output.

rng2 = np.random.default_rng(seed=40)
rng2.random(4)
array([0.7298985 , 0.69341496, 0.94192102, 0.05965206])

On the other hand, if we change to a different seed, we get a new output.

rng2 = np.random.default_rng(seed=400)
rng2.random(4)
array([0.19909881, 0.61184905, 0.97727673, 0.38164342])
  • Make a Boolean array indicating where arr is strictly greater than 1 and less than or equal to 3.

Here is a reminder of what arr looks like.

arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])

Here we check where it’s strictly greater than 1.

arr > 1
array([ True,  True,  True,  True, False, False,  True, False,  True,
        True])

Here we check the other part, where arr is less than or equal to 3.

arr <= 3
array([False,  True,  True,  True,  True,  True, False,  True, False,
        True])

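We now want to check where both conditions are satisfied. Without parentheses, the expression below raises an error. The cause is operator precedence: & binds more tightly than > and <=, so Python reads the expression as the chained comparison arr > (1 & arr) <= 3, which it evaluates as (arr > (1 & arr)) and ((1 & arr) <= 3). That implicit and asks for a single truth value from a whole array, which is ambiguous. Note that we use & here; with NumPy and pandas we usually use & rather than spelling out and.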

arr > 1 & arr <= 3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 arr > 1 & arr <= 3

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
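
The following sketch reproduces the error step by step. It relies on two standard Python rules: & has higher precedence than the comparison operators, and a chained comparison a > b <= c is evaluated as (a > b) and (b <= c). The names inner and raised are only for illustration.

```python
import numpy as np

# arr copied from the seeded output above
arr = np.array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])

# Python evaluates 1 & arr first: bitwise AND with 1,
# which gives 1 for odd entries and 0 for even entries.
inner = 1 & arr
print(inner)

# The expression is then the chained comparison  arr > inner <= 3,
# i.e. (arr > inner) and (inner <= 3). The `and` needs a single
# True/False from an array, so NumPy raises ValueError.
try:
    arr > 1 & arr <= 3
    raised = False
except ValueError:
    raised = True

print(raised)  # True
```

Wrapping each comparison in parentheses avoids both problems: each comparison is evaluated first, and & then combines the two Boolean arrays elementwise.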

Here is a reminder of what arr looks like.

arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])

Here is our Boolean array.

# Boolean array
(arr > 1) & (arr <= 3)
array([False,  True,  True,  True, False, False, False, False, False,
        True])
  • Using Boolean indexing, produce the subarray of arr containing the values which are strictly greater than 1 and less than or equal to 3.

arr[(arr > 1) & (arr <= 3)]
array([3, 2, 3, 2])
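
Here is the same Boolean indexing step with the mask saved in a variable first, which can be easier to read and to reuse. The name mask is only for illustration, and the ~ operator (elementwise "not") was not covered above; it is included as one possible way to get the entries that fail the condition.

```python
import numpy as np

# arr copied from the seeded output above
arr = np.array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])

# Boolean array: strictly greater than 1 and at most 3
mask = (arr > 1) & (arr <= 3)

kept = arr[mask]       # entries where mask is True
dropped = arr[~mask]   # entries where mask is False

print(kept)     # [3 2 3 2]
print(dropped)  # [4 1 0 4 0 4]
```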
  • Make a 10x3 NumPy array arr2 of random integers between 0 (inclusive) and 5 (exclusive). Here, we will specify the size keyword argument using a tuple rather than an int. Use seed=100 so we all have the same values.

rng = np.random.default_rng(seed=100)
arr2 = rng.integers(0, 5, size=(10,3))
arr2
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])
  • Define a variable col that is equal to the 0-th column of arr2.

(I try to consistently start counting at 0 in this class, which is the Python convention. It might be more grammatically correct to say “the column at index 0”.)

col = arr2[:, 0]
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])

What if we had used two sets of square brackets, as we would with a list of lists? Break the following up into pieces. arr2[:] returns the entire array (“every row”), and then [0] selects the row at index 0. So this gives the top row, not the first column.

arr2[:][0]
array([3, 4, 0])
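
To compare the two indexing styles side by side, here is a small sketch using only the first three rows of arr2. The names nested and arr_small are only for illustration.

```python
import numpy as np

# The first three rows of arr2, as a list of lists and as a NumPy array
nested = [[3, 4, 0], [2, 0, 1], [2, 0, 2]]
arr_small = np.array(nested)

# With a list of lists, getting a column takes a loop or comprehension
col_from_list = [row[0] for row in nested]

# With a NumPy array, a single pair of brackets with a comma does it
col_from_arr = arr_small[:, 0]

# Two pairs of brackets: every row first, then the row at index 0
row_from_arr = arr_small[:][0]

print(col_from_list)  # [3, 2, 2]
print(col_from_arr)   # [3 2 2]
print(row_from_arr)   # [3 4 0]
```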

Here is a reminder of what arr2 looks like.

arr2
array([[3, 4, 0],
       [2, 0, 1],
       [2, 0, 2],
       [4, 4, 2],
       [2, 3, 4],
       [4, 0, 3],
       [3, 0, 2],
       [4, 3, 1],
       [1, 3, 0],
       [2, 2, 2]])

We could also use arr2[:, 2] in this case to get the last column (because it occurs at index 2), but it is more readable to use -1, which refers to the last index.

# last column
arr2[:, -1]
array([0, 1, 2, 2, 4, 3, 2, 1, 0, 2])
  • Create the subarray of arr2 containing the rows which begin with a 2.

We can see what number each row starts with by using col which we defined above.

col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])

Here we find where col is equal to 2.

col == 2
array([False,  True,  True, False,  True, False, False, False, False,
        True])

Here again we use Boolean indexing to extract all the rows that begin with 2.

arr2[col == 2]
array([[2, 0, 1],
       [2, 0, 2],
       [2, 3, 4],
       [2, 2, 2]])

We’ll start with the following on Friday.

More complex example of Boolean indexing

We will create the subarray of arr2 containing the rows which have at least two 2s using the following strategy.

  • Make a 10x3 Boolean array indicating where arr2 is equal to 2.

  • Use the sum method with axis=1 to find how many 2s there are in each row.

  • Use Boolean indexing to create the subarray of arr2 containing only the rows which have at least two 2s.
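
The three steps above could be sketched as follows. This is one possible version, not necessarily the one we will write in class. arr2 is typed in by hand from the seed=100 output shown earlier, and the names is_two and twos_per_row are only for illustration.

```python
import numpy as np

# arr2 copied from the seed=100 output above
arr2 = np.array([
    [3, 4, 0],
    [2, 0, 1],
    [2, 0, 2],
    [4, 4, 2],
    [2, 3, 4],
    [4, 0, 3],
    [3, 0, 2],
    [4, 3, 1],
    [1, 3, 0],
    [2, 2, 2],
])

# Step 1: 10x3 Boolean array, True wherever arr2 equals 2
is_two = (arr2 == 2)

# Step 2: add along axis=1 to count the 2s in each row
twos_per_row = is_two.sum(axis=1)
print(twos_per_row)  # [0 1 2 1 1 0 1 0 0 3]

# Step 3: keep only the rows with at least two 2s
rows = arr2[twos_per_row >= 2]
print(rows)
```

Only the rows [2, 0, 2] and [2, 2, 2] have a count of at least 2, so those are the two rows in the result.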

Time to work on Worksheet 2

  • Hanson and I are here to help.

  • I have another class at 2pm, so if you have questions, ask me now rather than after class!