Week 1 Wednesday
Announcements
I’ll have office hours on Fridays 12-1pm in here (ALP 3610). I hope that will be a convenient time to come by and work before class, even if you don’t have specific questions!
I had some trouble with the Deepnote hardware resetting on Monday. I contacted Deepnote and they don’t expect that to continue. If we keep having problems, we will switch to working locally (not in the cloud) and using Jupyter notebooks directly.
The Monday file is posted in the course notes and some explanations were added.
Worksheet 2 distributed today.
Hanson (Songhan), one of our three LAs, is here to help.
If you’re stuck on something and not able to ask in person, try asking on Ed Discussion (linked from Canvas).
Boolean arrays and Boolean indexing
import numpy as np
# Instantiate a random number generator object
rng = np.random.default_rng()
Here we use rng to make a length-10 NumPy array of random integers between 0 (inclusive) and 5 (exclusive).
arr = rng.integers(0, 5, size=10)
arr
array([1, 1, 1, 3, 2, 3, 3, 3, 0, 4])
How can we guarantee consistent (or reproducible) random integers?
rng = np.random.default_rng(seed=452023)
If you run the following cell immediately after the code above (in particular, using the same seed value), then you should see the same resulting array of random integers.
arr = rng.integers(0, 5, size=10)
arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])
If we run the same code again without re-creating rng, we will get new integers, because the generator continues from where it left off.
arr = rng.integers(0, 5, size=10)
arr
array([4, 0, 4, 0, 4, 4, 2, 0, 3, 4])
To get consistent results, it helps to put all of these lines into the same cell (using enter instead of shift+enter to create a new line). Notice that this is the same array that was produced first above using the seed=452023 keyword argument.
rng = np.random.default_rng(seed=452023)
arr = rng.integers(0, 5, size=10)
arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])
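One way to convince yourself that the seed controls the output is to build two separate generators with the same seed and compare what they produce. The following is only a sketch: the names rng_a and rng_b and the seed value 0 are arbitrary choices for illustration, not part of the lecture code.

```python
import numpy as np

# Two independent generators built from the same (arbitrary) seed
rng_a = np.random.default_rng(seed=0)
rng_b = np.random.default_rng(seed=0)

first_a = rng_a.integers(0, 5, size=10)
first_b = rng_b.integers(0, 5, size=10)

# Same seed and same position in the stream: identical arrays
same_first_draw = np.array_equal(first_a, first_b)
print(same_first_draw)  # True

# A second call continues each stream, and the two streams stay in step
second_a = rng_a.integers(0, 5, size=10)
second_b = rng_b.integers(0, 5, size=10)
same_second_draw = np.array_equal(second_a, second_b)
print(same_second_draw)  # True
```

The comparison uses np.array_equal rather than == so that the result is a single True or False instead of a Boolean array.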
Make a Boolean array indicating where the array is equal to 2.
Be sure you understand how these Boolean values correspond to the values in the array. Also, notice that we are using two equals signs, not one, to compare for elementwise equality. (One equals sign is for assignment.)
arr == 2
array([False, False, True, False, False, False, False, False, False,
True])
Count how many of these entries are equal to 2.
Because True is treated like 1 and False is treated like 0, we can count the number of True values (in this case, that is the number of 2 values in the original array) by using sum.
Here we use the built-in Python function sum.
sum(arr == 2)
2
Here we use the NumPy array method sum. (Methods and attributes come after the object, and are accessed using a period.)
(arr == 2).sum()
2
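The two counting approaches can be checked side by side on a hand-typed copy of the seeded array. This is a minimal sketch; the variable names are made up for illustration, and np.count_nonzero is included only as a third option that was not used in class.

```python
import numpy as np

# Same values as the seeded array shown above, typed in by hand
arr = np.array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])

mask = (arr == 2)

count_builtin = sum(mask)               # built-in Python sum, goes element by element
count_method = mask.sum()               # NumPy array method
count_nonzero = np.count_nonzero(mask)  # NumPy function that counts the True entries

print(count_builtin, count_method, count_nonzero)  # 2 2 2
```

All three agree. On large arrays the NumPy versions are usually faster than the built-in sum, because they do not loop in Python.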
Sometimes it is more elegant to save the intermediate values along the way, rather than copy-pasting. Here we save the Boolean array with the variable name ba. The parentheses are not necessary in this case (ba = arr == 2 gives the same result), but they definitely make it more readable.
ba = (arr == 2)
ba
array([False, False, True, False, False, False, False, False, False,
True])
Notice that ba really is a NumPy array.
type(ba)
numpy.ndarray
Here is another example of how seed works. When we do the following without specifying a seed, we get different results every time.
rng2 = np.random.default_rng()
rng2.random(4)
array([0.10394648, 0.72812454, 0.53265183, 0.77371897])
Notice how the exact same code produces new results.
rng2 = np.random.default_rng()
rng2.random(4)
array([0.73731568, 0.53727985, 0.21285365, 0.04879829])
On the other hand, when we use a fixed seed keyword argument, we get the same output every time.
rng2 = np.random.default_rng(seed=40)
rng2.random(4)
array([0.7298985 , 0.69341496, 0.94192102, 0.05965206])
Here is that same output.
rng2 = np.random.default_rng(seed=40)
rng2.random(4)
array([0.7298985 , 0.69341496, 0.94192102, 0.05965206])
On the other hand, if we change to a different seed, we get a new output.
rng2 = np.random.default_rng(seed=400)
rng2.random(4)
array([0.19909881, 0.61184905, 0.97727673, 0.38164342])
Make a Boolean array indicating where arr is strictly greater than 1 and less than or equal to 3.
Here is a reminder of what arr looks like.
arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])
Here we check where it’s strictly greater than 1.
arr > 1
array([ True, True, True, True, False, False, True, False, True,
True])
Here we check the other part, where arr is less than or equal to 3.
arr <= 3
array([False, True, True, True, True, True, False, True, False,
True])
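We now want to check where both are satisfied. The following raises an error because we do not have parentheses. The operator & has higher precedence than the comparison operators > and <=, so Python evaluates 1 & arr first and reads the line as the chained comparison arr > (1 & arr) <= 3. A chained comparison is combined using and, which asks for a single truth value for a whole array, and that is the ambiguity the error message describes. Note that we use &. Usually with NumPy and pandas we use & rather than spelling it out and.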
arr > 1 & arr <= 3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[24], line 1
----> 1 arr > 1 & arr <= 3
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
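The error can be reproduced on a hand-typed copy of arr. This sketch relies on Python's operator precedence, in which & binds more tightly than > and <=. The try/except is only there so the example runs to the end.

```python
import numpy as np

# Same values as arr above, typed in by hand
arr = np.array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])

# Without parentheses, & is applied first, giving arr > (1 & arr) <= 3.
# That chained comparison needs a single True/False, so NumPy raises ValueError.
try:
    arr > 1 & arr <= 3
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# With parentheses, each comparison is evaluated first, then combined elementwise
both = (arr > 1) & (arr <= 3)
print(both.sum())  # 4
```

The safe habit is to wrap every comparison in its own parentheses whenever it is combined with & or |.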
Here is a reminder of what arr looks like.
arr
array([4, 3, 2, 3, 1, 0, 4, 0, 4, 2])
Here is our Boolean array.
# Boolean array
(arr > 1) & (arr <= 3)
array([False, True, True, True, False, False, False, False, False,
True])
Using Boolean indexing, produce the subarray of arr containing the values which are strictly greater than 1 and less than or equal to 3.
arr[(arr > 1) & (arr <= 3)]
array([3, 2, 3, 2])
Make a 10x3 NumPy array arr2 of random integers between 0 (inclusive) and 5 (exclusive). Here, we will specify the size keyword argument using a tuple rather than an int. Use seed=100 so we all have the same values.
rng = np.random.default_rng(seed=100)
arr2 = rng.integers(0, 5, size=(10,3))
arr2
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
Define a variable col that is equal to the 0-th column of arr2.
(I try to consistently start counting at 0 in this class, which is the Python convention. It might be more grammatically correct to say “the column at index 0”.)
col = arr2[:, 0]
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])
What if we had used two sets of square brackets, like what we need if we are using lists of lists? Break the following up into pieces. arr2[:] is getting the entire array (“every row”), and then [0] is getting the top row.
arr2[:][0]
array([3, 4, 0])
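The difference between the two indexing styles is easier to see on a small array. The array small below is hypothetical and used only for illustration, as are the other variable names.

```python
import numpy as np

# Small hypothetical array, used only for illustration
small = np.array([[3, 4, 0],
                  [2, 0, 1],
                  [2, 0, 2]])

first_col = small[:, 0]  # every row, column at index 0
first_row = small[:][0]  # small[:] is the whole array, then [0] picks row 0

print(first_col)  # [3 2 2]
print(first_row)  # [3 4 0]

# For comparison, a list of lists needs a loop or comprehension to get a column
nested = small.tolist()
first_col_from_list = [row[0] for row in nested]
print(first_col_from_list)  # [3, 2, 2]
```

The single pair of brackets with a comma, small[:, 0], indexes rows and columns at the same time, which is why it returns a column rather than a row.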
Here is a reminder of what arr2 looks like.
arr2
array([[3, 4, 0],
[2, 0, 1],
[2, 0, 2],
[4, 4, 2],
[2, 3, 4],
[4, 0, 3],
[3, 0, 2],
[4, 3, 1],
[1, 3, 0],
[2, 2, 2]])
We could also use arr2[:, 2] in this case to get the last column (because it occurs at index 2), but it is more readable to use -1, which is shorthand for “last”.
# last column
arr2[:, -1]
array([0, 1, 2, 2, 4, 3, 2, 1, 0, 2])
Create the subarray of arr2 containing the rows which begin with a 2.
We can see what number each row starts with by using col, which we defined above.
col
array([3, 2, 2, 4, 2, 4, 3, 4, 1, 2])
Here we find where col is equal to 2.
col == 2
array([False, True, True, False, True, False, False, False, False,
True])
Here again we use Boolean indexing to extract all the rows that begin with 2.
arr2[col == 2]
array([[2, 0, 1],
[2, 0, 2],
[2, 3, 4],
[2, 2, 2]])
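To see which rows the Boolean array selects, it can help to look at the row indices directly. The sketch below uses a hand-typed copy of arr2. The call to np.flatnonzero and the explicit arr2[mask, :] form were not used in class and are shown only for comparison.

```python
import numpy as np

# Same values as arr2 above, typed in by hand
arr2 = np.array([[3, 4, 0], [2, 0, 1], [2, 0, 2], [4, 4, 2], [2, 3, 4],
                 [4, 0, 3], [3, 0, 2], [4, 3, 1], [1, 3, 0], [2, 2, 2]])

col = arr2[:, 0]
mask = (col == 2)  # one True/False per row

# Indices of the rows where the mask is True
row_indices = np.flatnonzero(mask)
print(row_indices)  # [1 2 4 9]

# A 1-D Boolean array inside the brackets selects whole rows;
# writing the column slice explicitly gives the same subarray
rows_short = arr2[mask]
rows_explicit = arr2[mask, :]
print(np.array_equal(rows_short, rows_explicit))  # True
```

The mask has length 10, which matches the number of rows in arr2. A mask of any other length would raise an IndexError.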
We’ll start with the following on Friday.
More complex example of Boolean indexing
We will create the subarray of arr2 containing the rows which have at least two 2s using the following strategy.
Make a 10x3 Boolean array indicating where arr2 is equal to 2.
Use the sum method with axis=1 to find how many 2s there are in each row.
Use Boolean indexing to create the subarray of arr2 containing only the rows which have at least two 2s.
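As a preview, the three steps might look like the following. This is one possible sketch rather than the version we will write in class. It uses a hand-typed copy of arr2, and the names is_two, twos_per_row and at_least_two are made up for illustration.

```python
import numpy as np

# Same values as arr2 above, typed in by hand
arr2 = np.array([[3, 4, 0], [2, 0, 1], [2, 0, 2], [4, 4, 2], [2, 3, 4],
                 [4, 0, 3], [3, 0, 2], [4, 3, 1], [1, 3, 0], [2, 2, 2]])

# Step 1: 10x3 Boolean array marking where arr2 equals 2
is_two = (arr2 == 2)

# Step 2: sum across each row (axis=1) to count the 2s in that row
twos_per_row = is_two.sum(axis=1)
print(twos_per_row)  # [0 1 2 1 1 0 1 0 0 3]

# Step 3: keep only the rows with at least two 2s
at_least_two = arr2[twos_per_row >= 2]
print(at_least_two)
# [[2 0 2]
#  [2 2 2]]
```

Using axis=1 collapses the columns, leaving one count per row. Using axis=0 would instead give one count per column.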
Time to work on Worksheet 2
Hanson and I are here to help.
I have another class at 2pm, so if you have questions, ask me now rather than after class!