Boolean arrays in NumPy
A Boolean array by itself is not very interesting; it’s just a NumPy array whose entries are either True or False.
import numpy as np
bool_arr = np.array([True,True,False,True])
bool_arr
array([ True, True, False, True])
The reason Boolean arrays are important is that they are often produced by other operations.
arr = np.array([3,1,4,1])
arr < 3.5
array([ True, True, False, True])
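As a small sketch of the same idea, the other comparison operators also act elementwise and return a Boolean array with the same shape as the input. The variable names below are illustrative and not part of the original example.

```python
import numpy as np

arr = np.array([3, 1, 4, 1])

# Each comparison is applied entry by entry and yields a Boolean array.
less_than = arr < 3.5      # the comparison from the text
equal_one = arr == 1       # True exactly where the entry equals 1
at_least_three = arr >= 3  # True exactly where the entry is 3 or more

# All three results have dtype bool and the same shape as arr.
print(less_than.dtype, less_than.shape)
print(equal_one)
print(at_least_three)
```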
The number of True entries in a Boolean array can be counted very efficiently using np.count_nonzero. Reminders for reading the timing outputs:
s means seconds;
ms means milliseconds, \(10^{-3}\) seconds;
µs means microseconds, \(10^{-6}\) seconds;
ns means nanoseconds, \(10^{-9}\) seconds.
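Before looking at timings, here is a minimal sketch of the counting itself, reusing the small array from earlier. np.count_nonzero counts the True entries because True is treated as nonzero; summing the Boolean array gives the same number because True is treated as 1. The names mask, n_true and n_true_sum are illustrative.

```python
import numpy as np

arr = np.array([3, 1, 4, 1])
mask = arr < 3.5  # Boolean array: True where the entry is below 3.5

# Count the True entries directly.
n_true = np.count_nonzero(mask)

# Equivalent count: True is treated as 1 and False as 0 when summing.
n_true_sum = mask.sum()

print(n_true, n_true_sum)
```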
From a small example, it might seem like the NumPy method is slower:
my_list = [3,1,4,3,5]
my_array = np.array(my_list)
my_list.count(3)
2
%%timeit
my_list.count(3)
75.9 ns ± 0.305 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
np.count_nonzero(my_array==3)
2
%%timeit
np.count_nonzero(my_array==3)
1.55 µs ± 7.77 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
For a longer example, however, the NumPy method is clearly faster. Here the array and the list each have length ten million, with entries drawn at random from the integers 1 through 5.
rng = np.random.default_rng()
my_array = rng.integers(1,6,size=10**7)
my_list = list(my_array)
my_list.count(3)
2000793
np.count_nonzero(my_array==3)
2000793
%%timeit
my_list.count(3)
985 ms ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
np.count_nonzero(my_array==3)
3.04 ms ± 9.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
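In the run above, 985 ms against 3.04 ms works out to a speedup of roughly 300 times, whereas in the small example the list method was about 20 times faster (75.9 ns against 1.55 µs). The %%timeit magic only works in Jupyter or IPython; the sketch below shows one possible way to make a similar comparison in a plain Python script using the standard timeit module. The seed, the smaller array length of 10**5, and the repeat counts are assumptions chosen to keep the run short, so the measured times will differ from those shown above.

```python
import timeit
import numpy as np

# Assumptions: a fixed seed and a shorter array than the text's 10**7.
rng = np.random.default_rng(0)
my_array = rng.integers(1, 6, size=10**5)  # integers from 1 through 5
my_list = list(my_array)

# Both methods should report the same number of 3s.
list_count = my_list.count(3)
array_count = int(np.count_nonzero(my_array == 3))

# Time each method: best of 3 repeats, 5 calls per repeat.
n_calls = 5
t_list = min(timeit.repeat(lambda: my_list.count(3),
                           number=n_calls, repeat=3)) / n_calls
t_array = min(timeit.repeat(lambda: np.count_nonzero(my_array == 3),
                            number=n_calls, repeat=3)) / n_calls

print(f"list.count:       {t_list * 1e3:.3f} ms per call")
print(f"np.count_nonzero: {t_array * 1e3:.3f} ms per call")
```

Taking the minimum over repeats is a common convention for reducing noise from other processes; it is a different summary from the mean ± std. dev. that %%timeit reports.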