# Boolean arrays in NumPy

A Boolean array by itself is not very interesting; it's just a NumPy array whose entries are either `True` or `False`.

In [2]:
import numpy as np

In [4]:
bool_arr = np.array([True,True,False,True])
bool_arr

array([ True,  True, False,  True])

Boolean arrays matter because other operations, such as elementwise comparisons, often produce them.

In [6]:
arr = np.array([3,1,4,1])
arr < 3.5

array([ True,  True, False,  True])
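
Other elementwise comparisons produce Boolean arrays in the same way. The following is a small sketch that reuses the values of `arr` from above; the particular comparisons and the variable names are chosen only for illustration.

```python
import numpy as np

arr = np.array([3, 1, 4, 1])

# Each comparison is applied entry by entry and returns a Boolean array
# with the same shape as arr.
equal_one = arr == 1    # array([False,  True, False,  True])
not_one = arr != 1      # array([ True, False,  True, False])
at_least_3 = arr >= 3   # array([ True, False,  True, False])

print(equal_one.dtype)  # bool
```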

The number of `True`s in a Boolean array can be counted very efficiently using `np.count_nonzero`. The timings below are reported in the following units:
* s means seconds;
* ms means milliseconds, $10^{-3}$ seconds;
* µs means microseconds, $10^{-6}$ seconds;
* ns means nanoseconds, $10^{-9}$ seconds.
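
Before timing anything, here is a minimal sketch of the counting step itself, reusing the values of `arr` from above. Because `True` behaves like 1 and `False` like 0 in arithmetic, summing the Boolean array gives the same count; the `sum` line is included only as a cross-check, and the variable names are illustrative.

```python
import numpy as np

arr = np.array([3, 1, 4, 1])
mask = arr < 3.5                   # array([ True,  True, False,  True])

num_true = np.count_nonzero(mask)  # counts the True entries: 3
num_true_via_sum = mask.sum()      # True counts as 1, False as 0: 3

print(num_true, num_true_via_sum)
```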

On a small example, the NumPy method looks slower than the list method `count`:

In [8]:
my_list = [3,1,4,3,5]
my_array = np.array(my_list)

In [10]:
my_list.count(3)

2

In [9]:
%%timeit
my_list.count(3)

75 ns ± 0.149 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [11]:
np.count_nonzero(my_array==3)

2

In [12]:
%%timeit
np.count_nonzero(my_array==3)

1.62 µs ± 5.13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
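
The `%%timeit` cell magic is only available in IPython and Jupyter. Outside a notebook, a rough equivalent can be sketched with the standard library `timeit` module, as below. The repetition count `n_loops` is an arbitrary choice for this sketch, the `lambda` wrappers add a small overhead of their own, and the numbers will differ from machine to machine, so no expected output is shown.

```python
import timeit

import numpy as np

my_list = [3, 1, 4, 3, 5]
my_array = np.array(my_list)

n_loops = 100_000  # arbitrary repetition count chosen for this sketch

# timeit.timeit returns the total time in seconds for n_loops calls,
# so dividing by n_loops gives the mean time per call.
list_time = timeit.timeit(lambda: my_list.count(3), number=n_loops) / n_loops
numpy_time = timeit.timeit(
    lambda: np.count_nonzero(my_array == 3), number=n_loops
) / n_loops

print(f"list.count:       {list_time * 1e9:.0f} ns per call")
print(f"np.count_nonzero: {numpy_time * 1e9:.0f} ns per call")
```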


For a longer example, however, the NumPy method is clearly faster. Here the array and the list each have length ten million, with entries drawn at random from the integers 1 through 5.

In [14]:
rng = np.random.default_rng()
my_array = rng.integers(1,6,size=10**7)
my_list = list(my_array)

In [15]:
my_list.count(3)

2001713

In [16]:
np.count_nonzero(my_array==3)

2001713
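
One detail of this setup may be worth keeping in mind when reading the timings: `list(my_array)` keeps each entry as a NumPy integer scalar, whereas the array method `tolist` converts the entries to built-in Python `int`s. The sketch below only checks the element types on a small array (the size 10 and the variable names are arbitrary choices for illustration); it does not measure whether the difference affects the speed of `count`.

```python
import numpy as np

rng = np.random.default_rng()
small_array = rng.integers(1, 6, size=10)  # random integers from 1 through 5

as_list = list(small_array)        # elements stay NumPy integer scalars
as_py_list = small_array.tolist()  # elements become built-in Python ints

print(type(as_list[0]), type(as_py_list[0]))

# Both lists hold the same values, so count gives the same answer.
print(as_list.count(3) == as_py_list.count(3))  # True
```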

In [17]:
%%timeit
my_list.count(3)

965 ms ± 5.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]:
%%timeit
np.count_nonzero(my_array==3)

2.97 ms ± 4.59 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
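
To put the four measurements side by side, the sketch below converts the mean timings reported above to seconds and computes the ratios. The numbers are copied from the outputs in this section and are specific to the machine they were measured on. With these values, the NumPy method is roughly 22 times slower on the length-5 example and roughly 325 times faster on the length-ten-million example.

```python
# Mean timings copied from the %%timeit outputs above, converted to seconds.
small_list_time = 75e-9      # my_list.count(3), length 5
small_numpy_time = 1.62e-6   # np.count_nonzero(my_array==3), length 5
large_list_time = 965e-3     # my_list.count(3), length 10**7
large_numpy_time = 2.97e-3   # np.count_nonzero(my_array==3), length 10**7

small_ratio = small_numpy_time / small_list_time  # factor by which NumPy is slower
large_ratio = large_list_time / large_numpy_time  # factor by which NumPy is faster

print(f"length 5:     NumPy is about {small_ratio:.0f} times slower")
print(f"length 10**7: NumPy is about {large_ratio:.0f} times faster")
```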
