Boolean arrays in NumPy

A Boolean array by itself is not very interesting; it’s just a NumPy array whose entries are either True or False.

import numpy as np
bool_arr = np.array([True,True,False,True])
bool_arr
array([ True,  True, False,  True])

Boolean arrays are important because they are often produced by other operations, such as elementwise comparisons.

arr = np.array([3,1,4,1])
arr < 3.5
array([ True,  True, False,  True])
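As a small sketch of the same idea, the other comparison operators also act elementwise and return Boolean arrays. The variable names `less_than`, `equal_one`, and `not_four` are made up for this illustration and do not appear in the original.

```python
import numpy as np

arr = np.array([3, 1, 4, 1])

# Each comparison is applied elementwise and returns a Boolean array
# with the same shape as arr.
less_than = arr < 3.5   # array([ True,  True, False,  True])
equal_one = arr == 1    # array([False,  True, False,  True])
not_four = arr != 4     # array([ True,  True, False,  True])

print(less_than.dtype)  # bool
```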

The number of Trues in a Boolean array can be counted very efficiently using np.count_nonzero. The timings below use the following units:

  • s means seconds;

  • ms means milliseconds, \(10^{-3}\) seconds;

  • µs means microseconds, \(10^{-6}\) seconds;

  • ns means nanoseconds, \(10^{-9}\) seconds.
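Before turning to timings, the counting step itself can be sketched as follows. The name `mask` is an assumption for this illustration. Since True is treated as nonzero and False as zero, summing the Boolean array gives the same count.

```python
import numpy as np

arr = np.array([3, 1, 4, 1])
mask = arr < 3.5  # array([ True,  True, False,  True])

# True counts as nonzero and False as zero,
# so count_nonzero returns the number of Trues.
n_true = np.count_nonzero(mask)   # 3

# Summing treats True as 1 and False as 0, giving the same count.
n_true_sum = int(mask.sum())      # 3

print(n_true, n_true_sum)
```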

On a small example, the NumPy method is actually slower than the list's count method:

my_list = [3,1,4,3,5]
my_array = np.array(my_list)
my_list.count(3)
2
%%timeit
my_list.count(3)
75.9 ns ± 0.305 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
np.count_nonzero(my_array==3)
2
%%timeit
np.count_nonzero(my_array==3)
1.55 µs ± 7.77 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For a longer example, however, the NumPy method is much faster: roughly 300 times faster in the timings below (985 ms versus 3.04 ms). In this example, our array and list have length ten million.

rng = np.random.default_rng()
my_array = rng.integers(1,6,size=10**7)
my_list = list(my_array)
my_list.count(3)
2000793
np.count_nonzero(my_array==3)
2000793
%%timeit
my_list.count(3)
985 ms ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
np.count_nonzero(my_array==3)
3.04 ms ± 9.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
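The %%timeit cell magic is only available in IPython and Jupyter. Outside a notebook, a similar comparison can be sketched with the standard library's timeit module. The seed, the smaller size of one million, the repetition count, and the variable names below are assumptions chosen to keep the sketch quick and repeatable. The measured times will vary from machine to machine.

```python
import timeit

import numpy as np

# Seed and size are assumptions for this sketch; the original used
# an unseeded generator and ten million entries.
rng = np.random.default_rng(0)
my_array = rng.integers(1, 6, size=10**6)  # integers from 1 to 5
my_list = list(my_array)

# Both approaches count the same number of 3s.
list_count = my_list.count(3)
array_count = int(np.count_nonzero(my_array == 3))

# Average time per call over a few repetitions.
repeats = 5
list_time = timeit.timeit(lambda: my_list.count(3), number=repeats) / repeats
array_time = timeit.timeit(
    lambda: np.count_nonzero(my_array == 3), number=repeats
) / repeats

print(f"list:  {list_time * 1e3:.2f} ms per call")
print(f"array: {array_time * 1e3:.2f} ms per call")
```

Note that list(my_array) holds NumPy integer scalars rather than Python ints. Using my_array.tolist() instead would produce plain Python ints, which may change the list timing.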