# Part 7: BOOLEAN MASKING IN NUMPY

## 1. COMPARISON OPERATOR

We will learn how to apply comparison operators (`<`, `>`, `<=`, `>=`, `==` and `!=`) to a NumPy array. Each comparison returns a boolean array with `True` for every element that satisfies the condition and `False` for every element that doesn't.

``````import numpy as np

# making an array of random integers from 0 to 999
# (the upper bound 1000 is exclusive); array shape is (5,5)
rand = np.random.RandomState(42)

arr = rand.randint(1000, size=(5,5))
print(arr)
# which elements have a value greater than 500
print(arr > 500)
# which elements have a value less than 750
print(arr < 750)
``````
``````[[102 435 860 270 106]
[ 71 700  20 614 121]
[466 214 330 458  87]
[372  99 871 663 130]
[661 308 769 343 491]]
[[False False  True False False]
[False  True False  True False]
[False False False False False]
[False False  True  True False]
[ True False  True False False]]
[[ True  True False  True  True]
[ True  True  True  True  True]
[ True  True  True  True  True]
[ True  True False  True  True]
[ True  True False  True  True]]
``````
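
The same element-wise behaviour applies to `==` and `!=`, which the example above does not exercise. A minimal sketch, assuming a small hand-written array `x` (not the random `arr` used above):

```python
import numpy as np

# small illustrative array (an assumption, not the random `arr` above)
x = np.array([1, 2, 3, 2, 1])

eq = x == 2      # True where the element equals 2
ne = x != 2      # True where the element differs from 2

print(eq)        # [False  True False  True False]
print(ne)        # [ True False  True False  True]
print(eq.dtype)  # bool
```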

### 1.1. ufunc

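Each comparison operator has an equivalent ufunc:

| Operator | Equivalent ufunc |
|---|---|
| `==` | `np.equal` |
| `!=` | `np.not_equal` |
| `<` | `np.less` |
| `<=` | `np.less_equal` |
| `>` | `np.greater` |
| `>=` | `np.greater_equal` |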

``````# which elements have a value greater than 500
print(np.greater(arr, 500))
# which elements have a value less than 750
print(np.less(arr, 750))
``````
``````[[False False  True False False]
[False  True False  True False]
[False False False False False]
[False False  True  True False]
[ True False  True False False]]
[[ True  True False  True  True]
[ True  True  True  True  True]
[ True  True  True  True  True]
[ True  True False  True  True]
[ True  True False  True  True]]
``````
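
To check that an operator and its ufunc agree, the two results can be compared directly. A minimal sketch, assuming a small hand-written array `x`; `np.array_equal` is used here only as a convenience check:

```python
import numpy as np

# small illustrative array (an assumption, not the random `arr` above)
x = np.array([[102, 435, 860],
              [ 71, 700,  20]])

by_operator = x > 500
by_ufunc = np.greater(x, 500)

# both routes produce the same boolean array
print(np.array_equal(by_operator, by_ufunc))            # True
print(np.array_equal(x <= 102, np.less_equal(x, 102)))  # True
```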

### 1.2. Working with Boolean Array

In this section, we will look at some useful functions and methods for working with the boolean arrays created by applying a comparison operator to a NumPy array.

#### a. Counting ‘True’

To count the `True` elements, that is, the elements that pass the condition, use `np.count_nonzero()`:

``````# counting the number of elements in array whose value > 500
print(np.count_nonzero(arr > 500))
# counting the number of elements in array whose value < 750
print(np.count_nonzero(arr < 750))
``````
``````7
22
``````

#### b. Alternative way to Count

We can also use `np.sum` to count the elements that pass the condition, because `True` is counted as 1 and `False` as 0. One benefit of this function is the `axis` keyword argument, which performs the summation along a chosen axis.

``````# total in an array
print(np.sum(arr < 750))

# along axis=0
print(np.sum(arr < 750, axis=0))
# along axis=1
print(np.sum(arr < 750, axis=1))
``````
``````22
[5 5 2 5 5]
[4 5 5 4 4]
``````
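
A minimal sketch of per-axis counting on a small hand-written array `x` (an assumption for illustration), where the counts can be checked by eye. Recent NumPy versions also accept `axis` in `np.count_nonzero`, as the last line shows:

```python
import numpy as np

# small illustrative array (an assumption, not the random `arr` above)
x = np.array([[102, 435, 860],
              [ 71, 700,  20]])

mask = x < 500
# mask is:
# [[ True  True False]
#  [ True False  True]]

print(mask.sum())                      # 4
print(mask.sum(axis=0))                # [2 1 1]  (count per column)
print(mask.sum(axis=1))                # [2 2]    (count per row)
print(np.count_nonzero(mask, axis=1))  # [2 2]
```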

#### c. np.any and np.all

• `np.any` returns `True` if any element in the array satisfies the condition; otherwise it returns `False`.
• `np.all` returns `True` if all elements in the array satisfy the condition; otherwise it returns `False`.
• Both accept an optional `axis` keyword argument to apply the function along a chosen axis.

``````# np.any
print(np.any(arr>500))
# np.all
print(np.all(arr>10))
# np.all along axis=1
print(np.all(arr>100, axis=1))
``````
``````True
True
[ True False False False  True]
``````
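
A minimal sketch of the `axis` keyword for both functions, using a small hand-written array `x` (an assumption for illustration) so the results can be checked by eye:

```python
import numpy as np

# small illustrative array (an assumption, not the random `arr` above)
x = np.array([[102, 435, 860],
              [ 71, 700,  20]])

# per column: does any element exceed 500?
any_cols = np.any(x > 500, axis=0)
# per row: do all elements exceed 50?
all_rows = np.all(x > 50, axis=1)

print(any_cols)   # [False  True  True]
print(all_rows)   # [ True False]

# the same reductions are also available as array methods
print((x > 500).any(), (x > 50).all())   # True False
```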

### 1.3. Boolean Operators

Until now, we have applied only a single comparison operator to an array. However, we can use Python's bitwise logic operators (`&`, `|`, `^` and `~`) to combine more than one comparison.

For example, suppose that, for our array `arr`, we want to count the number of elements that are greater than 500 but less than 750:

``````# using boolean operator '&'
np.count_nonzero((arr >500) & (arr < 750))
``````
``````4
``````
``````# using boolean operator '|' (or)
np.count_nonzero((arr < 500) | (arr >= 500))
``````
``````25
``````
``````# placing ~ before a condition inverts it
# select the elements that are NOT greater than 100 AND
# are greater than 50
arr[(~(arr >100) & (arr > 50))]
``````
``````array([71, 87, 99])
``````
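
The parentheses around each comparison matter: `&` and `|` bind more tightly than `<` and `>`, so without them Python evaluates `500 & x` first and is then left with a chained comparison on arrays. A minimal sketch, assuming a small hand-written array `x`:

```python
import numpy as np

# small illustrative array (an assumption, not the random `arr` above)
x = np.array([102, 435, 860, 614, 71])

# parenthesised: each comparison is evaluated first, then combined
ok = (x > 500) & (x < 750)
print(ok)      # [False False False  True False]
print(x[ok])   # [614]

# unparenthesised: parsed as x > (500 & x) < 750, a chained comparison
# that needs the truth value of a whole array and therefore fails
try:
    x > 500 & x < 750
except ValueError as err:
    print("ValueError:", err)
```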

#### a. ufunc

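There are `ufunc` equivalents for all of these boolean operators:

| Operator | Equivalent ufunc |
|---|---|
| `&` | `np.bitwise_and` |
| `\|` | `np.bitwise_or` |
| `^` | `np.bitwise_xor` |
| `~` | `np.bitwise_not` |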
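
A minimal sketch comparing each operator with its ufunc counterpart on boolean arrays, assuming a small hand-written array `x`:

```python
import numpy as np

# small illustrative array (an assumption, not the random `arr` above)
x = np.array([102, 435, 860, 614, 71])
a = x > 100   # [ True  True  True  True False]
b = x < 700   # [ True  True False  True  True]

print(np.array_equal(a & b, np.bitwise_and(a, b)))   # True
print(np.array_equal(a | b, np.bitwise_or(a, b)))    # True
print(np.array_equal(a ^ b, np.bitwise_xor(a, b)))   # True
print(np.array_equal(~a, np.bitwise_not(a)))         # True

print(np.bitwise_and(a, b))   # [ True  True False  True False]
```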

#### b. `&` vs `and`, `|` vs `or`

What is the difference between the keywords `and` and `or` and the boolean operators `&` and `|`?

The keywords `and` and `or` evaluate the truth value of an entire object, while `&` and `|` operate on the bits within each object. For a boolean array, each element is effectively a single bit, so `&` and `|` work element by element.

``````# using '&'
print(np.count_nonzero((arr >500) & (arr < 750)))

# using 'and'
try:
    (arr >500) and (arr < 750)
except Exception as e:
    print(e)
``````
``````4
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
``````
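
The distinction is easiest to see on plain Python integers. A minimal sketch, with `a` and `b` chosen purely for illustration:

```python
a, b = 12, 10   # illustrative integers: 0b1100 and 0b1010

# `and` / `or` consider the truth value of each whole object
print(a and b)      # 10 (a is truthy, so the second operand is returned)
print(a or b)       # 12 (a is truthy, so it is returned straight away)

# `&` / `|` combine the operands bit by bit
print(bin(a & b))   # 0b1000
print(bin(a | b))   # 0b1110
```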

## 2. BOOLEAN MASKING

In the section above, we applied one or more comparison operators, which return a boolean array with `True` for the elements that pass the condition(s) and `False` for those that don't.

In this section, we will use such a boolean array to return the actual values from the array. This process is called boolean masking.

Earlier, a condition such as `arr > 500` gave us a boolean array marking which elements pass (`True`) and which don't (`False`). Now, let's place a condition inside `[]` to return the matching values from the array `arr`:

``````# return array of elements with value < 500
arr[arr < 500]
``````
``````array([102, 435, 270, 106,  71,  20, 121, 466, 214, 330, 458,  87, 372,
99, 130, 308, 343, 491])
``````
``````# we can also use the ufunc
arr[np.less(arr,500)]
``````
``````array([102, 435, 270, 106,  71,  20, 121, 466, 214, 330, 458,  87, 372,
99, 130, 308, 343, 491])
``````
``````# passing more than one condition
# using a boolean operator
arr[(arr >500) & (arr < 750)]
``````
``````array([700, 614, 663, 661])
``````
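
A minimal sketch of the same idea with the mask stored in a variable, assuming a small hand-written array `x`. It illustrates that masking a two-dimensional array returns a one-dimensional array whose length equals the number of `True` entries in the mask:

```python
import numpy as np

# small illustrative array (an assumption, not the random `arr` above)
x = np.array([[102, 435, 860],
              [ 71, 700,  20]])

mask = (x > 100) & (x < 750)   # boolean array with the same shape as x
selected = x[mask]             # values where the mask is True

print(selected)        # [102 435 700]
print(selected.ndim)   # 1
print(selected.size == np.count_nonzero(mask))   # True
```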