
# Part 1: PRIMER ON NUMPY ARRAYS

We will learn several different methods to create a Numpy array, techniques to reshape an array, methods to extract array attributes and other basics.

• NumPy is short for Numerical Python
• NumPy’s core object is a homogeneous n-dimensional array (`ndarray`)
• A NumPy array stores elements (usually numbers) that are all of the same type and are indexed by tuples of non-negative integers
• In NumPy, dimensions are called `axes`. For example, the array `[1,2,3,4]` is a 1-D array: it has one axis, and that axis has 4 elements
• Why is NumPy so popular in Data Science? The answer lies in the fact that almost all types of data (documents, sound clips, images, etc.) are essentially represented as arrays of numbers
• In appearance, a NumPy array might look similar to Python’s built-in `list`; however, the former provides much more efficient data storage and operations than a `list`
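A quick check of these points: the sketch below builds the 1-D array from the example above and inspects its number of axes, its length along that single axis and the dtype shared by all elements. The exact integer dtype name (`int64` on most 64-bit systems) can differ by platform.

``````python
import numpy as np

a = np.array([1, 2, 3, 4])
print(a.ndim)   # number of axes: 1
print(a.shape)  # length along each axis: (4,)
print(a.dtype)  # one dtype shared by every element, e.g. int64
``````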

## 1. CREATING NUMPY ARRAYS

We will discuss multiple ways to create an array. But first, let’s `import` the NumPy library into the program and, as is tradition, `import` it with the alias `np`:

``````
# import numpy as np
import numpy as np
``````

### 1.1. From a Python List

``````
# first, create a list from 'list' function
list1 = list(range(1,10))
print("This is the list")
print(list1)

# second, create array from the list
print("\nThis is the Numpy array created from above list")
print(np.array(list1))
``````
``````
This is the list
[1, 2, 3, 4, 5, 6, 7, 8, 9]

This is the Numpy array created from above list
[1 2 3 4 5 6 7 8 9]
``````
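Nested lists work the same way: a list of equal-length lists becomes a 2-D array, with one axis per level of nesting. A small sketch:

``````python
import numpy as np

# each inner list becomes one row of the 2-D array
nested = [[1, 2, 3], [4, 5, 6]]
arr2d = np.array(nested)
print(arr2d)
print(arr2d.shape)  # (2, 3): 2 rows, 3 columns
``````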

#### a. Upcasting

All array elements must be of the same data type (dtype). Therefore, if the elements of the provided `list` have different data types, NumPy upcasts them to a common type that can hold all of them. That is:

• If the list contains `int` and `float` elements, all elements are upcast to `float` (example 1 below)
• If the list contains `int`, `float` and `str` elements, all elements are upcast to `str` (example 2 below)
``````
# example 1
e1 = np.array([1,2,3.0])
print(e1)
print(e1.dtype)

# example 2
e2 = np.array([1,'string',3.0])

print(e2)
print(e2.dtype)
``````
``````
[1. 2. 3.]
float64
['1' 'string' '3.0']
<U21
``````

#### b. Explicitly setting the dtype

We can use the keyword argument (kwarg) `dtype` to explicitly set the data type of a NumPy array:

``````
e3 = np.array([1,2,3.0], dtype='int')

print(e3)
print(e3.dtype)
``````
``````
[1 2 3]
int64
``````
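`dtype` also accepts NumPy’s scalar types instead of a string. The sketch below uses `np.float32` purely as an illustration, and shows that the chosen type also fixes how many bytes each element takes (see `.itemsize` in section 3):

``````python
import numpy as np

# request 32-bit floats instead of the default 64-bit floats
e4 = np.array([1, 2, 3], dtype=np.float32)
print(e4)
print(e4.dtype)     # float32
print(e4.itemsize)  # 4 bytes per element
``````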

### 1.2. From Ranged Data

We can create a NumPy array with the `np.arange()` function as follows:

#### a. Single number is passed in the argument

If only a single number `n` is passed, `np.arange(n)` returns the values from `0` (inclusive) up to `n` (exclusive), stepping by 1. For an integer `n`, that is `0, 1, ..., n-1`:

``````
# integer
np.arange(10)
``````
``````
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
``````
``````
# float
np.arange(10.5)
``````
``````
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
``````

#### b. Two numbers passed in the argument

If two numbers are passed, `np.arange(x,y)` returns the values from `x` (inclusive) up to `y` (exclusive), again stepping by 1:

``````
# example 1
np.arange(1,6)
``````
``````
array([1, 2, 3, 4, 5])
``````
``````
# example 2
np.arange(-5,6)
``````
``````
array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5])
``````

#### c. Three numbers are passed in the arguments

If three numbers are passed, `np.arange(x,y,z)` starts at `x` (inclusive), stops before `y` (exclusive), and uses `z` as the step size, i.e. the difference between consecutive values. The default step size is 1.

``````
# table of 2
np.arange(2,21,2)
``````
``````
array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])
``````
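The step can also be negative, in which case the values count down from `x` and stop before reaching `y`:

``````python
import numpy as np

# count down from 10 in steps of 3; the stop value 0 is excluded
countdown = np.arange(10, 0, -3)
print(countdown)  # 10, 7, 4, 1
``````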

### 1.3. From the Linspace Function

• `np.linspace(a,b)` returns evenly spaced numbers over the interval from `a` to `b`
• Unlike `np.arange(a,b)`, both `a` and `b` are included in the array, unless we pass `endpoint=False` as a kwarg
• If the kwarg `num` is not specified, 50 elements are created by default
``````
# example 1
np.linspace(1,50)
``````
``````
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.,
       27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39.,
       40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50.])
``````
``````
# example 2
np.linspace(1,50, num=5)
``````
``````
array([ 1.  , 13.25, 25.5 , 37.75, 50.  ])
``````
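The `endpoint` kwarg mentioned above can be seen in action below, together with `retstep=True`, which makes `np.linspace` also return the spacing between samples. A small sketch:

``````python
import numpy as np

# endpoint=False: 5 samples starting at 0, with 1 itself left out
half_open = np.linspace(0, 1, num=5, endpoint=False)
print(half_open)  # 0.0, 0.2, 0.4, 0.6, 0.8

# retstep=True: also return the step used between samples
samples, step = np.linspace(1, 50, num=5, retstep=True)
print(samples)    # 1.0, 13.25, 25.5, 37.75, 50.0
print(step)       # 12.25
``````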

### 1.4. Special Functions

• `np.zeros(x)` creates a 1D array of `x` zeros (of `float` dtype by default)
``````
# np.zeros
np.zeros(5)
``````
``````
array([0., 0., 0., 0., 0.])
``````
• `np.ones(x)` creates a 1D array of `x` ones (of `float` dtype by default)
``````
# np.ones
np.ones(5)
``````
``````
array([1., 1., 1., 1., 1.])
``````
• `np.zeros_like(a)` creates a new array with the same shape and dtype as `a`, filled with `0`
``````
# np.zeros_like
h = np.arange(5)
print(f"Before applying zeros_like function:\n{h}")
print(f"After zeros_like function:")
print(np.zeros_like(h))
``````
``````
Before applying zeros_like function:
[0 1 2 3 4]
After zeros_like function:
[0 0 0 0 0]
``````
• `np.ones_like(a)` creates a new array with the same shape and dtype as `a`, filled with `1`
``````
# np.ones_like
h = np.arange(5)
print(f"Before applying ones_like function:\n{h}")
print(f"After ones_like function:")
print(np.ones_like(h))
``````
``````
Before applying ones_like function:
[0 1 2 3 4]
After ones_like function:
[1 1 1 1 1]
``````
• `np.full((a,b), x)` creates an array of shape `(a, b)` in which every element is `x`
``````
# np.full
np.full((1,4), 10)
``````
``````
array([[10, 10, 10, 10]])
``````
• `np.eye(x,y)` creates an `x` by `y` 2D array with ones on the main diagonal and zeros elsewhere, i.e. an identity matrix when `x` equals `y`
``````
# np.eye
np.eye(3,3)
``````
``````
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
``````
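The examples above pass a single number, but these functions also accept a shape tuple and a `dtype` kwarg, and `np.eye` has a `k` kwarg that shifts the diagonal of ones. A small sketch of both:

``````python
import numpy as np

# a shape tuple gives a multi-dimensional array; dtype overrides the float default
z = np.zeros((2, 3), dtype=int)
print(z)

# k=1 moves the ones one position above the main diagonal
upper = np.eye(3, 3, k=1)
print(upper)
``````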

### 1.5. From Random Numbers

There are several ways to create a NumPy array from random numbers:

#### a. np.random.random

`np.random.random` returns an array of the requested size and shape filled with random floats in the half-open interval `[0.0, 1.0)`

``````
# 1d, 5 random numbers array
np.random.random(5)
``````
``````
array([0.90916555, 0.31457971, 0.9870763 , 0.85703146, 0.32406204])
``````
``````
# 2d, 6 random numbers array
np.random.random((2,3))
``````
``````
array([[0.1305562 , 0.90709993, 0.04794402],
       [0.03463245, 0.42490532, 0.15678277]])
``````

#### b. np.random.randint

`np.random.randint` outputs an array of random integers of required size, shape and interval

``````
# example 1
# 1D array of 5 random integers between 0 and 100(exclusive)
np.random.randint(100, size=5)
``````
``````
array([91, 15, 90, 84, 14])
``````
``````
# example 2
# 1D array of 5 random integers between 50 and 100(exclusive)
np.random.randint(50,100, size=5)
``````
``````
array([68, 50, 77, 58, 85])
``````
``````
# example 3
# 2D array of size 6, shape (2,3) between 1 and 100(exclusive)
np.random.randint(1,100, size=(2,3))
``````
``````
array([[67, 89, 78],
       [65, 59, 82]])
``````

Setting the `RandomState` for reproducibility of the results

``````
# we can set random state, for reproducibility
# i.e. every time the code is run, the same output is returned
rand = np.random.RandomState(50)

rand.randint(100, size=5)
``````
``````
array([48, 96, 11, 33, 94])
``````
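Newer NumPy code usually creates a seeded `Generator` with `np.random.default_rng` instead of a `RandomState`; NumPy’s documentation recommends this API for new code. For the same seed it produces different numbers than `RandomState`, so the sketch below (reusing the seed `50` only for illustration) checks reproducibility rather than specific values:

``````python
import numpy as np

# two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(50)
rng_b = np.random.default_rng(50)

first = rng_a.integers(0, 100, size=5)   # 5 integers in [0, 100)
second = rng_b.integers(0, 100, size=5)
print(first)
print(np.array_equal(first, second))     # True
``````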

#### c. np.random.normal

`np.random.normal` produces an array of random numbers drawn from a normal (Gaussian) distribution with a given mean and standard deviation, passed in that order (as `loc` and `scale`), followed by the output shape (`size`)

``````
# creating array of shape (2,3)
# 'normally distributed random value'
# with mean 0 and standard deviation of 1
np.random.normal(0,1,(2,3))
``````
``````
array([[ 0.50919996,  1.58425045,  1.89129778],
       [ 1.25874576, -0.26439083, -0.36133986]])
``````
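With a large enough sample, the sample mean and standard deviation land close to the requested `loc` and `scale`. The sketch below uses an arbitrary mean of 5, a standard deviation of 2 and a seeded `RandomState`, all chosen only for illustration:

``````python
import numpy as np

rand = np.random.RandomState(0)  # seeded for reproducibility
# loc = mean, scale = standard deviation, size = number of samples
samples = rand.normal(loc=5, scale=2, size=100_000)
print(samples.mean())  # close to 5
print(samples.std())   # close to 2
``````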

## 2. RESHAPING THE ARRAY

In section 1 above, we discussed multiple ways to create an array. Some of these methods only create a 1D array; however, we can change an array to any compatible shape using the `np.reshape()` function (or the equivalent `.reshape()` method).

### 2.1. np.reshape

We have used `np.arange()` to create a 1D array. We can apply `.reshape()` to this 1D array to give it a new shape, as long as the total number of elements stays the same. For example, a 1D array of 10 elements can become a (2,5) or a (5,2) array, but not a (3,3) one:

``````
np.arange(10).reshape(2,5)
``````
``````
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
``````
``````
np.arange(10).reshape(5,2)
``````
``````
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
``````
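If the requested shape does not hold exactly the same number of elements, `reshape` refuses and raises a `ValueError`. For example, 10 elements cannot fill a (3,3) array:

``````python
import numpy as np

raised = False
try:
    np.arange(10).reshape(3, 3)  # 9 slots cannot hold 10 elements
except ValueError as err:
    raised = True
    print("ValueError:", err)
``````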

### 2.2. np.reshape with -1

In the above examples, we explicitly provided every dimension of the new shape in `np.reshape()`. However, we can pass the special value `-1` for one of the dimensions, as in `(x,-1)` or `(-1,y)`, and NumPy infers its length from the total number of elements and the other dimensions. Only one dimension can be `-1`. The examples below make the concept clearer:

``````
# example 1 (x,-1)
np.arange(10).reshape(2,-1)
``````
``````
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
``````
``````
# example 2 (-1,y)
np.arange(10).reshape(-1,2)
``````
``````
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
``````
``````
# how to apply reshape on array saved as variable
u = np.arange(10)
np.reshape(u, (2,5))
``````
``````
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
``````
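`-1` also works on its own, which turns any array into 1D, and in shapes with more than two dimensions. Note that `reshape` returns a new array object, so `u` itself keeps its original shape:

``````python
import numpy as np

u = np.arange(10)
print(u.reshape(-1).shape)                    # (10,)
print(np.arange(12).reshape(2, -1, 3).shape)  # (2, 2, 3): -1 inferred as 12 / (2*3) = 2
print(u.shape)                                # still (10,)
``````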

### 2.3. flatten method

We can reshape any nD array into 1D using the `.flatten()` method, which returns a copy of the data laid out in a single dimension:

``````
ar = np.arange(20).reshape(2,10)
print("Here is original (2,10) array")
print(ar)

print("\nHere is the flattened version")
print(ar.flatten())
``````
``````
Here is original (2,10) array
[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]

Here is the flattened version
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
``````
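By default `.flatten()` reads the elements row by row; passing `order='F'` reads them column by column instead. Because the result is a copy, changing it leaves the original array untouched:

``````python
import numpy as np

ar = np.arange(6).reshape(2, 3)  # rows: [0 1 2] and [3 4 5]
print(ar.flatten())              # row by row: 0 1 2 3 4 5
print(ar.flatten(order='F'))     # column by column: 0 3 1 4 2 5

flat = ar.flatten()
flat[0] = 100                    # modify the copy only
print(ar[0, 0])                  # the original still holds 0
``````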

### 2.4. Transpose

If you have used the transpose feature in Excel, `np.transpose()` achieves the same result for NumPy arrays.

#### a. Simple Transpose

In a simple transpose, rows become columns and vice versa.

``````
# making a (2,5) array
ar1 = np.arange(10).reshape(2,5)
print("This is original (2,5) array")
print(ar1)

print("\nThis is transposed version, into (5,2)")
print(np.transpose(ar1))
``````
``````
This is original (2,5) array
[[0 1 2 3 4]
 [5 6 7 8 9]]

This is transposed version, into (5,2)
[[0 5]
 [1 6]
 [2 7]
 [3 8]
 [4 9]]
``````
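For this simple case, the `.T` attribute is a shorthand for `np.transpose()` without an `axes` argument:

``````python
import numpy as np

ar1 = np.arange(10).reshape(2, 5)
print(ar1.T.shape)                               # (5, 2)
print(np.array_equal(ar1.T, np.transpose(ar1)))  # True
``````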

#### b. Transpose with axes kwarg

The keyword argument `axes` takes a list or tuple of integers and permutes the axes accordingly: axis `i` of the result is axis `axes[i]` of the original. For example, if the original shape is (3,4,2) and we provide `axes=(1,2,0)`, then:

• the original first dimension becomes the new third dimension,
• the original second dimension becomes the new first dimension, and
• the original third dimension becomes the new second dimension.

By default (`axes=None`), `np.transpose` reverses the order of the axes; for a 3D array this is equivalent to `axes=(2,1,0)`.

``````
# creating an array to transpose using the 'axes' kwarg
arr2 = np.arange(24).reshape(3,4,2)
print(f"Shape of original array: {np.shape(arr2)}")

# applying transpose with axes=(1,2,0)
trr2 = np.transpose(arr2, axes=(1,2,0))
print(f"\nTransposed Shape: {np.shape(trr2)}")
``````
``````
Shape of original array: (3, 4, 2)

Transposed Shape: (4, 2, 3)
``````
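To see the permutation at the level of individual elements: with `axes=(1,2,0)`, the element at position `[i, j, k]` of the original array ends up at `[j, k, i]` in the result. The sketch below checks this for one arbitrary position and also confirms that omitting `axes` reverses the axes:

``````python
import numpy as np

arr2 = np.arange(24).reshape(3, 4, 2)
trr2 = np.transpose(arr2, axes=(1, 2, 0))

i, j, k = 2, 1, 0                      # an arbitrary valid index into arr2
print(arr2[i, j, k] == trr2[j, k, i])  # True

# without axes, the order of the axes is reversed: (3, 4, 2) -> (2, 4, 3)
print(np.transpose(arr2).shape)
``````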

## 3. ARRAY ATTRIBUTES

A NumPy array exposes several attributes that describe it. We have already used a few of them; the others are covered in this section.

``````
# creating an array, whose attributes will be discussed
d = np.arange(24).reshape(2,3,4)
print(d)
``````
``````
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
``````
• `.ndim` to get the number of dimensions
``````
d.ndim
``````
``````
3
``````
• `.shape` to get the length of each dimension of an array, returned as a tuple
``````
d.shape
``````
``````
(2, 3, 4)
``````
• `.dtype` to get the data type associated with the array
``````
d.dtype
``````
``````
dtype('int64')
``````
• `.size` to get the total number of elements in the array
``````
d.size
``````
``````
24
``````
• `.itemsize` to get the size of each array element in bytes
``````
d.itemsize
``````
``````
8
``````
• `.nbytes` to get the total number of bytes consumed by the array’s elements, equal to `size * itemsize`
``````
d.nbytes
``````
``````
192
``````
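These attributes are related: `size` is the product of the entries in `shape`, `nbytes` is `size * itemsize`, and `shape` has one entry per axis. A quick check on the same (2,3,4) array:

``````python
import numpy as np

d = np.arange(24).reshape(2, 3, 4)
print(d.size == np.prod(d.shape))       # True: 2 * 3 * 4 = 24
print(d.nbytes == d.size * d.itemsize)  # True
print(len(d.shape) == d.ndim)           # True: one shape entry per axis
``````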

## 4. OTHER BASICS

• To copy an array, we use the `np.copy(array_to_copy)` function. This is important when we need to work on an array’s data for some other task without modifying the original array
``````
print("Shape of original array:")
print(np.shape(d))

# making copy of d and reshape it
e = np.copy(d)
e = e.reshape(6,4)

print("Shape of copied and reshaped array:")
print(np.shape(e))

print("Shape of original array is still the same:")
print(np.shape(d))
``````
``````
Shape of original array:
(2, 3, 4)
Shape of copied and reshaped array:
(6, 4)
Shape of original array is still the same:
(2, 3, 4)
``````
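Reshaping never changes the original’s shape, so the benefit of a copy is clearer when elements are modified. In the sketch below, `reshape` returns a view that shares memory with the original (NumPy does this when it can, as it can for this contiguous array), so writing through it changes the original, while writing to an `np.copy` does not:

``````python
import numpy as np

d = np.arange(24).reshape(2, 3, 4)
view = d.reshape(6, 4)       # shares memory with d
view[0, 0] = -1
print(d[0, 0, 0])            # -1: d changed too

d2 = np.arange(24).reshape(2, 3, 4)
copied = np.copy(d2).reshape(6, 4)  # independent data
copied[0, 0] = -1
print(d2[0, 0, 0])           # 0: d2 is untouched
``````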
• To cast an array to a specific data type, we use the `.astype()` method, which returns a new array (here we reassign it to `d`)
``````
# original dtype of array, d
print(d.dtype)

# casting the dtype to float
d = d.astype(float)
print(d.dtype)
``````
``````
int64
float64
``````
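When casting floats to integers with `.astype()`, the fractional part is simply dropped (truncated toward zero), not rounded, and the original array keeps its dtype:

``````python
import numpy as np

f = np.array([1.7, -1.7, 2.5])
i = f.astype(int)
print(i)        # 1, -1, 2
print(f.dtype)  # float64: the original array is unchanged
``````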
• `np.nan` (not a number) and `np.inf` (infinity) are special floating-point values, so their presence in an array makes the whole array a `float` data type
``````
# np.nan in an array
np.array([1,2,np.nan]).dtype
``````
``````
dtype('float64')
``````
``````
# np.inf in an array
np.array([1,2,np.inf]).dtype
``````
``````
dtype('float64')
``````
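Because `np.nan` is not equal to anything, including itself, `==` cannot be used to find it; `np.isnan()` and `np.isinf()` test each element instead:

``````python
import numpy as np

arr = np.array([1, 2, np.nan, np.inf])
print(np.nan == np.nan)  # False
print(np.isnan(arr))     # only the third element is True
print(np.isinf(arr))     # only the fourth element is True
``````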