NumPy for Data Science

Aashina Arora

Introduction

In this article, we will learn how NumPy can help you with data analysis.

If you are a beginner and have no prior exposure to NumPy, please refer to my article: Getting Started with Numpy
A high-level overview of some of the operations done in NumPy is also given in my article: Numpy- Array

About NumPy

NumPy is a linear algebra library for Python.
It has bindings to C libraries and is mostly used for scientific computing.

Before getting started, you need to install NumPy:

If you are using pip, refer to my previous article for the installation process.
If you are using the Anaconda Distribution, simply run "conda install numpy" in the shell.

Note. I am using the Anaconda Distribution (Jupyter Notebook).

NumPy arrays can have any number of dimensions. This article covers the two most common types:

Vectors (1-D Array)
Matrices (2-D Array)
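
As a quick check of the two cases above, here is a minimal sketch that builds one array of each kind and inspects its ndim and shape attributes. The variable names vector and matrix are only illustrative.

```python
import numpy as np

# Illustrative names; any list and any list of lists would do.
vector = np.array([1, 2, 3])                # 1-D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D array

print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
```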

Creating 1-D Array

1. Using array() method

import numpy as np
lst=[1,2,3]
np.array(lst)

Output

array([1, 2, 3])

Note

In Jupyter Notebook, running a cell with Shift+Enter displays the value of the last expression in the cell even if you don't write print(); hence you will not see print() in my code snippets. You can add it if it helps.
Within a single notebook, you only need to write the import once. If you switch to or create a new notebook, that is a new session, and you need to import numpy again.

2. Using arange() method

import numpy as np
np.arange(0,10)

This means the array will start at 0 and end at 10-1, i.e., 9. The stop value itself is excluded.

Output

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can also pass an increment (step) to arange() as a third argument.

Example

np.arange(0,11,2)

This means the array will start at 0 and end at 10, with a step/increment of 2. The stop value 11 itself is excluded.

Output

array([ 0,  2,  4,  6,  8, 10])
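
To see why the stop value matters, the following sketch compares two calls that differ only in the stop argument. The variable names are illustrative.

```python
import numpy as np

evens_to_10 = np.arange(0, 11, 2)   # stop=11, so 10 is included
evens_to_8 = np.arange(0, 10, 2)    # stop=10, so 10 is excluded

print(evens_to_10)  # [ 0  2  4  6  8 10]
print(evens_to_8)   # [0 2 4 6 8]
```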

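3. Using rand()

import numpy as np
np.random.rand(5)

This returns 5 random floating-point numbers drawn uniformly from the range 0 (inclusive) to 1 (exclusive).

Output

An array of 5 floats; the values differ on every run.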

If you want the numbers in the array to be drawn from the standard normal (Gaussian) distribution, use randn()

np.random.randn(5)

Output

An array of 5 floats, which can be negative or greater than 1; the values differ on every run.

rand() gives floating-point numbers; if you want integer values, use randint()

np.random.randint(5,20)

This generates a single random integer in the range 5 to 19 (20 is excluded).

Output

A single integer between 5 and 19; the value differs from run to run.

If you want more numbers in the array, pass the count as a third argument:

np.random.randint(5,20,6)

This generates an array of 6 random integers in the range 5 to 19 (20 is excluded).

Output

An array of 6 integers, each between 5 and 19; the values differ on every run.
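
Since random output changes on every run, one way to check these functions is to test properties that always hold. The sketch below does that; the seed value 42 is an arbitrary choice for this illustration, and the variable names are illustrative too.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, used only to make this run repeatable

uniform = np.random.rand(5)             # floats in [0, 1)
normal = np.random.randn(5)             # standard normal samples
integers = np.random.randint(5, 20, 6)  # integers from 5 to 19

# The values depend on the seed, but these properties always hold.
print(uniform.min() >= 0 and uniform.max() < 1)     # True
print(integers.min() >= 5 and integers.max() < 20)  # True
print(normal.shape, integers.shape)                 # (5,) (6,)
```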

Creating specific types of 1-D Arrays

1. Array of Zeros

import numpy as np
np.zeros(6)

Output

array([0., 0., 0., 0., 0., 0.])

2. Array of Ones

import numpy as np
np.ones(5)

Output

array([1., 1., 1., 1., 1.])

3. Linspace

This will generate an evenly spaced array within a given range.

import numpy as np
np.linspace(0,5,10)

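This means an array will be created with 10 evenly spaced points from 0 to 5, with both endpoints included.

Output

array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
       2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])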
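
The spacing between the points can be checked directly. In this sketch, np.diff() computes the gaps between neighbouring points; with 10 points there are 9 gaps, so each gap should be 5/9. The names points and gaps are illustrative.

```python
import numpy as np

points = np.linspace(0, 5, 10)
gaps = np.diff(points)   # differences between neighbouring points

print(len(points))               # 10
print(points[0], points[-1])     # 0.0 5.0
print(np.allclose(gaps, 5 / 9))  # True
```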

Creating 2-D Array

1. Using a list

import numpy as np
my_list=[[10,20,30], [40,50,60], [70,80,90]]
np.array(my_list)

Output

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

2. Array of Zeros

import numpy as np
np.zeros((6,6))

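The argument is a tuple: its first value is the number of rows, and its second value is the number of columns. So, in the above example, the array will have 6 rows and 6 columns.

Output

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])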

3. Array of Ones

import numpy as np
np.ones((4,5))

The shape of the above array is 4 x 5, i.e., it will have 4 rows and 5 columns.

Output

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

4. Identity Matrix

It is a square matrix with 1s on the main diagonal and 0s everywhere else.

import numpy as np
np.eye(5,5)

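Output

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])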
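
The defining property of an identity matrix is that multiplying by it leaves a matrix unchanged. The sketch below checks this with the @ operator (matrix multiplication) on a 3x3 example; the names m, identity, and product are illustrative.

```python
import numpy as np

m = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
identity = np.eye(3)    # one argument is enough for a square matrix

product = m @ identity  # matrix multiplication, not element-wise
print(np.array_equal(product, m))                  # True
print(np.array_equal(np.eye(5, 5), np.eye(5)))     # True
```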

5. Using Random

import numpy as np
np.random.rand(5,5)

Output

A 5x5 array of floats in the range 0 to 1; the values differ on every run.

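If you want the numbers in the array to be drawn from the standard normal (Gaussian) distribution, use randn()

np.random.randn(5,5)

Output

A 5x5 array of floats, which can be negative or greater than 1; the values differ on every run.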

Now let’s discuss some more operations that can be performed using NumPy.

Reshaping an Array

import numpy as np
new_array = np.arange(25)
new_array.reshape(5,5)

Note that the shape passed to reshape() must match the size of the original array. The 1-D array in the above example has 25 elements, so a 2-D array created from it can be 5x5, 1x25, or 25x1. In other words, the product of rows and columns must be 25; a shape such as 3x5 raises an error.

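Fun Fact: The above code can be written in one line:

np.arange(25).reshape(5,5)

Output

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])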
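
The size rule can be tried out directly. In this sketch, -1 asks NumPy to work out one dimension from the array size, and a shape whose product is not 25 fails with a ValueError. The names flat and grid are illustrative.

```python
import numpy as np

flat = np.arange(25)

# -1 asks NumPy to infer that dimension from the array size.
grid = flat.reshape(5, -1)
print(grid.shape)  # (5, 5)

# A shape whose product is not 25 is rejected.
try:
    flat.reshape(3, 5)
except ValueError as err:
    print("reshape failed:", err)
```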

Now let’s explore some more built-in methods and attributes of NumPy arrays.

import numpy as np

new_array = np.arange(25)
new_array.max() #Returns the maximum element
new_array.min() #Returns the minimum element
new_array.argmax() #Returns the index of the maximum element
new_array.argmin() #Returns the index of the minimum element
new_array.shape #Attribute: the shape of the array
new_array.dtype #Attribute: the data type of the elements
new_array.copy() #Returns a copy (duplicate) of the array

Output

max() gives 24 and min() gives 0.
argmax() gives 24 and argmin() gives 0.
shape gives (25,).
dtype gives the integer type of the elements, typically int64 (the exact type can depend on the platform and NumPy version).
copy() gives a new array with the same 25 elements.

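Note. Here max() and argmax() return the same number, and so do min() and argmin(), but only because the values in np.arange(25) happen to equal their indexes. max() returns a value, whereas argmax() returns a position.

Let’s see another example with a 2-D array, which will clarify the difference between max() and argmax()

another_array = np.arange(12).reshape(4,3)
np.argmax(another_array) #Index of the maximum in the flattened array
np.argmax(another_array,axis=0) #Row index of the maximum in each column
np.argmax(another_array,axis=1) #Column index of the maximum in each row

Output

np.argmax(another_array) gives 11.
np.argmax(another_array,axis=0) gives array([3, 3, 3]).
np.argmax(another_array,axis=1) gives array([2, 2, 2, 2]).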
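
Without an axis, argmax() returns a position in the flattened array. If you need the row and column instead, one option is np.unravel_index(), as in this sketch. The names flat_index, row, and col are illustrative.

```python
import numpy as np

another_array = np.arange(12).reshape(4, 3)

flat_index = np.argmax(another_array)   # index into the flattened array
row, col = np.unravel_index(flat_index, another_array.shape)

print(flat_index)               # 11
print(row, col)                 # 3 2
print(another_array[row, col])  # 11, the same value as another_array.max()
```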

Universal Array Functions

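np.sqrt(new_array) #Evaluates the square root of each element of the array
np.exp(new_array) #Evaluates e raised to the power of each element of the array
np.sin(new_array) #Evaluates the sine of each element of the array (in radians)
np.log(new_array) #Evaluates the natural logarithm of each element of the array

Output

Each call returns an array of 25 floats, one result per element. Because new_array contains 0, np.log() gives -inf for that element and shows a warning.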
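
A short sketch of how these functions relate to each other, assuming an input that starts at 1 so that the logarithm is finite everywhere. The names values, roots, and logs are illustrative; np.errstate() is used only to silence the divide-by-zero warning in the last step.

```python
import numpy as np

values = np.arange(1, 6)   # start at 1 so that log is finite everywhere

roots = np.sqrt(values)
logs = np.log(values)      # natural logarithm

print(np.allclose(roots ** 2, values))    # True
print(np.allclose(np.exp(logs), values))  # True

# np.log(0) evaluates to -inf; errstate silences the warning here.
with np.errstate(divide="ignore"):
    print(np.log(np.array([0.0])))        # [-inf]
```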

Indexing and Selection

1. Slicing

Slicing returns a specified part of the array.

import numpy as np
slicing_arr = np.arange(0,10)

slicing_arr[0:5] #Returns the elements from index 0 up to index 5-1, i.e., 4
slicing_arr[5:] #Returns the elements from index 5 to the end of the array
slicing_arr[:6] #Returns the elements from the beginning (index 0) up to and including index 5

Output

slicing_arr[0:5] gives array([0, 1, 2, 3, 4]).
slicing_arr[5:] gives array([5, 6, 7, 8, 9]).
slicing_arr[:6] gives array([0, 1, 2, 3, 4, 5]).

2. Broadcast

import numpy as np
b_arr = np.arange(0,10)

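b_arr[0:6]=100  #All the elements from index 0 to 5 are assigned the value 100 at once.
b_arr[:]=99  #All the elements of the array are assigned the value 99 at once.

Output

After the first assignment, b_arr is array([100, 100, 100, 100, 100, 100,   6,   7,   8,   9]).
After the second assignment, b_arr is array([99, 99, 99, 99, 99, 99, 99, 99, 99, 99]).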
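
One consequence worth knowing: a slice is a view onto the original array, not a copy, so assigning through a slice changes the original. The sketch below shows this and how copy() avoids it. The names original, view, and safe are illustrative.

```python
import numpy as np

original = np.arange(0, 10)

view = original[0:3]     # a slice is a view onto the same data
view[:] = 100            # this also changes `original`
print(original[:4])      # [100 100 100   3]

safe = original.copy()   # an independent copy
safe[:] = 0
print(original[:4])      # [100 100 100   3], unchanged by the edit to `safe`
```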

3. Access element in 2-D Array

import numpy as np
my_list=[[10,20,30], [40,50,60], [70,80,90]]
arr=np.array(my_list)

arr[0][0]  #Accessing the first element: 1st row and 1st column (index 0, 0)
arr[2][1]  #Accessing the element at the 3rd row and 2nd column (index 2, 1)
arr[3][2]  #Gives an IndexError! We have only 3 rows (index 0,1,2) and 3 columns (index 0,1,2). [3][2] means the 4th row and 3rd column, which does not exist, hence the error.

Output

arr[0][0] gives 10.
arr[2][1] gives 80.
arr[3][2] raises an IndexError.
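
NumPy also accepts the row and column in a single pair of brackets, which reaches the same element in one indexing step. A minimal sketch, which also catches the error from an out-of-range row:

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

print(arr[2][1])   # 80
print(arr[2, 1])   # 80, the same element with a single indexing step

try:
    arr[3, 2]      # row index 3 does not exist
except IndexError as err:
    print("IndexError:", err)
```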

4. Getting Sub-Array

To get a sub-array from an existing array, you can slice along both axes at once, for example:

arr[:2,1:] #Rows 0 to 1 and columns 1 to 2: returns the top-right corner of the array as our sub-array

Output

array([[20, 30],
       [50, 60]])

Try Now! Try fetching the upper-left corner of the matrix.

Using Operators with Array

1. Comparison

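You can also compare arrays element by element using comparison operators (>,<,>=,<=,==,!=), for example:

c_arr = np.arange(1,11)

c_arr>5  #Returns a boolean array: True where the element is greater than 5, otherwise False
c_arr[c_arr>5]  #Returns only the elements that are greater than 5

Output

c_arr>5 gives False for the first five elements (1 to 5) and True for the last five (6 to 10).
c_arr[c_arr>5] gives array([ 6,  7,  8,  9, 10]).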
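
Conditions can also be combined. The sketch below uses & (element-wise "and") to select the even elements greater than 3; each condition needs its own parentheses because & binds more tightly than the comparison operators. The name mask is illustrative.

```python
import numpy as np

c_arr = np.arange(1, 11)

mask = (c_arr > 3) & (c_arr % 2 == 0)   # parentheses are required
print(c_arr[mask])   # [ 4  6  8 10]
print(mask.sum())    # 4, the number of matching elements
```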

2. Arithmetic

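You can also do arithmetic on arrays element by element using arithmetic operators (+,-,*,/,%), for example:

c_arr+ c_arr  #Adding the array to itself
c_arr- c_arr  #Subtracting the array from itself
c_arr* c_arr  #Multiplying the array by itself, element by element
c_arr+ 100  #Adding an integer to every element
c_arr/100  #Dividing every element by an integer

Output

c_arr+ c_arr gives array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20]).
c_arr- c_arr gives array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]).
c_arr* c_arr gives array([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100]).
c_arr+ 100 gives array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110]).
c_arr/100 gives an array of floats from 0.01 to 0.1.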
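
The % operator from the list above works the same way, element by element. This sketch also shows that division returns floats even when both operands are integers, while adding an integer keeps the integer type.

```python
import numpy as np

c_arr = np.arange(1, 11)

print(c_arr % 3)                           # [1 2 0 1 2 0 1 2 0 1]
print((c_arr / 100).dtype)                 # float64
print((c_arr + 100).dtype == c_arr.dtype)  # True
```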

Conclusion

NumPy is capable of performing a wide range of operations, and it is also used to clean and analyze data. For a beginner, the operations covered here are enough to get started in data science. NumPy is a building block that will help you as you progress through further data science topics and begin to use more Python libraries such as Pandas, Matplotlib, SciPy, and so on. We'll learn more about NumPy and other Python modules in my future series of posts.
Practice hard. Keep learning...

Thanks for reading!