Introduction
In this article, we will learn about NumPy, a library that can help you with Data Analysis.
- If you are a beginner and do not have any idea about NumPy, please refer to my article, Getting Started with Numpy.
- My article Numpy- Array also gives a high-level overview of some of the operations you can do in NumPy.
Here, we will learn how NumPy is useful when it comes to Data Analysis.
About NumPy
- It is a numerical computing library with strong support for Linear Algebra.
- It has bindings to C libraries and is mostly used for Scientific Computing.
Before getting started, you need to install NumPy:
- If you are using pip, run “pip install numpy” (my previous article walks through the installation process).
- If you are using the Anaconda Distribution, run “conda install numpy” in the shell.
Note: I am using the Anaconda Distribution (Jupyter Notebook).
NumPy arrays can have any number of dimensions, but the two most common types are:
- Vectors (1-D Array)
- Matrices (2-D Array)
Creating 1-D Array
1. Using array() method
import numpy as np
lst=[1,2,3]
np.array(lst)
Output: array([1, 2, 3])
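A slightly fuller sketch of array(), with example lists of my own choosing: NumPy picks one data type for the whole array from the list contents, so a single float in the list turns every element into a float.

```python
import numpy as np

# Hypothetical example lists, chosen only for illustration
int_list = [1, 2, 3]
mixed_list = [1, 2.5, 3]

int_arr = np.array(int_list)
float_arr = np.array(mixed_list)

print(int_arr.ndim, int_arr.shape)   # 1 (3,) -> a 1-D array (vector) with 3 elements
print(float_arr.dtype)               # float64 -> the ints were upcast to floats
print(float_arr.tolist())            # [1.0, 2.5, 3.0]
```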
Note
- In Jupyter Notebook, the value of the last expression in a cell is displayed when you press Shift+Enter, even without print(); that is why you won't see print() in my code snippets. You can still add it if it helps.
- Within one notebook, you only need to run the import once. If you switch to or create a new notebook, which starts a new session, you need to import numpy again.
2. Using arange() method
import numpy as np
np.arange(0,10)
The array starts at 0 and ends at 10-1, i.e., 9, because the stop value is excluded.
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
You can also pass an increment (step) to arange() as a third argument.
Example
np.arange(0,11,2)
The array starts at 0 and goes up to 10 in steps of 2; the stop value 11 is excluded.
Output: array([ 0,  2,  4,  6,  8, 10])
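A short sketch to check the arange() rules described above: the stop value is never included, and a single argument is treated as the stop value with the start defaulting to 0.

```python
import numpy as np

# The stop value is excluded, so arange(0, 10) holds 10 numbers: 0..9
a = np.arange(0, 10)
print(len(a), a[0], a[-1])   # 10 0 9

# With a step of 2, the stop value 11 is never reached; the last element is 10
b = np.arange(0, 11, 2)
print(b.tolist())            # [0, 2, 4, 6, 8, 10]

# A single argument is treated as the stop value, starting from 0
print(np.array_equal(np.arange(10), a))   # True
```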
3. Using the random module
import numpy as np
np.random.rand(5)
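Output: an array of 5 random floats between 0 and 1 (0 included, 1 excluded); the values change on every run.
If you want the numbers in the array to come from the Standard Normal (Gaussian) Distribution, use randn() instead: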
np.random.randn(5)
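Output: an array of 5 random floats drawn from the standard normal distribution (mean 0, standard deviation 1); the values change on every run.
rand() gives floating-point numbers between 0 and 1; if you want integer values, use randint():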
np.random.randint(5,20)
This generates a single random integer between 5 and 19 (20 is excluded).
Output: a single random integer from 5 to 19, different on each run.
If you want more numbers in the array, pass the count as a third argument:
np.random.randint(5,20,6)
This generates an array of 6 random integers between 5 and 19 (20 is excluded).
Output: an array of 6 random integers from 5 to 19, different on each run.
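Because these numbers change on every run, it can help to make them repeatable. The sketch below swaps in NumPy's newer Generator interface (np.random.default_rng) rather than the np.random.rand family used above; the seed 42 is an arbitrary assumption, and any fixed integer works the same way.

```python
import numpy as np

# Seeding a generator makes the "random" numbers repeatable (seed 42 is arbitrary)
rng = np.random.default_rng(42)

uniform = rng.random(5)                 # like np.random.rand(5): floats in [0, 1)
normal = rng.standard_normal(5)         # like np.random.randn(5)
ints = rng.integers(5, 20, size=6)      # like np.random.randint(5, 20, 6): 5..19

# A second generator with the same seed reproduces the same sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(5)))   # True
print(ints.min() >= 5, ints.max() <= 19)              # True True
```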
Creating Specific Types of 1-D Arrays
1. Array of Zeros
import numpy as np
np.zeros(6)
Output: array([0., 0., 0., 0., 0., 0.])
2. Array of Ones
import numpy as np
np.ones(5)
Output: array([1., 1., 1., 1., 1.])
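The "0." and "1." values are floats: zeros() and ones() default to a floating-point type. A small sketch of the optional dtype argument, assuming you want integers instead:

```python
import numpy as np

zeros = np.zeros(6)
print(zeros.dtype)                 # float64 -> the default, which is why values print as 0. rather than 0

# dtype is an optional argument if you want integers instead
int_ones = np.ones(5, dtype=int)
print(int_ones.tolist())           # [1, 1, 1, 1, 1]
```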
3. Linspace
This will generate an evenly spaced array within a given range.
import numpy as np
np.linspace(0,5,10)
This creates an array of 10 evenly spaced points from 0 to 5, including both end points.
Output: 10 floats starting at 0.0 and ending at 5.0, spaced 5/9 (about 0.556) apart.
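A quick check of the spacing claim: unlike arange(), linspace() includes the stop value by default, so 10 points leave 9 gaps of 5/9 each. The endpoint parameter switches that behaviour off.

```python
import numpy as np

points = np.linspace(0, 5, 10)

# linspace includes both end points, so 10 points means 9 gaps of 5/9 each
gaps = np.diff(points)
print(np.allclose(gaps, 5 / 9))     # True
print(points[0], points[-1])        # 0.0 5.0

# endpoint=False drops the stop value, giving a step of 0.5 instead
print(np.linspace(0, 5, 10, endpoint=False).tolist())
```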
Creating 2-D Array
1. Using a list
import numpy as np
my_list=[[10,20,30], [40,50,60], [70,80,90]]
np.array(my_list)
Output: a 3x3 array whose rows are [10, 20, 30], [40, 50, 60], and [70, 80, 90].
2. Array of Zeros
import numpy as np
np.zeros((6,6))
The shape is passed as a single tuple: the first value is the number of rows, and the second is the number of columns. So, the array above has 6 rows and 6 columns.
Output: a 6x6 array filled with 0. (floating-point zeros).
3. Array of Ones
import numpy as np
np.ones((4,5))
The above array’s shape is 4x5, i.e., it has 4 rows and 5 columns.
Output: a 4x5 array filled with 1. (floating-point ones).
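The double parentheses matter: the shape goes in as one tuple. A sketch showing the resulting shapes, and what happens if the numbers are passed separately (the second positional argument of zeros() is the dtype, so NumPy raises a TypeError).

```python
import numpy as np

# The shape goes in as ONE tuple: (rows, columns)
grid = np.zeros((6, 6))
block = np.ones((4, 5))

print(grid.shape, block.shape)   # (6, 6) (4, 5)
print(block.size)                # 20 -> 4 rows * 5 columns

# Passing the numbers separately fails: the second positional argument is dtype
try:
    np.zeros(6, 6)
except TypeError as err:
    print("TypeError:", err)
```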
4. Identity Matrix
An identity matrix is a square matrix with 1s on the main diagonal and 0s everywhere else.
import numpy as np
np.eye(5,5)
Output: a 5x5 array with 1. on the main diagonal and 0. everywhere else.
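Why the identity matrix is useful: multiplying a matrix by it leaves the matrix unchanged, just as multiplying a number by 1 does. A minimal sketch, using a 5x5 matrix built from arange() purely for illustration and the @ matrix-multiplication operator:

```python
import numpy as np

identity = np.eye(5)          # one argument is enough for a square matrix
print(np.array_equal(identity, np.eye(5, 5)))   # True

# Multiplying any 5x5 matrix by the identity leaves it unchanged
m = np.arange(25).reshape(5, 5)
print(np.array_equal(m @ identity, m))          # True
print(np.trace(identity))                       # 5.0 -> five 1s on the diagonal
```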
5. Using Random
import numpy as np
np.random.rand(5,5)
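Output: a 5x5 array of random floats between 0 and 1, different on each run.
Again, if you want the numbers to come from the Standard Normal (Gaussian) Distribution, use randn():
np.random.randn(5,5)
Output: a 5x5 array of random floats from the standard normal distribution, different on each run.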
Now let’s discuss some more operations that can be performed using NumPy.
Reshape an Array
import numpy as np
new_array = np.arange(25)
new_array.reshape(5,5)
Note that the reshape dimensions must fit the size of the 1-D array. In the example above, the array has 25 elements, so to create a 2-D array from it, the product of rows and columns must equal 25: 5x5, 1x25, or 25x1 work, but 3x5 (only 15 elements) does not.
Fun Fact: The code above can be written in one line:
np.arange(25).reshape(5,5)
Output: a 5x5 array containing 0 to 24, filled row by row (the first row is 0, 1, 2, 3, 4).
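A sketch that tests the reshape rule directly: every valid 2-D shape for 25 elements multiplies to 25, and a shape that does not fit raises a ValueError. Passing -1 for one dimension lets NumPy work it out from the other.

```python
import numpy as np

new_array = np.arange(25)

# Valid 2-D shapes must multiply to 25: 5x5, 1x25 or 25x1
for rows, cols in [(5, 5), (1, 25), (25, 1)]:
    print(new_array.reshape(rows, cols).shape)

# 3x5 holds only 15 elements, so NumPy refuses
try:
    new_array.reshape(3, 5)
except ValueError as err:
    print("ValueError:", err)

# -1 asks NumPy to work out one dimension from the others
print(new_array.reshape(5, -1).shape)   # (5, 5)
```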
Now let’s explore some more built-in methods in the NumPy package.
import numpy as np
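new_array = np.arange(25)
new_array.max() #This will give the maximum element
new_array.min() #This will give the minimum element
new_array.argmax() #This will give the index of the maximum element
new_array.argmin() #This will give the index of the minimum element
new_array.shape #This will describe the shape of the array
new_array.dtype #This will describe the data type of the array's elements
new_array.copy() #This will create a copy or duplicate of the array
Output (each line run in its own cell, since Jupyter displays only the last result of a cell): 24, 0, 24, 0, (25,), int64 (on most platforms), and a copy of the array.
Note: It may seem that max() and argmax() are the same, and likewise min() and argmin(). That is only because every element of np.arange(25) equals its own index, so the maximum value (24) sits at index 24. In general, max() returns the value, while argmax() returns its position.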
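Let’s look at another example, with a 2-D array, that clarifies the difference between max() and argmax():
another_array = np.arange(12).reshape(4,3)
np.argmax(another_array) #11: index of the maximum in the flattened array
np.argmax(another_array,axis=0) #array([3, 3, 3]): row index of the maximum in each column
np.argmax(another_array,axis=1) #array([2, 2, 2, 2]): column index of the maximum in each row
another_array.max(axis=0) #array([ 9, 10, 11]): the maximum values themselves, column by column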
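The difference is easiest to see when the values do not match their positions. A sketch with made-up numbers (the scores and grid below are hypothetical):

```python
import numpy as np

# Hypothetical scores whose values do not match their positions
scores = np.array([7, 42, 3, 19])

print(scores.max(), scores.argmax())   # 42 1 -> the largest value, and WHERE it is
print(scores.min(), scores.argmin())   # 3 2

# On a 2-D array, axis=0 works column by column
grid = np.array([[5, 1],
                 [2, 8]])
print(grid.max(axis=0).tolist())       # [5, 8] -> the largest value in each column
print(grid.argmax(axis=0).tolist())    # [0, 1] -> the row holding that value
```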
Universal Array Functions
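np.sqrt(new_array) #Evaluates the square root of each element of the array
np.exp(new_array) #Evaluates e raised to the power of each element of the array
np.sin(new_array) #Evaluates the sine of each element of the array (in radians)
np.log(new_array) #Evaluates the natural logarithm of each element of the array
Output: four arrays of 25 floats each. Because log(0) is -inf, np.log() also shows a RuntimeWarning (divide by zero encountered in log) for the first element.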
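A universal function applies the same operation to every element in a single call. This sketch compares np.log() with an explicit Python loop over the math module, starting the range at 1 (my choice) to stay clear of log(0), and then shows np.errstate silencing the warning when -inf is expected.

```python
import math
import numpy as np

values = np.arange(1, 6)   # start at 1 to keep log() away from 0

# A ufunc applies the function to every element in one call...
vectorised = np.log(values)

# ...which matches looping over the elements with the math module
looped = [math.log(v) for v in values]
print(np.allclose(vectorised, looped))   # True

# log(0) gives -inf; np.errstate can silence the warning when that is expected
with np.errstate(divide="ignore"):
    print(np.log(np.arange(0, 3))[0])    # -inf
```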
Indexing and Selection
1. Slicing
Slicing will help you return a specified part of the array.
import numpy as np
slicing_arr = np.arange(0,10)
slicing_arr[0:5] #Returns the elements from index 0 up to index 5-1, i.e., 4
slicing_arr[5:] #Returns the elements from index 5 to the end of the array
slicing_arr[:6] #Returns the elements from the beginning (index 0) up to index 5
Output: array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), and array([0, 1, 2, 3, 4, 5])
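A compact sketch of the slicing rules: the stop index is excluded, so a slice arr[start:stop] holds stop - start elements, and an omitted start or stop means the beginning or the end. Negative indexes, counting from the end, follow the same rules.

```python
import numpy as np

slicing_arr = np.arange(0, 10)

# The stop index is excluded, so arr[start:stop] has stop - start elements
print(len(slicing_arr[2:7]))            # 5

# Omitting start or stop means "from the beginning" / "to the end"
print(slicing_arr[:6].tolist())         # [0, 1, 2, 3, 4, 5]
print(slicing_arr[5:].tolist())         # [5, 6, 7, 8, 9]

# A negative index counts from the end
print(slicing_arr[-3:].tolist())        # [7, 8, 9]
```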
2. Broadcast
import numpy as np
b_arr = np.arange(0,10)
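b_arr[0:6]=100 #The elements at index 0 to 5 are all set to 100 at once.
b_arr[:]=99 #All the elements of the array are set to 99 at once.
Output (displaying b_arr after each assignment): [100, 100, 100, 100, 100, 100, 6, 7, 8, 9] after the first assignment, and ten 99s after the second.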
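One consequence worth knowing: a basic slice is a view into the original array, not a separate copy. So if you store a slice in a variable and broadcast a value into it, the original array changes too. The copy() method gives an independent array. The variable names below are illustrative.

```python
import numpy as np

b_arr = np.arange(0, 10)

# A slice is a view: writing through it changes the original array too
first_six = b_arr[0:6]
first_six[:] = 100
print(b_arr.tolist())     # [100, 100, 100, 100, 100, 100, 6, 7, 8, 9]

# copy() gives an independent array, so the original is left alone
safe = b_arr.copy()
safe[:] = 99
print(b_arr[:3].tolist(), safe[:3].tolist())   # [100, 100, 100] [99, 99, 99]
```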
3. Access element in 2-D Array
import numpy as np
my_list=[[10,20,30], [40,50,60], [70,80,90]]
arr=np.array(my_list)
arr[0][0] #accessing the first element: 1st row, 1st column (10)
arr[2][1] #accessing the element at the 3rd row and 2nd column (80)
arr[3][2] #Gives an error! The array has only 3 rows (index 0,1,2) and 3 columns (index 0,1,2), so [3][2] would mean the 4th row and 3rd column, which does not exist.
Output: 10, 80, and then an IndexError (index 3 is out of bounds for axis 0 with size 3).
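NumPy also accepts a single pair of brackets with a comma, arr[row, col], which selects the same element as arr[row][col]. A sketch of both forms, plus a simple shape check (the row and column values 3 and 2 mirror the failing example) that avoids the IndexError:

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# arr[row][col] and arr[row, col] select the same element
print(arr[2][1], arr[2, 1])     # 80 80

# Check the shape before indexing to avoid an IndexError
rows, cols = arr.shape
r, c = 3, 2
if r < rows and c < cols:
    print(arr[r, c])
else:
    print(f"({r}, {c}) is outside a {rows}x{cols} array")
```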
4. Getting Sub-Array
To get a sub-array from an existing array, you can use slicing in various ways, for example:
arr[:2,1:] #returns the top-right 2x2 corner of the original array (rows 0-1, columns 1-2) as our sub-array
Output: the 2x2 sub-array [[20, 30], [50, 60]].
Try Now! - Try fetching the upper-left corner sub-array.
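Each part of arr[row_slice, column_slice] follows the same rules as 1-D slicing, and a plain integer selects a single row or column. A few more sub-arrays from the same 3x3 array, leaving the upper-left corner for you to try:

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# arr[row_slice, column_slice]: each part works like 1-D slicing
print(arr[2, :].tolist())      # [70, 80, 90] -> the whole bottom row
print(arr[:, 1].tolist())      # [20, 50, 80] -> the whole middle column
print(arr[1:, 1:].tolist())    # [[50, 60], [80, 90]] -> bottom-right corner
```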
Using Operators with Array
1. Comparison
You can also compare arrays element by element using comparison operators (>, <, >=, <=, ==, !=), for example:
c_arr = np.arange(1,11)
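c_arr>5 #returns a boolean array: True where the element is greater than 5, otherwise False
c_arr[c_arr>5] #uses that boolean array as a mask to return only the elements greater than 5
Output: a boolean array that is False for 1 to 5 and True for 6 to 10, followed by array([ 6,  7,  8,  9, 10]).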
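Boolean masks can do a bit more than filter. A sketch, reusing c_arr from above: summing a mask counts the True values, and conditions can be combined with & (and) or | (or), with each condition in its own parentheses.

```python
import numpy as np

c_arr = np.arange(1, 11)

mask = c_arr > 5
print(mask.sum())                       # 5 -> True counts as 1

# Combine conditions with & (and) / | (or); the parentheses are required
between = c_arr[(c_arr > 3) & (c_arr < 8)]
print(between.tolist())                 # [4, 5, 6, 7]

print(c_arr[c_arr % 2 == 0].tolist())   # [2, 4, 6, 8, 10]
```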
2. Arithmetic
You can also apply arithmetic operators (+, -, *, /, %) to arrays; they work element by element, for example:
c_arr + c_arr #adding the array to itself
c_arr - c_arr #subtracting the array from itself
c_arr * c_arr #multiplying the array by itself
c_arr + 100 #adding an integer to every element
c_arr / 100 #dividing every element by an integer
Output: [2, 4, ..., 20], an array of zeros, the squares [1, 4, ..., 100], [101, 102, ..., 110], and the floats [0.01, 0.02, ..., 0.1].
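This element-by-element behaviour is a key difference from plain Python lists, where the same operators mean something else. A side-by-side sketch with a small example list:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array(py_list)

# On Python lists, + joins the lists and * repeats them
print(py_list + py_list)            # [1, 2, 3, 1, 2, 3]
print(py_list * 2)                  # [1, 2, 3, 1, 2, 3]

# On NumPy arrays, the same operators work element by element
print((np_arr + np_arr).tolist())   # [2, 4, 6]
print((np_arr * 2).tolist())        # [2, 4, 6]
print((np_arr % 2).tolist())        # [1, 0, 1]
```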
Conclusion
NumPy can perform a wide range of operations and is also used to clean and analyze data. For a beginner, though, the operations covered here are enough to get started in the field of Data Science. NumPy is a building block that will help you as you move on to other data science topics and start using more Python libraries such as Pandas, Matplotlib, SciPy, and so on. We'll learn more about NumPy and other Python modules in my future series of posts.
Practice Hard. Keep learning...
Thanks for reading!