Introduction
In this article, we will learn about NumPy and how it can help you in Data Analysis.
- If you are a beginner and new to NumPy, please refer to my article, Getting Started with Numpy.
- For a high-level overview of some of the operations done in NumPy, see my article, Numpy- Array.
Here, we will focus on the NumPy operations that are most useful for Data Analysis.
About NumPy
- It is a Linear Algebra Library.
- It has bindings to C libraries and is mostly used for scientific computing.
Before getting started, you need to install NumPy:
- If you are using pip, then for the installation process, refer to my previous article
- If you are using the Anaconda Distribution, simply run “conda install numpy” in the shell.
Note. I am using Anaconda Distribution (Jupyter Notebook)
This article works with two kinds of NumPy arrays (NumPy also supports arrays with more dimensions):
- Vectors (1-D Array)
- Matrices (2-D Array)
Creating 1-D Array
1. Using array() method
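A minimal sketch of this step, assuming an illustrative list named my_list (the name and values are placeholders, not taken from the original snippet):

```python
import numpy as np

# my_list is a hypothetical example list
my_list = [1, 2, 3, 4, 5]

# np.array() converts a Python list into a 1-D NumPy array
arr = np.array(my_list)
arr  # array([1, 2, 3, 4, 5])
```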
Output
![Output]()
Note
- In Jupyter Notebook, pressing Shift+Enter displays the value of the last expression even if you don't write print(); hence you will not see print() in my code snippets. You can add it if it helps.
- Within a single notebook, you don't have to write "import" in every cell. But if you switch to or create a new notebook, that is a new session, and you need to import numpy again.
2. Using arange() method
With arange(0, 10), the array starts at 0 and ends at 10 - 1, i.e., 9.
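The call described here can be sketched as follows, assuming a start of 0 and a stop of 10:

```python
import numpy as np

# arange(start, stop): start is included, stop is excluded
arr = np.arange(0, 10)
arr  # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```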
Output
![Output]()
You can also pass an increment (step) to arange() as a third argument.
Example
This means the array starts at 0 and increases in steps of 2. As before, the stop value itself is excluded, so a stop of 10 yields 0, 2, 4, 6, 8.
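A short sketch of the stepped form, assuming a stop value of 10 (which arange() excludes):

```python
import numpy as np

# arange(start, stop, step): every second value from 0, stopping before 10
arr = np.arange(0, 10, 2)
arr  # array([0, 2, 4, 6, 8])
```

To include 10 itself, the stop value would have to be larger than 10, for example 11.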
Output
![Output]()
3. Using rand()
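A minimal sketch, assuming we want 5 values (the count is an arbitrary choice for illustration). np.random.rand() draws floats uniformly from the interval [0, 1), so the values differ on every run:

```python
import numpy as np

# 5 random floats, each in the range [0, 1)
arr = np.random.rand(5)
arr
```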
Output
![Output]()
If you want the numbers in the array drawn from the standard normal (Gaussian) distribution, use randn().
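As a sketch, again assuming an arbitrary count of 5 values:

```python
import numpy as np

# 5 samples from the standard normal distribution (mean 0, variance 1);
# unlike rand(), the values can be negative or greater than 1
arr = np.random.randn(5)
arr
```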
Output
![Output]()
rand() returns floating-point numbers; if you want integer values, use randint().
For example, randint(5, 20) generates one random integer in the range 5 to 19 (20 is excluded).
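The call described here might look like this, assuming the bounds 5 and 20 from the text:

```python
import numpy as np

# One random integer: low (5) is included, high (20) is excluded
value = np.random.randint(5, 20)
value
```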
Output
![Output]()
If you want more numbers in the array, pass the count as a third argument.
For example, randint(5, 20, 6) generates an array of 6 random integers in the range 5 to 19 (20 is excluded).
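A sketch of the multi-value form, assuming the same bounds and a count of 6:

```python
import numpy as np

# 6 random integers, each between 5 (included) and 20 (excluded)
arr = np.random.randint(5, 20, 6)
arr
```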
Output
![Output]()
Creating the specific types of 1-D Arrays
1. Array of Zeros
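A minimal sketch, assuming an arbitrary length of 5:

```python
import numpy as np

# 1-D array of five zeros; the default dtype is float
zeros_1d = np.zeros(5)
zeros_1d  # array([0., 0., 0., 0., 0.])
```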
Output
![Output]()
2. Array of Ones
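The same idea for ones, again assuming a length of 5:

```python
import numpy as np

# 1-D array of five ones; the default dtype is float
ones_1d = np.ones(5)
ones_1d  # array([1., 1., 1., 1., 1.])
```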
Output
![Output]()
3. Linspace
This generates an evenly spaced array within a given range.
For example, linspace(0, 5, 10) creates an array of 10 evenly spaced points between 0 and 5, with both endpoints included.
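The call described here can be sketched as follows. Note the contrast with arange(): the third argument of linspace() is the number of points, not the step.

```python
import numpy as np

# 10 evenly spaced points from 0 to 5, both endpoints included
arr = np.linspace(0, 5, 10)
arr
```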
Output
![Output]()
Creating 2-D Array
1. Using a list
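A minimal sketch, assuming an illustrative list of lists named my_matrix (the name and values are placeholders):

```python
import numpy as np

# Each inner list becomes one row of the 2-D array
my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr_2d = np.array(my_matrix)
arr_2d.shape  # (3, 3)
```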
Output
![Output]()
2. Array of Zeros
The first parameter is the number of rows, and the second parameter is the number of columns. For example, zeros((6, 6)) creates an array with 6 rows and 6 columns.
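A sketch of the 6-by-6 case. The shape is passed as a single tuple, which is why the call has double parentheses:

```python
import numpy as np

# (rows, columns) is passed as one tuple argument
zeros_2d = np.zeros((6, 6))
zeros_2d.shape  # (6, 6)
```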
Output
![Output]()
3. Array of Ones
For example, ones((4, 5)) creates an array of shape 4*5, i.e., it will have 4 rows and 5 columns.
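The 4-by-5 case can be sketched the same way:

```python
import numpy as np

# 4 rows and 5 columns, all filled with 1.0
ones_2d = np.ones((4, 5))
ones_2d.shape  # (4, 5)
```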
Output
![Output]()
4. Identity Matrix
It is a matrix of zeros except for the diagonal; the diagonal will have 1s.
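A minimal sketch using np.eye(), assuming an arbitrary size of 4:

```python
import numpy as np

# 4x4 identity matrix: 1s on the diagonal, 0s everywhere else
identity = np.eye(4)
identity
```

An identity matrix is always square, so a single number is enough to define it.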
Output
![Output]()
5. Using Random
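As a sketch, assuming an arbitrary shape of 3 rows and 4 columns. Unlike zeros() and ones(), rand() takes the dimensions as separate arguments rather than as a tuple:

```python
import numpy as np

# 3x4 array of random floats in the range [0, 1)
rand_2d = np.random.rand(3, 4)
rand_2d
```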
Output
![Output]()
If you want the numbers in the array drawn from the standard normal (Gaussian) distribution, use randn().
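The 2-D version, again assuming a 3-by-4 shape chosen only for illustration:

```python
import numpy as np

# 3x4 array of samples from the standard normal distribution
randn_2d = np.random.randn(3, 4)
randn_2d
```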
Output
![Output]()
Now let’s discuss some more operations that can be performed using NumPy.
Reshape an Array
Note that the new shape must account for every element of the 1-D array. The size of the 1-D array in this example is 25, so to create a 2-D array from it we can use 5X5 (or 1X25, or 25X1), but not 3X5 or 5X3, which hold only 15 elements. The product of rows and columns must equal 25.
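A sketch of the reshape step, assuming the 25-element array is built with arange(). The try/except shows what happens when the product of rows and columns does not match the size:

```python
import numpy as np

arr = np.arange(25)        # 1-D array of size 25 (values 0 to 24)
mat = arr.reshape(5, 5)    # works: 5 * 5 == 25

try:
    arr.reshape(3, 5)      # fails: 3 * 5 == 15, not 25
    reshape_failed = False
except ValueError:
    reshape_failed = True

mat.shape  # (5, 5)
```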
Fun Fact: The above code can be done in one line:
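The one-line form could look like this, assuming the same 25-element range:

```python
import numpy as np

# Create the 1-D array and reshape it in a single chained call
mat = np.arange(25).reshape(5, 5)
mat
```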
Output
![Output]()
Now let’s explore some more built-in methods in the NumPy package.
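A minimal sketch of four such methods, assuming an array built with arange(0, 10), where every value happens to equal its own index:

```python
import numpy as np

arr = np.arange(0, 10)

largest = arr.max()            # largest value            -> 9
smallest = arr.min()           # smallest value           -> 0
largest_index = arr.argmax()   # index of the largest     -> 9
smallest_index = arr.argmin()  # index of the smallest    -> 0
```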
Output
![Output]()
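Note. max() and argmax() can look the same when each value in the array equals its index, and likewise min() and argmin(). But they are not the same: max() and min() return the largest and smallest values, while argmax() and argmin() return the index positions of those values.
Let’s see another example with a 2-D array, which will clarify the difference between max() and argmax().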
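A sketch with illustrative values (not from the original snippet) where the largest value and its index clearly differ. On a 2-D array, argmax() returns the position in the flattened array by default; np.unravel_index() converts it back to a (row, column) pair:

```python
import numpy as np

arr_2d = np.array([[10, 50, 30],
                   [60, 20, 40]])

largest = arr_2d.max()        # 60, the largest value
flat_index = arr_2d.argmax()  # 3, its position when rows are laid end to end

# Convert the flat index into (row, column) coordinates
row, col = np.unravel_index(flat_index, arr_2d.shape)  # (1, 0)
```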
![Output]()
Universal Array Functions
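Universal functions (ufuncs) apply an operation to every element of an array. A short sketch with a few of them, assuming an illustrative array of perfect squares:

```python
import numpy as np

arr = np.array([1, 4, 9, 16])

roots = np.sqrt(arr)      # element-wise square root -> array([1., 2., 3., 4.])
squares = np.square(arr)  # element-wise square      -> array([1, 16, 81, 256])
exps = np.exp(arr)        # element-wise e**x
sines = np.sin(arr)       # element-wise sine (input in radians)
```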
Output
![Output]()
Indexing and Selection
1. Slicing
Slicing will help you return a specified part of the array.
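A minimal sketch, assuming an array of the values 0 to 10. Slices use the form start:stop, with the stop index excluded:

```python
import numpy as np

arr = np.arange(0, 11)

middle = arr[1:5]  # indices 1 to 4    -> array([1, 2, 3, 4])
head = arr[:3]     # first three items -> array([0, 1, 2])
tail = arr[5:]     # index 5 to end    -> array([5, 6, 7, 8, 9, 10])
```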
Output
![Output]()
2. Broadcast
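As a sketch, assuming the same 0 to 10 array: assigning a single value to a slice broadcasts that value to every position in the slice.

```python
import numpy as np

arr = np.arange(0, 11)

# The single value 100 is broadcast across the first five positions
arr[0:5] = 100
arr  # array([100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10])
```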
Output
![Output]()
3. Access element in 2-D Array
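A minimal sketch, assuming an illustrative 3-by-3 array named arr_2d. Both indexing styles below return the same element:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

first_row = arr_2d[0]      # whole first row -> array([5, 10, 15])
element_a = arr_2d[1][2]   # row 1, column 2 -> 30
element_b = arr_2d[1, 2]   # same element, single-bracket form -> 30
```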
Output
![]()
4. Getting Sub-Array
To get a sub-array from an existing array, you can combine row and column slices in various ways, for example:
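Two such slices, sketched on an illustrative 3-by-3 array. The part before the comma selects rows and the part after it selects columns:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

# First two rows, last two columns (upper-right corner)
upper_right = arr_2d[:2, 1:]   # array([[10, 15], [25, 30]])

# Last two rows, first two columns (lower-left corner)
lower_left = arr_2d[1:, :2]    # array([[20, 25], [35, 40]])
```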
Output
![Output]()
Try Now! - Try fetching the upper-left corner sub-matrix.
Using Operators with Array
1. Comparison
You can also compare array elements using comparison operators (>, <, >=, <=, !=). The result is a Boolean array, which can then be used to select elements, for example:
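A short sketch, assuming an array of the values 1 to 10 and an arbitrary threshold of 5:

```python
import numpy as np

arr = np.arange(1, 11)

# The comparison is applied to each element and returns True/False values
mask = arr > 5

# Using the Boolean array as an index keeps only the True positions
selected = arr[mask]  # array([6, 7, 8, 9, 10])
```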
Output
![Output]()
2. Arithmetic
You can also apply arithmetic operators (+, -, *, /, %) to arrays; they work element by element, for example:
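A minimal sketch, assuming an illustrative array of the values 1 to 5. Each operator works on matching positions of the two operands:

```python
import numpy as np

a = np.arange(1, 6)   # array([1, 2, 3, 4, 5])

total = a + a         # array([2, 4, 6, 8, 10])
difference = a - a    # array([0, 0, 0, 0, 0])
product = a * a       # array([1, 4, 9, 16, 25])
quotient = a / a      # array([1., 1., 1., 1., 1.]), division returns floats
remainder = a % 2     # array([1, 0, 1, 0, 1])
```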
Output
![Output]()
Conclusion
NumPy supports a wide range of operations and is also used to clean and analyze data. For a beginner, the operations covered here are enough to get started in Data Science. NumPy is a building block that will help you as you move on to other data science topics and begin to use more Python libraries such as Pandas, Matplotlib, SciPy, and so on. We'll learn more about NumPy and other Python modules in my future series of posts.
Practice Hard. Keep learning...
Thanks for reading!