Introduction
Python has become one of the most popular dynamic programming languages, alongside Ruby and Perl. It is well suited to data analysis, scientific computing, and data visualization, and it is an excellent language for building data-centric applications. Much of this comes from libraries such as NumPy, Pandas, and Matplotlib.
Pandas provides well-designed data structures and functions that are fast and easy to use. I will explain how to create a DataFrame and how to handle data with it using the Pandas library. The DataFrame is the main Pandas structure for data analysis.
Installation Required
- Python 3.5 or later must be installed (recent Pandas releases require a newer Python 3 version).
- pip3 must be installed. To upgrade it to the latest version:
Go to command prompt → Type Command → pip3 install --upgrade pip
- Install Pandas:
Go to command prompt → Type Command → pip3 install pandas
- Install Jupyter Notebook (it lets you write and execute Python and Pandas code in a browser):
Go to command prompt → Type Command → pip3 install jupyter
- Open Jupyter Notebook:
Go to command prompt → Type Command → jupyter notebook
- This opens Jupyter Notebook in a browser.
Here, create a new Python 3 notebook. You need to import Pandas in it, so run the following import in the first cell:
import pandas as pd
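A quick way to confirm that the installation steps above worked is to print the installed version right after the import. This is only a small sanity check and not part of the original walkthrough; the version string you see depends on your environment.

```python
import pandas as pd

# If the import succeeds, Pandas is installed for this Python interpreter.
# The printed version depends on the local environment.
print(pd.__version__)
```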
Prepare Data
Series
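Series is a core data structure of the Pandas library, created with the pd.Series constructor. It is a one-dimensional labeled object, similar to an array, a list, or a single column in a table. The example below builds four Series and combines them into a DataFrame: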
# Each Series describes one purchase; its labels become the column names.
purchase_1 = pd.Series({
    'Name': 'Chris',
    'Item Purchased': 'Pencil',
    'Cost': 22.50
})
purchase_2 = pd.Series({
    'Name': 'Ram',
    'Item Purchased': 'Book',
    'Cost': 220.50
})
purchase_3 = pd.Series({
    'Name': 'Mohan',
    'Item Purchased': 'Pen',
    'Cost': 22.50
})
purchase_4 = pd.Series({
    'Name': 'Gulam',
    'Item Purchased': 'Diary',
    'Cost': 22.50
})
# Combine the four Series as rows and label each row with a store name.
df = pd.DataFrame(
    [purchase_1, purchase_2, purchase_3, purchase_4],
    index=['Store 1', 'Store 2', 'Store 3', 'Store 4']
)
# Display the first rows (up to five) of the DataFrame.
df.head()
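Here, each pd.Series call creates a one-dimensional labeled object, and pd.DataFrame combines the four Series as the rows of a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes. The index argument supplies the row labels, and the labels of the Series become the column names.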
Press Ctrl + Enter in the Jupyter notebook to run the cell and display the output.
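To see the difference between the one-dimensional Series and the two-dimensional DataFrame, it can help to inspect their shapes and labels. The sketch below uses a single purchase taken from the data above; the variable names purchase and single_row_df are only illustrative.

```python
import pandas as pd

# One purchase from the article's sample data (variable name is illustrative).
purchase = pd.Series({'Name': 'Chris', 'Item Purchased': 'Pencil', 'Cost': 22.50})

# A Series is one-dimensional: one label per value.
print(purchase.index.tolist())  # ['Name', 'Item Purchased', 'Cost']
print(purchase.shape)           # (3,)

# Wrapping the Series in a list makes it one row of a DataFrame.
single_row_df = pd.DataFrame([purchase], index=['Store 1'])

# A DataFrame is two-dimensional: (rows, columns).
print(single_row_df.shape)             # (1, 3)
print(single_row_df.columns.tolist())  # ['Name', 'Item Purchased', 'Cost']
```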
Fetch value for Store 1
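One way to fetch the row for Store 1 is label-based selection with .loc. The sketch below is a minimal example under an assumption: it rebuilds the same sample table from a list of dictionaries (instead of four Series) so that it runs on its own.

```python
import pandas as pd

# Same sample data as in the article, rebuilt from a list of dicts
# (an assumption made here for brevity) so the snippet is self-contained.
df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Pencil', 'Cost': 22.50},
        {'Name': 'Ram', 'Item Purchased': 'Book', 'Cost': 220.50},
        {'Name': 'Mohan', 'Item Purchased': 'Pen', 'Cost': 22.50},
        {'Name': 'Gulam', 'Item Purchased': 'Diary', 'Cost': 22.50},
    ],
    index=['Store 1', 'Store 2', 'Store 3', 'Store 4'],
)

# .loc selects by label; a single row label returns that row as a Series.
store_1 = df.loc['Store 1']
print(store_1)
```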
Press Ctrl + Enter in the Jupyter notebook to run the cell and display the output.
Get all data for "Item Purchased":
df['Item Purchased']
Output
Get the cost of Store 1:
df.loc['Store 1', 'Cost']
Output
22.5
Show columns as rows (transpose):
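Swapping rows and columns can be done with the DataFrame's T attribute. The following is a minimal sketch that assumes the same sample data, rebuilt here from a list of dictionaries so it runs on its own; the name df_t is illustrative.

```python
import pandas as pd

# Same sample data as in the article, rebuilt from a list of dicts
# (an assumption made here for brevity) so the snippet is self-contained.
df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Pencil', 'Cost': 22.50},
        {'Name': 'Ram', 'Item Purchased': 'Book', 'Cost': 220.50},
        {'Name': 'Mohan', 'Item Purchased': 'Pen', 'Cost': 22.50},
        {'Name': 'Gulam', 'Item Purchased': 'Diary', 'Cost': 22.50},
    ],
    index=['Store 1', 'Store 2', 'Store 3', 'Store 4'],
)

# .T returns the transpose: the old columns become the row labels
# and the store names become the column labels.
df_t = df.T
print(df_t)
```

After the transpose, the table has three rows (Name, Item Purchased, Cost) and four columns, one per store.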
Output
Get cost data for all stores:
df.T.loc['Cost']
Drop Store 1
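A row can be removed by its label with DataFrame.drop. The sketch below is one possible way to do it, assuming the same sample data (rebuilt from a list of dictionaries so it runs on its own); the name without_store_1 is illustrative.

```python
import pandas as pd

# Same sample data as in the article, rebuilt from a list of dicts
# (an assumption made here for brevity) so the snippet is self-contained.
df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Pencil', 'Cost': 22.50},
        {'Name': 'Ram', 'Item Purchased': 'Book', 'Cost': 220.50},
        {'Name': 'Mohan', 'Item Purchased': 'Pen', 'Cost': 22.50},
        {'Name': 'Gulam', 'Item Purchased': 'Diary', 'Cost': 22.50},
    ],
    index=['Store 1', 'Store 2', 'Store 3', 'Store 4'],
)

# drop() returns a new DataFrame without the given row label;
# the original df is left unchanged.
without_store_1 = df.drop(index='Store 1')
print(without_store_1)
```

Because drop returns a copy by default, df still contains Store 1 afterwards. Assign the result back to df if the row should be removed for the following steps.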
Multiply Cost by 10
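Arithmetic on a column is applied element by element, so scaling every cost takes a single assignment. The sketch below assumes the full four-store table (rebuilt from a list of dictionaries so it runs on its own); whether this step should be applied before or after dropping Store 1 is left open here.

```python
import pandas as pd

# Same sample data as in the article, rebuilt from a list of dicts
# (an assumption made here for brevity) so the snippet is self-contained.
df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Pencil', 'Cost': 22.50},
        {'Name': 'Ram', 'Item Purchased': 'Book', 'Cost': 220.50},
        {'Name': 'Mohan', 'Item Purchased': 'Pen', 'Cost': 22.50},
        {'Name': 'Gulam', 'Item Purchased': 'Diary', 'Cost': 22.50},
    ],
    index=['Store 1', 'Store 2', 'Store 3', 'Store 4'],
)

# Multiplying a column by a scalar multiplies every value in it.
# Assigning the result back updates the column in df.
df['Cost'] = df['Cost'] * 10
print(df)
```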
Output
Conclusion
The Pandas library in Python is well suited to analyzing and structuring data. Jupyter Notebook is also a convenient environment for writing and running code and displaying the results.
Please find the attached Python code for more details.