DataFrames in Python

Introduction

 
Python supports a range of data structures. For beginners, let’s first discuss what a data structure is. A data structure is a way of storing data so that it can be easily accessed and worked with. For example,
  • To quickly access the most recently added item, we use a STACK (LIFO).
  • To quickly access the earliest added item, we use a QUEUE (FIFO).
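The two access patterns above can be sketched in plain Python. This is only a minimal illustration, assuming a list is used as the stack and collections.deque as the queue; the item values are made up for the example.

```python
from collections import deque

# Stack (LIFO): the last item added is the first one removed.
stack = []
stack.append("a")
stack.append("b")
stack.append("c")
last_in = stack.pop()        # removes and returns "c"

# Queue (FIFO): the first item added is the first one removed.
queue = deque()
queue.append("a")
queue.append("b")
queue.append("c")
first_in = queue.popleft()   # removes and returns "a"

print(last_in, first_in)
```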
In a similar fashion, the pandas library for Python provides its own data structures, chiefly:
  • Series
  • DataFrame
In this article, we will learn about the DataFrame.
 

Data Frame

  • It is a 2-dimensional labeled array made up of an ordered collection of columns, and each column can store a different data type.
  • It has two indexes, or axes: a row index and a column index.
  • A DataFrame is “value-mutable” and “size-mutable”, i.e., we can change both its values and its size.
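The properties above can be checked with a small sketch. The sample data and the column names (Name, Marks, Dept) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Name": ["Raj", "Raman"], "Marks": [75, 89]})

# Two axes: the row index and the column index.
row_labels = list(df.index)
column_labels = list(df.columns)

# Each column can hold a different data type.
print(df.dtypes)

# Value-mutable: change a single cell in place.
df.loc[0, "Marks"] = 80

# Size-mutable: add a new column.
df["Dept"] = ["HR", "IT"]

print(df)
```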
Let’s start with creating DataFrames.
 
There are many ways to create a DataFrame, for example from a 2-D dictionary, a 2-D NumPy array, Series objects, or another DataFrame.
 
Dataframe in Python by Mahesh Verma
 

Creating DataFrame from 2-D Dictionary

 
The following creates a DataFrame from a 2-D dictionary whose values are lists.

    import pandas as pd

    # Each key becomes a column name; each list holds that column's values.
    dictObj = {
        'EmpCode': ['E01', 'E02', 'E03', 'E04'],
        'EmpName': ['Raj', 'Raman', 'Rahul', 'Rohit'],
        'EmpDept': ['HR', 'Accounts', 'IT', 'HR']
    }

    df = pd.DataFrame(dictObj)
    print(df)

The output of the above code is shown below.
 
Creating DataFrame by Dictionary
 
As we can see in the output, pandas generates a default index (0, 1, 2, 3), and the keys of the 2-D dictionary become the column names.
 
We can also set the index labels by passing the index argument to DataFrame(), like

    df = pd.DataFrame(dictObj, index=['I', 'II', 'III', 'IV'])

Change Index in DataFrame
 
Note
 
The index must have the same length as the number of rows; otherwise, pandas raises a ValueError.
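A minimal sketch of the length mismatch follows. It reuses the article's EmpCode and EmpName columns but passes only three index labels for four rows; the exact error message may differ between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

dictObj = {
    "EmpCode": ["E01", "E02", "E03", "E04"],
    "EmpName": ["Raj", "Raman", "Rahul", "Rohit"],
}

try:
    # Four rows but only three index labels.
    pd.DataFrame(dictObj, index=["I", "II", "III"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```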
 
We can also create a DataFrame from a 2-D dictionary whose values are themselves dictionaries (a nested dictionary).

    import pandas as pd

    # One dictionary per year.
    yr2018 = {'NoOfArticles': 1200, 'NoOfBlogs': 1000, 'NoOfNews': 700}
    yr2019 = {'NoOfArticles': 1500, 'NoOfBlogs': 1500, 'NoOfNews': 900}
    yr2020 = {'NoOfArticles': 2000, 'NoOfBlogs': 1800, 'NoOfNews': 1000}

    # The outer dictionary maps each year to its inner dictionary.
    Published = {2018: yr2018, 2019: yr2019, 2020: yr2020}
    df = pd.DataFrame(Published)
    print(df)

In the above code, we first created three dictionaries: yr2018, yr2019 and yr2020. After that, we created a “Published” dictionary which contains the other dictionaries. We can also write the same dictionary as a single literal, like below.

    # Keys and values are separated by a colon, not an equals sign.
    Published = {
        2018: {'NoOfArticles': 1200, 'NoOfBlogs': 1000, 'NoOfNews': 700},
        2019: {'NoOfArticles': 1500, 'NoOfBlogs': 1500, 'NoOfNews': 900},
        2020: {'NoOfArticles': 2000, 'NoOfBlogs': 1800, 'NoOfNews': 1000}
    }
    df = pd.DataFrame(Published)
    print(df)
 
While creating a DataFrame with a 2-D nested dictionary - 
 
Columns: outer dictionary keys
Rows: inner dictionary keys.
 
See the output,
 
Creating DataFrame by Nested Dictionary
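The mapping of outer keys to columns and inner keys to rows can be confirmed with a short sketch. It uses the same figures as the nested dictionary; the variable names (published, by_year) are chosen here for illustration only.

```python
import pandas as pd

published = {
    2018: {"NoOfArticles": 1200, "NoOfBlogs": 1000, "NoOfNews": 700},
    2019: {"NoOfArticles": 1500, "NoOfBlogs": 1500, "NoOfNews": 900},
    2020: {"NoOfArticles": 2000, "NoOfBlogs": 1800, "NoOfNews": 1000},
}
df = pd.DataFrame(published)

columns = list(df.columns)   # outer dictionary keys
rows = list(df.index)        # inner dictionary keys

# Look up one value by row label and column label.
blogs_2019 = df.loc["NoOfBlogs", 2019]

# Transposing swaps the axes, so the years become the rows.
by_year = df.T

print(columns, rows, blogs_2019)
```

If the years read more naturally as rows, the transposed frame (by_year) may be the more convenient layout.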
 

Creating DataFrame from the 2-D array (Numpy Array)

    import numpy as np
    import pandas as pd

    # A 2-D NumPy array: 4 rows and 3 columns.
    arr = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19], [20, 21, 22]])
    df = pd.DataFrame(arr)
    print(df)

Creating DataFrame by NumPy Array 
 
As we can see in the output, pandas automatically assigns row and column indexes, both starting from 0. We can also set the column names and row names, like

    df = pd.DataFrame(arr, columns=['One', 'Two', 'Three'], index=['I', 'II', 'III', 'IV'])

See the output after executing the above command,
 
Change Row and Column Name in DataFrame
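Once rows and columns have names, values can be looked up by label. The sketch below assumes the same array and labels as above; the variable names (shape, value, column_two) are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.array([[11, 12, 13], [14, 15, 16], [17, 18, 19], [20, 21, 22]])
df = pd.DataFrame(arr, columns=["One", "Two", "Three"], index=["I", "II", "III", "IV"])

shape = df.shape                      # (number of rows, number of columns)
value = int(df.loc["II", "Three"])    # row "II", column "Three"
column_two = df["Two"].tolist()       # every value in column "Two"

print(shape, value, column_two)
```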
 
Note
 
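If the rows contain different numbers of elements, NumPy cannot build a 2-D array from them. When the array is created with dtype=object, it becomes a 1-D array of lists, so pandas creates just a single column in the DataFrame and the type of that column is object, like,

    import numpy as np
    import pandas as pd

    # dtype=object is required here: recent NumPy versions raise an
    # error for rows of unequal length when it is omitted.
    arr = np.array([[2, 3], [7, 8, 9], [3, 6, 5]], dtype=object)
    df = pd.DataFrame(arr)
    print(df)
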
DataFrame
 

Creating DataFrame from Series Object

 
A Series is a pandas data structure that represents a 1-dimensional array-like object containing an array of data and an associated array of data labels, called the index. We can also create a DataFrame from Series objects, like

    import pandas as pd

    Students = pd.Series(['Raj', 'Raman', 'Rahul'], index=[1, 2, 3])
    Marks = pd.Series([75, 89, 90], index=[1, 2, 3])
    Contact = pd.Series(['9899', '9560', '9871'], index=[1, 2, 3])

    # The keys are strings and become the column names.
    # The name "data" avoids shadowing Python's built-in dict.
    data = {'Stud': Students, 'MM': Marks, 'Phone': Contact}

    df = pd.DataFrame(data)
    print(df)

Creating DataFrame by Series Object
 
Note
 
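The index should be the same for all Series. If the indexes differ, pandas aligns the Series on the union of their labels and fills the missing entries with NaN.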
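A small sketch of what happens when the indexes do not match: pandas does not reject the input, it aligns on the labels and leaves gaps as NaN. The shortened marks Series below is a made-up variation on the example above.

```python
import pandas as pd

students = pd.Series(["Raj", "Raman", "Rahul"], index=[1, 2, 3])
marks = pd.Series([75, 89], index=[1, 2])   # no entry for label 3

df = pd.DataFrame({"Stud": students, "MM": marks})

# True where a mark is missing after alignment.
missing = df["MM"].isna().tolist()

print(df)
```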
  

Creating DataFrame from another DataFrame

 
We can also create a new DataFrame from an existing DataFrame, like

    df2 = pd.DataFrame(df)

    print(df2)

Creating DataFrame by DataFrame
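Whether the new DataFrame shares its data with the original can depend on the pandas version and settings, so when an independent copy is needed it may be safer to ask for one explicitly. The sketch below assumes the constructor's copy parameter is used for that purpose; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"Stud": ["Raj", "Raman"], "MM": [75, 89]})

# copy=True asks pandas for an independent copy of the data.
df2 = pd.DataFrame(df, copy=True)
df2.loc[0, "MM"] = 100

original_mark = int(df.loc[0, "MM"])   # unchanged in the original
copied_mark = int(df2.loc[0, "MM"])    # changed only in the copy

print(original_mark, copied_mark)
```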
 

Conclusion

 
Now we have learned what a DataFrame is in Python and several ways to create one. After reading this article, I hope you are able to create DataFrames in Python.
 
Queries about this article and its sample files are always welcome. Thanks for reading!

