CREATING DATAFRAMES
In my previous article, I introduced Pandas and explained what a DataFrame and a Series are.
Now let’s dig deeper. This is what a DataFrame (a two-dimensional, labelled data structure) looks like:
Report of Student A
             2018  2019  2020
    English    85    60    90
    Math       73    80    64
    Science    80    58    74
    French     64    96    87
We will use the above example throughout the article.
DataFrames like this can be created in various ways, for example from a dictionary or a NumPy array. Here we will learn to create a DataFrame from each of the following:
- 2-D Dictionary
- 2-D NumPy Array
- Series Type Object
- DataFrame Object
Creating DataFrame from 2-D Dictionary
A dictionary is a mutable collection of key: value pairs. Refer to my article about Dictionaries in Python.
a) 2-D Dictionary contains values as list/ndarray
(To understand Lists, refer to my article.)
Let us take the above example, “Report of Student A”, and try to create that DataFrame.
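```python
import pandas as pd

# Each key becomes a column; each list holds that column's values.
marks = {'2018': [85, 73, 80, 64], '2019': [60, 80, 58, 96], '2020': [90, 64, 74, 87]}

df = pd.DataFrame(marks)
print(df)
```
OUTPUT
       2018  2019  2020
    0    85    60    90
    1    73    80    64
    2    80    58    74
    3    64    96    87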
Here the index labels are 0, 1, 2, … because we did not specify an index, so the default one was used. If we wish to change the index labels and create exactly the same DataFrame as in our example, we pass the index argument:
pd.DataFrame(dictionary_object, index=['mention', 'index', 'you', 'want'])
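```python
import pandas as pd

marks = {'2018': [85, 73, 80, 64], '2019': [60, 80, 58, 96], '2020': [90, 64, 74, 87]}

# One index label per row, in the same order as the list values.
df = pd.DataFrame(marks, index=['English', 'Math', 'Science', 'French'])
print(df)
```
OUTPUT
             2018  2019  2020
    English    85    60    90
    Math       73    80    64
    Science    80    58    74
    French     64    96    87
NOTE
The number of index labels must be the same as the number of rows; otherwise an error is raised. For example, if we use:
df = pd.DataFrame(marks, index=['English', 'Math', 'Science'])
we get a ValueError, because each column holds four values but only three index labels were given.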
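The note above can be tried out safely with a small sketch like the one below, which catches the error instead of letting the script stop. The names `marks`, `error_raised` and `message` are chosen only for this illustration, and the exact wording of the error message depends on the pandas version, so it is stored but not reproduced here.

```python
import pandas as pd

marks = {'2018': [85, 73, 80, 64], '2019': [60, 80, 58, 96], '2020': [90, 64, 74, 87]}

# Four values per column but only three index labels: pandas refuses to build the DataFrame.
try:
    df = pd.DataFrame(marks, index=['English', 'Math', 'Science'])
    error_raised = False
except ValueError as err:
    error_raised = True
    message = str(err)  # wording differs between pandas versions

print(error_raised)

# Supplying one label per row works as expected.
df = pd.DataFrame(marks, index=['English', 'Math', 'Science', 'French'])
print(df.shape)
```

Checking `df.shape` afterwards is a quick way to confirm that the DataFrame has the four rows and three columns we intended.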
b) 2-D Dictionary contains values as Dictionary
In this case we use a nested dictionary: each outer key maps to an inner dictionary.
```python
import pandas as pd

# Outer keys are the years; each inner dictionary maps a subject to its mark.
report = {
    '2018': {'English': 85, 'Math': 73, 'Science': 80, 'French': 64},
    '2019': {'English': 60, 'Math': 80, 'Science': 58, 'French': 96},
    '2020': {'English': 90, 'Math': 64, 'Science': 74, 'French': 87},
}

df = pd.DataFrame(report)
print(df)
```
Alternatively, you can build the inner dictionaries first and then combine them into the nested dictionary:
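```python
import pandas as pd

# Inner dictionaries: one per year, mapping subject to mark.
year2018 = {'English': 85, 'Math': 73, 'Science': 80, 'French': 64}
year2019 = {'English': 60, 'Math': 80, 'Science': 58, 'French': 96}
year2020 = {'English': 90, 'Math': 64, 'Science': 74, 'French': 87}

# Outer dictionary: maps each year to its inner dictionary.
report = {'2018': year2018, '2019': year2019, '2020': year2020}
df = pd.DataFrame(report)
print(df)
```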
NOTE
With a nested dictionary, it is easy to confuse the inner and outer keys. Remember: the outer dictionary keys become the columns, and the inner dictionary keys become the rows.
So, in the above example, '2018', '2019' and '2020' are the outer dictionary keys and therefore the columns, while 'English', 'Math', 'Science' and 'French' are the inner dictionary keys and therefore the rows.
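To check the rule from the note for yourself, a minimal sketch along these lines may help. It rebuilds the same report and reads back the column and row labels; the variable names `column_labels`, `row_labels` and `math_2019` are made up for this illustration.

```python
import pandas as pd

report = {
    '2018': {'English': 85, 'Math': 73, 'Science': 80, 'French': 64},
    '2019': {'English': 60, 'Math': 80, 'Science': 58, 'French': 96},
    '2020': {'English': 90, 'Math': 64, 'Science': 74, 'French': 87},
}
df = pd.DataFrame(report)

column_labels = list(df.columns)  # taken from the outer keys
row_labels = list(df.index)       # taken from the inner keys

print(column_labels)
print(row_labels)

# A single value is addressed by its row label (inner key) and column label (outer key).
math_2019 = df.loc['Math', '2019']
print(math_2019)
```

Note that `df.loc` takes the row label first and the column label second, which is the reverse of how the nested dictionary itself is indexed (`report['2019']['Math']`).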
Creating DataFrame from 2-D ndarray (NumPy Array)
(To know more about NumPy, refer to my article about NumPy Array.)
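```python
import pandas as pd
import numpy as np

# Each tuple is one row (a subject); each position is one column (a year).
arr = np.array([(85, 60, 90), (73, 80, 64), (80, 58, 74), (64, 96, 87)])
df = pd.DataFrame(arr)
print(df)
```
OUTPUT
        0   1   2
    0  85  60  90
    1  73  80  64
    2  80  58  74
    3  64  96  87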
By default, both the column labels and the row index are set to 0, 1, 2, 3, …, depending on the number of rows and columns. If you wish to name them, you can use:
df = pd.DataFrame(arr, columns=['2018', '2019', '2020'], index=['English', 'Math', 'Science', 'French'])
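Put together as a complete script, the naming step might look like the sketch below. It assumes the array holds the marks from the “Report of Student A” example, with one row per subject and one column per year.

```python
import numpy as np
import pandas as pd

# One row per subject, one column per year.
arr = np.array([(85, 60, 90), (73, 80, 64), (80, 58, 74), (64, 96, 87)])

# columns needs one label per array column, index one label per array row.
df = pd.DataFrame(
    arr,
    columns=['2018', '2019', '2020'],
    index=['English', 'Math', 'Science', 'French'],
)
print(df)
```

Since the array has 4 rows and 3 columns, `index` takes 4 labels and `columns` takes 3.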
Creating DataFrame from Series Object
(To know more about Series, refer to my article.)
To create a DataFrame from a Series object, we go through two steps:
a) First, we create a Series.
```python
import pandas as pd

student = pd.Series(['A', 'B', 'C'])
print(student)
```
OUTPUT
The Series prints its three values A, B and C next to the default index labels 0, 1 and 2.
b) Then, we put the Series into a dictionary, using the column name as the key, and pass that dictionary to DataFrame.
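```python
import pandas as pd

stud = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
# The key becomes the column name; the Series index becomes the row index.
data = {'Student': stud}

df = pd.DataFrame(data)
print(df)
```
OUTPUT
      Student
    1       A
    2       B
    3       C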
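As a side note, pandas also offers `Series.to_frame()`, which turns a single Series into a one-column DataFrame without the dictionary step. The sketch below compares the two routes; the names `df_from_dict` and `df_from_series` are made up for this illustration.

```python
import pandas as pd

stud = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])

# Route 1: wrap the Series in a dictionary, as in step b).
df_from_dict = pd.DataFrame({'Student': stud})

# Route 2: ask the Series to turn itself into a one-column DataFrame.
df_from_series = stud.to_frame(name='Student')

# Both routes should give the same labels and values.
print(df_from_dict.equals(df_from_series))
```

The dictionary route is still the one to use when several Series have to be combined into one DataFrame.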
Now, if we want to create the same DataFrame as in the first example:
- First, we create the Series. We need 3 columns, so we create 3 Series, each with the subjects as its index.
- Then we put the Series into a dictionary whose keys are the column titles 2018, 2019 and 2020.
```python
import pandas as pd

# One Series per year, each indexed by subject.
year1 = pd.Series([85, 73, 80, 64], index=['English', 'Math', 'Science', 'French'])
year2 = pd.Series([60, 80, 58, 96], index=['English', 'Math', 'Science', 'French'])
year3 = pd.Series([90, 64, 74, 87], index=['English', 'Math', 'Science', 'French'])

# The dictionary keys become the column titles.
marks = {'2018': year1, '2019': year2, '2020': year3}

df = pd.DataFrame(marks)
print(df)
```
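One detail worth knowing when building a DataFrame from several Series is that pandas matches the values by index label, not by position. The sketch below illustrates this under the assumption that the 2019 marks arrive in a different subject order; the names `subjects`, `english_2019` and `french_2019` are made up for the example.

```python
import pandas as pd

subjects = ['English', 'Math', 'Science', 'French']
year1 = pd.Series([85, 73, 80, 64], index=subjects)
# The same 2019 marks, but listed in a different subject order.
year2 = pd.Series([96, 58, 80, 60], index=['French', 'Science', 'Math', 'English'])

df = pd.DataFrame({'2018': year1, '2019': year2})

# Each mark lands in the row of its own subject, regardless of the input order.
english_2019 = df.loc['English', '2019']
french_2019 = df.loc['French', '2019']
print(english_2019, french_2019)
```

When the Series do not share the same index order, the row order of the resulting DataFrame may differ from the order in which you listed the subjects, so it is safer to look values up by label as shown here.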
Creating DataFrame from another DataFrame
This means that we do not have to go through the whole procedure of building a DataFrame to create a new one. We can simply pass the old DataFrame to the DataFrame constructor. This is helpful when we want 2 similar DataFrames.
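```python
import pandas as pd

year1 = pd.Series([85, 73, 80, 64], index=['English', 'Math', 'Science', 'French'])
year2 = pd.Series([60, 80, 58, 96], index=['English', 'Math', 'Science', 'French'])
year3 = pd.Series([90, 64, 74, 87], index=['English', 'Math', 'Science', 'French'])

marks = {'2018': year1, '2019': year2, '2020': year3}

df = pd.DataFrame(marks)
print(df)

# Build a second DataFrame directly from the first one.
df2 = pd.DataFrame(df)
print(df2)
```
OUTPUT
             2018  2019  2020
    English    85    60    90
    Math       73    80    64
    Science    80    58    74
    French     64    96    87
             2018  2019  2020
    English    85    60    90
    Math       73    80    64
    Science    80    58    74
    French     64    96    87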
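If you plan to modify the second DataFrame, it may be safer to ask for an explicit copy with `df.copy()`. Whether `pd.DataFrame(df)` shares its underlying data with `df` depends on the pandas version and its copy settings, so the sketch below does not rely on either behaviour. The name `df3` is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'2018': [85, 73, 80, 64], '2019': [60, 80, 58, 96], '2020': [90, 64, 74, 87]},
    index=['English', 'Math', 'Science', 'French'],
)

df2 = pd.DataFrame(df)  # new DataFrame object built from the old one
df3 = df.copy()         # explicit, independent copy of the data

# Changing the explicit copy leaves the original untouched.
df3.loc['Math', '2018'] = 100

print(df2.equals(df))
print(df.loc['Math', '2018'], df3.loc['Math', '2018'])
```

Here `df2` has the same content as `df`, while `df3` can be edited freely without affecting the original report.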
Summary
In this article, we learned how to create a DataFrame using different techniques. You now know the basics of Pandas, Series and DataFrame.
In my next article, we will learn about “DataFrame Attributes”. Until then, practice by creating different DataFrames using different techniques.
Feedback or queries related to this article are most welcome.
Thanks for reading.