Operations On Dataframe - Part Two

Aashina Arora

Introduction

So far in this series, we have covered many concepts in Pandas. In my previous article, which was Part 1 of operations on DataFrame, we learned about a few operations, such as Binary Operations, Inspection Functions, Retrieving Head and Tail Rows, and Iteration.

That leaves Combining DataFrames and Aggregation Functions. So, let's study them in depth now:

Combining DataFrames

We will look at three combining methods:

combine_first()
concat()
merge()

combine_first()

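As the name suggests, combine_first() combines 2 DataFrames using the patching method: the first DataFrame takes priority, and the second one only fills its gaps.

If a value is missing (NaN) in the first DataFrame and the corresponding cell of the second DataFrame has valid data, then patching comes into play: the value from the second DataFrame is patched into the result. Cells that already hold data in the first DataFrame are never overwritten, hence combine_first. The result covers the union of the rows and columns of both DataFrames.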

SYNTAX

dataFrame1.combine_first(dataFrame2)

import pandas as pd
dict1= {'A':[85,73,None], 'B':[60,None,58], 'C':[90,60,74], 'D': [95,87,92] }
df1=pd.DataFrame(dict1,index=['0','1','2'])
print("This is df1:")
print(df1)
print('\n')
dict2= {'A':[85,73,98], 'B':[60,80,58], 'C':[None,60,74], 'D': [95,87,None] }
df2=pd.DataFrame(dict2,index=['0','1','2'])
print("This is df2:")
print(df2)
print('\n')
print("Using combine_first()")
print("Combined Dataframe:")
print(df1.combine_first(df2))

OUTPUT
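To make the patching rule concrete, here is a minimal sketch on a cut-down version of the two DataFrames above. It assumes both frames share the same index and columns; in that case combine_first() gives the same values as keeping df1 wherever it has data and falling back to df2 elsewhere. The names patched and manual are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [85, 73, None], "B": [60, None, 58]}, index=["0", "1", "2"])
df2 = pd.DataFrame({"A": [85, 73, 98], "B": [60, 80, 58]}, index=["0", "1", "2"])

# df1 wins wherever it has a value; df2 only fills the NaN cells of df1.
patched = df1.combine_first(df2)

# The same rule written by hand (valid here because index and columns match).
manual = df1.where(df1.notna(), df2)

print(patched)
print("Same values as the manual version:", (patched == manual).all().all())
```

Only the two NaN cells of df1 (row '2' of A and row '1' of B) are taken from df2; every other cell comes from df1.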

concat()

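This method concatenates the DataFrames without comparing their values. By default, it stacks the rows of the second DataFrame below those of the first, keeps the original index labels (so labels can repeat), and fills with NaN any column that exists in only one of the DataFrames. Missing values are carried over as they are, and the output is a single DataFrame.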

SYNTAX

pd.concat([dataFrame1, dataFrame2])

import pandas as pd
dict1= {'A':[85,73,None], 'B':[60,None,58], 'C':[90,60,74], 'D': [95,87,92] }
df1=pd.DataFrame(dict1,index=['0','1','2'])
print("This is df1:")
print(df1)
print('\n')
dict2= {'A':[85,73,98], 'B':[60,80,58], 'C':[None,60,74], 'D': [95,87,None] }
df2=pd.DataFrame(dict2,index=['0','1','2'])
print("This is df2:")
print(df2)
print('\n')
print("Using concat()")
print("Combined Dataframe:")
print(pd.concat([df1,df2]))

OUTPUT
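As a further sketch, the example below shows three common ways of calling pd.concat() on two small, made-up DataFrames whose columns only partly overlap. The variable names are illustrative; ignore_index and axis are regular pd.concat() parameters.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [85, 73], "B": [60, 58]}, index=["0", "1"])
df2 = pd.DataFrame({"A": [98, 77], "C": [74, 61]}, index=["0", "1"])

# Default: rows are stacked and the labels "0", "1" appear twice.
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and numbers the rows 0..3.
renumbered = pd.concat([df1, df2], ignore_index=True)

# axis=1 places the frames side by side, aligned on the row labels.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(renumbered)
print(side_by_side)
```

In stacked and renumbered, column B is NaN for the rows that came from df2 and column C is NaN for the rows that came from df1, because each column exists in only one of the inputs.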

merge()

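If two different DataFrames have a common column (same concept as the foreign key in a database), then merge() joins them on that column: rows with matching key values are combined, so we get all columns of dataFrame1 and dataFrame2 in a single DataFrame as an output. By default, only the key values found in both DataFrames are kept (an inner join).

If the 2 DataFrames have no column name in common and no key is given explicitly (for example with left_on and right_on), then merge() gives an error.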

SYNTAX

pd.merge(dataFrame1, dataFrame2)

import pandas as pd
dict1= {'ID':[85,73,99], 'Name':["Alpha","Beta","Gamma"], 'Course':[909,606,741], 'Marks': [95,87,92] }
df1=pd.DataFrame(dict1,index=['0','1','2'])
print("This is df1:")
print(df1)
print('\n')
dict2= {'ID':[85,73,99], 'Course_ID':[6000,8000,5800] }
df2=pd.DataFrame(dict2,index=['0','1','2'])
print("This is df2:")
print(df2)
print('\n')
print("Using merge()")
print("Combined Dataframe:")
print(pd.merge(df1,df2))

OUTPUT
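The sketch below illustrates the two points made above: joining on a shared column, and what happens when there is no shared column name. The DataFrames students, courses and enrolments are made up for the illustration; on, how, left_on and right_on are regular pd.merge() parameters, and pd.errors.MergeError is the exception pandas raises in the failing case.

```python
import pandas as pd

students = pd.DataFrame({"ID": [85, 73, 99], "Name": ["Alpha", "Beta", "Gamma"]})
courses = pd.DataFrame({"ID": [85, 73, 40], "Course_ID": [6000, 8000, 5800]})

# Inner join (the default): only IDs present in both frames survive.
inner = pd.merge(students, courses, on="ID")

# Left join: every student is kept, with NaN where no course matches.
left = pd.merge(students, courses, on="ID", how="left")

# Same data, but the key column now has a different name.
enrolments = courses.rename(columns={"ID": "Student_ID"})

merge_error = None
try:
    pd.merge(students, enrolments)  # no common column name
except pd.errors.MergeError as exc:
    merge_error = str(exc)
    print("merge failed:", merge_error)

# Naming the key columns explicitly makes the merge work again.
explicit = pd.merge(students, enrolments, left_on="ID", right_on="Student_ID")

print(inner)
print(left)
print(explicit)
```

Passing on= explicitly, even when pandas could infer the key, makes the join column obvious to the reader and avoids surprises when the frames happen to share more than one column name.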

Aggregation Functions

Now we will learn about the various aggregation functions:

min()
max()
count()
sum()
mad()
quantile()

min()

As the name suggests, this method fetches the minimum values within a DataFrame.
It has 3 main parameters:

1. axis - It can take either 0 (or 'index') or 1 (or 'columns'). With 0, the minimum is taken down each column, giving one value per column; with 1, it is taken across each row, giving one value per row. If you don't mention any axis, the default is 0.
2. skipna - It can be either True or False (Python booleans, not strings). If you don't want to skip NA values, then set skipna=False; any row or column containing an NA then gives NA. The default value is True.
3. numeric_only - It can be either True or False. If you want the calculation restricted to numeric columns, then set numeric_only=True. The default value is False in pandas 2.0 and later; older versions defaulted to None, which tried all columns first and then fell back to numeric ones.
If you want the minimum of each row, use: dataFrame.min(axis=1) or dataFrame.min(axis='columns')
If you want the minimum of each column, use: dataFrame.min() or dataFrame.min(axis=0) or dataFrame.min(axis='index')

SYNTAX

dataFrame.min(axis=0, skipna=True, numeric_only=False)

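import pandas as pd
# "marks" is used instead of "dict" so the built-in dict type is not shadowed
marks = {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }
df = pd.DataFrame(marks, index=['2018','2019','2020'])
print(df)
print('\n')
# axis=0 (default): one minimum per column (subject)
print("Minimum of each column:")
print(df.min())
print('\n')
# axis=1: one minimum per row (year)
print("Minimum of each row:")
print(df.min(axis=1))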

OUTPUT
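The example above has no missing values, so it cannot show what skipna does. Here is a small sketch with one NaN added on purpose (the data is made up for the illustration):

```python
import pandas as pd

marks = pd.DataFrame(
    {"English": [85, 73, None], "Math": [60, 80, 58]},
    index=["2018", "2019", "2020"],
)

per_subject = marks.min()         # axis=0: one value per column, NaN skipped
per_year = marks.min(axis=1)      # axis=1: one value per row, NaN skipped
strict = marks.min(skipna=False)  # a column containing NaN now gives NaN

print(per_subject)
print(per_year)
print(strict)
```

With the default skipna=True, the English minimum is 73 because the missing 2020 mark is ignored; with skipna=False, the English result is NaN while Math is still 58.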

max()

As the name suggests, this method fetches the maximum values within a DataFrame.
It also has 3 main parameters (same as min()).

SYNTAX

dataFrame.max(axis=0, skipna=True, numeric_only=False)
If you want the maximum of each row, use: dataFrame.max(axis=1) or dataFrame.max(axis='columns')
If you want the maximum of each column, use: dataFrame.max(axis=0) or dataFrame.max() or dataFrame.max(axis='index')

import pandas as pd
marks = {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }
df = pd.DataFrame(marks, index=['2018','2019','2020'])
print(df)
print('\n')
# axis=0 (default): one maximum per column (subject)
print("Maximum of each column:")
print(df.max())
print('\n')
# axis=1: one maximum per row (year)
print("Maximum of each row:")
print(df.max(axis=1))

OUTPUT

count()

As the name suggests, this method fetches the count of non-missing (non-NA) entries in the DataFrame.
Unlike min(), it has no skipna parameter, because NA values are never counted; it takes axis and numeric_only.

SYNTAX

dataFrame.count(axis=0, numeric_only=False)
If you want the count for each row, use: dataFrame.count(axis=1) or dataFrame.count(axis='columns')
If you want the count for each column, use: dataFrame.count(axis=0) or dataFrame.count() or dataFrame.count(axis='index')

import pandas as pd
marks = {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }
df = pd.DataFrame(marks, index=['2018','2019','2020'])
print(df)
print('\n')
# axis=0 (default): non-missing entries in each column
print("Count for each column:")
print(df.count())
print('\n')
# axis=1: non-missing entries in each row
print("Count for each row:")
print(df.count(axis=1))

OUTPUT
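Because count() only counts non-missing entries, its result differs from the number of rows as soon as the data has gaps. A minimal sketch, with missing marks added on purpose for the illustration:

```python
import pandas as pd

marks = pd.DataFrame(
    {"English": [85, 73, None], "Math": [60, None, None]},
    index=["2018", "2019", "2020"],
)

per_subject = marks.count()      # non-missing entries in each column
per_year = marks.count(axis=1)   # non-missing entries in each row
total_rows = len(marks)          # number of rows, regardless of NaN

print(per_subject)
print(per_year)
print("Rows in the DataFrame:", total_rows)
```

Here English has 2 entries and Math has 1, even though the DataFrame has 3 rows, and the 2020 row counts 0 because both of its marks are missing.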

sum()

As the name suggests, this method fetches the sum of entries in the DataFrame.
It also has the 3 main parameters of min(), plus min_count, the minimum number of non-NA values needed to produce a result.

SYNTAX

dataFrame.sum(axis=0, skipna=True, numeric_only=False)
If you want the sum of each row, use: dataFrame.sum(axis=1) or dataFrame.sum(axis='columns')
If you want the sum of each column, use: dataFrame.sum(axis=0) or dataFrame.sum() or dataFrame.sum(axis='index')

import pandas as pd
marks = {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }
df = pd.DataFrame(marks, index=['2018','2019','2020'])
print(df)
print('\n')
# axis=0 (default): total of each column (subject)
print("Sum of each column:")
print(df.sum())
print('\n')
# axis=1: total of each row (year)
print("Sum of each row:")
print(df.sum(axis=1))

OUTPUT
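One detail worth knowing about sum() is how it treats missing values. The sketch below uses made-up data with a column that is entirely NaN to compare the default behaviour with skipna=False and min_count=1 (both are regular sum() parameters):

```python
import pandas as pd

nan = float("nan")
marks = pd.DataFrame(
    {"English": [85.0, 73.0, nan], "Math": [nan, nan, nan]},
    index=["2018", "2019", "2020"],
)

totals = marks.sum()               # NaN skipped; an all-NaN column sums to 0.0
strict = marks.sum(skipna=False)   # any NaN in a column makes its sum NaN
guarded = marks.sum(min_count=1)   # NaN unless at least one real value exists

print(totals)
print(strict)
print(guarded)
```

min_count=1 is a handy middle ground: English still sums to 158.0, but the empty Math column reports NaN instead of a misleading 0.0.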

mad()

MAD stands for Mean Absolute Deviation: the average absolute distance of the values from their mean.
DataFrame.mad() was deprecated in pandas 1.5 and removed in pandas 2.0, so the calls below only work on older versions. It takes the axis and skipna parameters; you can refer to min() for details of these parameters.

SYNTAX

dataFrame.mad(axis=0, skipna=True)
If you want the mean absolute deviation of each row, use: dataFrame.mad(axis=1) or dataFrame.mad(axis='columns')
If you want the mean absolute deviation of each column, use: dataFrame.mad(axis=0) or dataFrame.mad() or dataFrame.mad(axis='index')

import pandas as pd
# Requires pandas older than 2.0, where DataFrame.mad() still exists
marks = {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }
df = pd.DataFrame(marks, index=['2018','2019','2020'])
print(df)
print('\n')
# axis=0 (default): deviation within each column (subject)
print("For each column:")
print(df.mad())
print('\n')
# axis=1: deviation within each row (year)
print("For each row:")
print(df.mad(axis=1))

OUTPUT
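On pandas 2.0 and later, where mad() is no longer available, the same quantity can be computed from mean(), sub() and abs(). The helper below is a hypothetical sketch written for this article, not a pandas function, and it assumes axis is given as the integer 0 or 1:

```python
import pandas as pd

def mean_abs_dev(frame, axis=0):
    """Mean absolute deviation around the mean (hypothetical helper).

    axis=0 gives one value per column, axis=1 one value per row.
    """
    centre = frame.mean(axis=axis)
    # Column means are subtracted along the columns (axis=1) and
    # row means along the index (axis=0), hence 1 - axis.
    deviations = frame.sub(centre, axis=1 - axis).abs()
    return deviations.mean(axis=axis)

marks = pd.DataFrame(
    {"English": [85, 73, 98], "Math": [60, 80, 58]},
    index=["2018", "2019", "2020"],
)

per_subject = mean_abs_dev(marks)          # one value per column
per_year = mean_abs_dev(marks, axis=1)     # one value per row

print(per_subject)
print(per_year)
```

For Math, the mean is 66 and the absolute deviations are 6, 14 and 8, so the mean absolute deviation is 28/3, roughly 9.33.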

quantile()

Quantiles are cut points that relate to the rank order of values in a distribution.
In short, they divide the range of a probability distribution into contiguous intervals with equal probabilities.
The q-th quantile is the value at or below which a fraction q of the observations falls; for example, q=0.5 is the median.
quantile() has 4 main parameters:

q - The quantile(s) to compute, given as a fraction between 0 and 1 (a float or a list of floats).
axis (0 or 1)
numeric_only (True or False)
interpolation - How to calculate the quantile when it falls between two data points.

Common values for q are 0.25, 0.5, 0.75 and 1.0; the default value is 0.5 (the 50% quantile, or median). And remember that 0 <= q <= 1.
If q is an array, then a DataFrame is returned.
If q is a float, then a Series is returned.
The interpolation parameter can be used with these values: ['linear', 'lower', 'higher', 'midpoint', 'nearest']
The default value of numeric_only is False in pandas 2.0 and later; older versions defaulted to True.

SYNTAX

dataFrame.quantile(q=0.5, axis=0, numeric_only=False, interpolation='linear')
If you want the quantile of each row, use: dataFrame.quantile(axis=1) or dataFrame.quantile(axis='columns')
If you want the quantile of each column, use: dataFrame.quantile(axis=0) or dataFrame.quantile() or dataFrame.quantile(axis='index')

import pandas as pd
marks = {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }
df = pd.DataFrame(marks, index=['2018','2019','2020'])
print(df)
print('\n')
# Default q=0.5, axis=0: median of each column (subject)
print("For each column:")
print(df.quantile())
print('\n')
# axis must be passed by keyword: df.quantile(1) would mean q=1, not axis=1
print("For each row:")
print(df.quantile(axis=1))
print('\n')
print("Quantile with 0.25 value (for each column):")
print(df.quantile(q=0.25, axis=0, numeric_only=True))

OUTPUT
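To illustrate the return types and the interpolation parameter described above, here is a small sketch on two of the subjects. The variable names are illustrative only:

```python
import pandas as pd

marks = pd.DataFrame(
    {"English": [85, 73, 98], "Math": [60, 80, 58]},
    index=["2018", "2019", "2020"],
)

median = marks.quantile()                    # float q -> Series (one value per column)
quartiles = marks.quantile([0.25, 0.75])     # list q -> DataFrame indexed by q
per_year = marks.quantile(0.5, axis=1)       # median of each row
lower = marks.quantile(0.25, interpolation="lower")  # pick an existing data point

print(median)
print(quartiles)
print(per_year)
print(lower)
```

With the default 'linear' interpolation, the 0.25 quantile of English (sorted values 73, 85, 98) lies halfway between 73 and 85, which gives 79.0; with interpolation='lower', the result is the actual data point 73.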

Summary

In this article, we discussed various methods of Combining Dataframes, and we also learned about Aggregation Functions. I hope this will help readers understand how to use and implement Pandas. Now you can play with data, create your own dataframes, and start analyses.

This is the end of my PANDAS Series.

Feedback or queries related to this article are most welcome.

Thanks for reading!!