ACCESSING DATA IN DATAFRAMES
Now it’s time to play with the data inside these DataFrames. So, in this article, we will learn about selecting and accessing data in DataFrames.
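This can be done in many ways, but I would like to divide it into 3 categories: accessing data through values, through rows, and through columns.
Let us understand these 3 ways using a DataFrame of student A's marks as the example. Its rows are labelled with the years '2018', '2019', and '2020', and its columns are the subjects English, Math, Science, and French.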
Accessing data through Values
- It is similar to accessing data in arrays: we mention the index and it fetches the value at that index.
- Consider the above example. To find how student A performed in Science in 2019, we use the column 'Science' and the row '2019', i.e. df.Science['2019'] by row name or df.Science.iloc[1] by row position. Recent pandas versions deprecate the plain df.Science[1] form when the row labels are not integers, so the position goes through iloc.
- SYNTAX: <DataFrameObject>.<column_name>[<row_name>]
- SYNTAX: <DataFrameObject>.<column_name>.iloc[<row_index>]
- We can also use the attributes 'at' and 'iat'. Both fetch the value at one specific cell, but here we mention the row and the column together: 'at' takes names and 'iat' takes integer positions.
- SYNTAX: <DataFrameObject>.at[<row_name>, <column_name>]
- SYNTAX: <DataFrameObject>.iat[<row_index>, <column_index>]
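    import pandas as pd

    # Marks of student A: one row per year, one column per subject
    marks = {'English': [85, 73, 98], 'Math': [60, 80, 58], 'Science': [90, 60, 74], 'French': [95, 87, 92]}
    df = pd.DataFrame(marks, index=['2018', '2019', '2020'])
    print(df)
    print('\n')
    print('Method 1:', df.Math['2019'])        # column, then row name
    print('Method 2:', df.Science.iloc[1])     # column, then row position (plain [1] is deprecated here)
    print('Method 3:', df.at['2020', 'Math'])  # row name, column name
    print('Method 4:', df.iat[2, 1])           # row position, column position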
OUTPUT
After the full DataFrame, the four methods print 80, 60, 58, and 58 respectively.
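Names and positions can also be translated into each other, which helps when you know a label but need 'iat', or the other way round. The following is a minimal sketch on the same example data, assuming the row and column labels are unique. The variable names (marks, row_pos, col_pos) are only illustrative.

```python
import pandas as pd

# Same example data as in the article; variable names are illustrative.
marks = {'English': [85, 73, 98], 'Math': [60, 80, 58],
         'Science': [90, 60, 74], 'French': [95, 87, 92]}
df = pd.DataFrame(marks, index=['2018', '2019', '2020'])

# Translate labels into integer positions (assumes unique labels).
row_pos = df.index.get_loc('2019')
col_pos = df.columns.get_loc('Science')

# The same cell, reached by name and by position.
by_label = df.at['2019', 'Science']
by_position = df.iat[row_pos, col_pos]
print(by_label, by_position)

# Mixed lookup: row given by position, column given by name.
mixed = df.at[df.index[1], 'Science']
print(mixed)
```

All three lookups refer to the Science mark for 2019.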
Accessing data through Rows
- You just have to mention the name of the row which you want to display.
- We use the loc indexer if we want to access any specific row.
SYNTAX
<DataFrameObject>.loc[<row_name>, <column_name>]
For instance, suppose you want to display the report of student A for 2019, with the marks attained in all the subjects. Here the row is '2019' and all the columns are used. We can write,
DataFrameObject.loc['2019', :]
':' means to display all the columns, so the above statement picks up the row 2019 and all the columns.
- You can access multiple rows at a time as well. Suppose you now want to display the report of student A for all subjects, but only for 2019 and 2020.
- We can write: DataFrameObject.loc['2019':'2020', :]
    import pandas as pd

    # Marks of student A: one row per year, one column per subject
    marks = {'English': [85, 73, 98], 'Math': [60, 80, 58], 'Science': [90, 60, 74], 'French': [95, 87, 92]}
    df = pd.DataFrame(marks, index=['2018', '2019', '2020'])
    print(df)
    print('\n')
    print('To access a row:')
    print('\n')
    print(df.loc['2019', :], '\n')   # one row name, all columns
    print('To access multiple rows:')
    print('\n')
    print(df.loc['2019':'2020', :])  # range of row names, both ends included
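OUTPUT
The first call prints the 2019 row (English 73, Math 80, Science 60, French 87). The second prints the 2019 and 2020 rows with all four columns.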
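The row examples above use a single name or a continuous range. As a hedged sketch of two related cases, the snippet below picks rows that are not next to each other by passing a list of names, and shows that a single name gives back a Series while a list gives back a DataFrame. The variable names are illustrative only.

```python
import pandas as pd

marks = {'English': [85, 73, 98], 'Math': [60, 80, 58],
         'Science': [90, 60, 74], 'French': [95, 87, 92]}
df = pd.DataFrame(marks, index=['2018', '2019', '2020'])

# A single row name returns a Series indexed by the column names.
one_row = df.loc['2019']
print(type(one_row).__name__)

# A list of row names returns a DataFrame; the rows need not be adjacent.
two_rows = df.loc[['2018', '2020']]
print(two_rows)

# A one-element list keeps the result as a DataFrame with a single row.
one_row_frame = df.loc[['2019']]
print(one_row_frame.shape)
```

Wrapping a single name in a list is handy when later code expects a DataFrame rather than a Series.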
Accessing data through Columns
In the above part, we learned how to display a row-specific record. What if we want to display the same report for specific columns instead? You just have to mention the name of the column which you want to display.
SYNTAX
<DataFrameObject>[<column_name>]
For instance, now we want the report of student A for all the years but only for the Science subject. So here we would consider all the rows but the column is 'Science'. We can write,
DataFrameObject['Science']
In case you want to access multiple columns, pass a list of column names,
SYNTAX
<DataFrameObject>[[<column_name1>, <column_name2>]]
or use loc with a range of column names,
SYNTAX
<DataFrameObject>.loc[:, <column_name1>:<column_name2>]
    import pandas as pd

    # Marks of student A: one row per year, one column per subject
    marks = {'English': [85, 73, 98], 'Math': [60, 80, 58], 'Science': [90, 60, 74], 'French': [95, 87, 92]}
    df = pd.DataFrame(marks, index=['2018', '2019', '2020'])
    print(df)
    print('\n')
    print('To access a column:')
    print('\n')
    print(df['Science'], '\n')              # one column, all rows
    print('To access multiple columns:')
    print('\n')
    print(df.loc[:, 'Science':'French'])    # all rows, range of column names
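OUTPUT
The first call prints the Science column (90, 60, 74 for 2018 to 2020). The second prints the Science and French columns for all three years.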
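Selecting several columns by name needs a list inside the brackets, which is why two pairs of brackets appear. The sketch below, on the same example data and with illustrative variable names, contrasts a single column, a list of columns that are not adjacent, and a range of columns through loc.

```python
import pandas as pd

marks = {'English': [85, 73, 98], 'Math': [60, 80, 58],
         'Science': [90, 60, 74], 'French': [95, 87, 92]}
df = pd.DataFrame(marks, index=['2018', '2019', '2020'])

# One column name in single brackets returns a Series.
science = df['Science']
print(type(science).__name__)

# A list of names (double brackets) returns a DataFrame;
# the columns do not have to be next to each other.
subset = df[['English', 'French']]
print(subset)

# A range of names through loc returns every column in between as well.
sliced = df.loc[:, 'Math':'French']
print(list(sliced.columns))
```

The list form is the one to reach for when the wanted columns are scattered, and the range form when they sit side by side.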
What if we want a specific range of rows and columns as well?
We use the same syntax learned above and mention the specific row and column names which we want to access.
Suppose we want to display Student A's report for 2018 and 2019, for the subjects from English to Science.
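    import pandas as pd

    # Marks of student A: one row per year, one column per subject
    marks = {'English': [85, 73, 98], 'Math': [60, 80, 58], 'Science': [90, 60, 74], 'French': [95, 87, 92]}
    df = pd.DataFrame(marks, index=['2018', '2019', '2020'])
    # Range of row names and range of column names, both ends included
    print(df.loc['2018':'2019', 'English':'Science'])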
OUTPUT
The result has the rows 2018 and 2019 and the columns English, Math, and Science.
NOTE
When we use 'column1:column3' with loc, it does not display just column1 and column3. It displays every column from column1 to column3, that is column1, column2, and column3, with both ends included. The same holds for rows.
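To make the note concrete, here is a small sketch on the same example data. It checks which columns a name range returns, and shows one possible way to get only the two end columns, by passing a list instead of a range. The variable names are illustrative.

```python
import pandas as pd

marks = {'English': [85, 73, 98], 'Math': [60, 80, 58],
         'Science': [90, 60, 74], 'French': [95, 87, 92]}
df = pd.DataFrame(marks, index=['2018', '2019', '2020'])

# A range of names includes both ends and everything in between,
# following the column order of the DataFrame itself.
block = df.loc['2018':'2019', 'English':'Science']
print(list(block.columns))

# A list of names returns exactly the named columns and nothing else.
picked = df.loc['2018':'2019', ['English', 'Science']]
print(list(picked.columns))
```

So a range is convenient for neighbouring columns, while a list is the safer choice when Math should be left out.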
But after all that, these questions arise,
- What if the file is huge and we do not remember the row and column names to access them?
- What if the DataFrame object does not contain row or column labels?
- What will you do in the above cases if you want to extract a subset from this kind of DataFrame?
The answer is the index value, that is, the integer position of each row and column. This is done by slicing the DataFrame by position.
You are already familiar with loc, which works with names. If you want to use index values instead, the indexer to use is iloc.
    import pandas as pd

    # Marks of student A: one row per year, one column per subject
    marks = {'English': [85, 73, 98], 'Math': [60, 80, 58], 'Science': [90, 60, 74], 'French': [95, 87, 92]}
    df = pd.DataFrame(marks, index=['2018', '2019', '2020'])
    print(df)
    print('\n')
    print('Using Index value with iloc:')
    print('\n')
    print(df.iloc[0:2, 1:3])  # row positions 0 and 1, column positions 1 and 2
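OUTPUT
The result has the rows 2018 and 2019 (positions 0 and 1) and the columns Math and Science (positions 1 and 2).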
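The questions above also mention a DataFrame that has no row or column labels at all. As a rough sketch, the snippet below builds such a DataFrame from a plain list of lists (the data and the name raw are hypothetical). pandas then numbers the rows and columns from 0, and iloc works exactly as before.

```python
import pandas as pd

# Hypothetical unlabeled data: no index and no column names are given,
# so pandas assigns the default labels 0, 1, 2, ...
raw = pd.DataFrame([[85, 60, 90, 95],
                    [73, 80, 60, 87],
                    [98, 58, 74, 92]])

# Listing the labels is a quick way to see what is available
# when the names are unknown or forgotten.
print(list(raw.index))
print(list(raw.columns))

# Position-based slicing does not depend on the labels at all.
subset = raw.iloc[0:2, 1:3]
print(subset)
```

Printing the index and columns first is also a reasonable habit for large files, where the names are easy to forget.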
Note
Whenever we use 'iloc' with a range such as [2:6], it starts at position 2 and ends at position 6-1=5, so it runs from 2 to 5. The end position itself is left out.
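The difference between the two kinds of range can be checked side by side. The following sketch uses a hypothetical ten-row DataFrame named numbers for the [2:6] case, and the article's example data to compare an iloc range with the loc range that selects the same cells.

```python
import pandas as pd

# Hypothetical ten-row DataFrame with default labels 0..9.
numbers = pd.DataFrame({'n': range(10)})
print(list(numbers.iloc[2:6].index))  # positions 2, 3, 4, 5; 6 is left out

marks = {'English': [85, 73, 98], 'Math': [60, 80, 58],
         'Science': [90, 60, 74], 'French': [95, 87, 92]}
df = pd.DataFrame(marks, index=['2018', '2019', '2020'])

# iloc leaves out the end position, loc includes the end name,
# so these two ranges select the same block of cells.
by_position = df.iloc[0:2, 1:3]
by_label = df.loc['2018':'2019', 'Math':'Science']
print(by_position.equals(by_label))
```

In short, with iloc you write one position past the last one you want, and with loc you write the last name you want.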
Summary
In this article, we learned about selecting and accessing data in a DataFrame. We will cover more in my series of articles on pandas. Practice hard!
In my next article, we will learn "How to Assign, Add or Modify data in DataFrames".
Feedback or queries related to this article are most welcome.
Thanks for reading.