The reference actually comes back to NumPy: a DataFrame can be thought of as row and column labels (an index) on top of NumPy arrays. Use df.shape, which returns the tuple (5, 4). Note that shape is an attribute, not a method, so it takes no parentheses. As with a two-dimensional NumPy array, index 0 of the shape holds the number of rows (A, B, C, D, E) and index 1 holds the number of columns (W, X, Y, Z). That is why rows are referred to as axis 0 and columns as axis 1: the numbering comes directly from the shape, just as it does for a NumPy array.
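A minimal sketch of how shape and axis numbering line up. It assumes a small DataFrame built from hypothetical fixed values with rows A to E and columns W to Z, because the article's own values are not reproduced here:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values standing in for the article's data: rows A-E, columns W-Z
arr = np.array([[2.7, 0.6, 0.9, 0.5],
                [0.7, -0.3, -0.8, 0.6],
                [-2.0, 0.7, 0.5, -0.6],
                [0.2, -0.8, -0.9, 1.0],
                [0.2, 2.0, 2.6, 0.7]])
df = pd.DataFrame(arr, index=list("ABCDE"), columns=list("WXYZ"))

print(df.shape)             # (5, 4): an attribute, so no parentheses
print(arr.shape)            # (5, 4): the same shape as the NumPy array
n_rows, n_cols = df.shape   # shape[0] -> rows (axis 0), shape[1] -> columns (axis 1)

# axis=0 aggregates down the rows, giving one result per column
print(df.sum(axis=0).index.tolist())  # ['W', 'X', 'Y', 'Z']
# axis=1 aggregates across the columns, giving one result per row
print(df.sum(axis=1).index.tolist())  # ['A', 'B', 'C', 'D', 'E']
```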
To select a particular value, pass the row label and then the column label to loc, for example df.loc['B','Y'].
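A short sketch of label-based selection, again on hypothetical fixed data with rows A to E and columns W to Z. It shows a single value through loc and the single-value accessor at, and a smaller block through lists of labels:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))

value = df.loc["B", "Y"]   # row label first, then column label
print(value)               # -0.8
print(df.at["B", "Y"])     # .at is a dedicated accessor for one value
subset = df.loc[["A", "B"], ["W", "Y"]]  # lists of labels return a smaller DataFrame
print(subset)
```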
Conditional Selection
A very important feature of pandas is conditional selection using bracket notation, which works much like boolean indexing in NumPy.
Let's start with a comparison operator, df > 0.
The result is a DataFrame of boolean values: True where the value at that position is greater than zero and False where it is not.
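A minimal sketch of the comparison, assuming the same hypothetical fixed data as a stand-in for the article's values. The comparison runs element by element and keeps the row and column labels:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))

bool_df = df > 0   # element-wise comparison, same shape and labels as df
print(bool_df)
print(bool_df.loc["C", "W"])  # False, because C/W holds -2.0
```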
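Passing this boolean DataFrame back into the brackets, df[df > 0], keeps the values where the condition is True. Wherever the condition is not satisfied (here, wherever the value is zero or negative), NaN is returned instead.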
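A sketch of masking with a boolean DataFrame on the same hypothetical data. The shape stays the same and only the failing positions become NaN. In this data, six values are not greater than zero:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))

result = df[df > 0]   # keeps values where the mask is True, NaN elsewhere
print(result)
print(result.isna().sum().sum())  # 6 values in this data are not > 0
# Indexing with a boolean DataFrame behaves like DataFrame.where
print(result.equals(df.where(df > 0)))  # True
```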
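In practice, we usually want only the rows where a condition holds, not a full DataFrame padded with NaN. To get that, apply the condition to a single column instead of the whole DataFrame. For example, df[df['W']>0] returns only the rows where the value in column W is greater than zero.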
Say we want the rows where W>0 and, from those, only column Y: df[df['W']>0]['Y']. After applying the condition, we can also select several columns by passing a list, such as df[df['W']>0][['Y','X']].
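A sketch of row filtering by one column's condition, followed by column selection, on the same hypothetical data. The loc form that does both in one step is an alternative not mentioned in the text:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))

rows = df[df["W"] > 0]              # row C drops out because W = -2.0
print(rows.index.tolist())          # ['A', 'B', 'D', 'E']
y_only = df[df["W"] > 0]["Y"]       # then pick column Y from the filtered rows
cols = df[df["W"] > 0][["Y", "X"]]  # or pick a list of columns
# .loc applies the row filter and the column selection in one step
same = df.loc[df["W"] > 0, ["Y", "X"]]
print(cols.equals(same))            # True
```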
Using multiple conditions
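To combine conditions, use | (or) and & (and), and wrap each condition in parentheses. Python's and/or keywords do not work here. They try to reduce each whole Series to a single True or False, which is ambiguous, so pandas raises a ValueError. The parentheses are needed because & and | bind more tightly than comparison operators such as >.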
- df[(df['W']>0) & (df['Y'] > 1)]
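A sketch of combined conditions on the same hypothetical data. It also shows the ValueError that Python's `and` triggers:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))

both = df[(df["W"] > 0) & (df["Y"] > 1)]    # W > 0 AND Y > 1
either = df[(df["W"] > 1) | (df["Y"] > 1)]  # W > 1 OR Y > 1
print(both.index.tolist())    # ['E']
print(either.index.tolist())  # ['A', 'E']

# Python's `and` asks the first Series for a single True/False, which pandas refuses
try:
    df[(df["W"] > 0) and (df["Y"] > 1)]
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```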
Resetting the index
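To reset the index back to the default integer index (0, 1, 2, ..., n-1), use the method reset_index(). The old index labels are moved into a new column (named index if the old index had no name), and the rows are relabelled with integers. Like many pandas methods, reset_index() returns a new DataFrame and leaves the original unchanged unless you pass inplace=True or assign the result back. Pandas uses this inplace argument in many places. In a Jupyter notebook, press Shift+Tab inside a method call to see its signature and check whether it accepts inplace.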
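A sketch of reset_index() on the same hypothetical data. It shows the new integer index, the old labels moved into a column, and the difference between returning a copy and using inplace=True:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))

reset = df.reset_index()          # returns a new DataFrame
print(reset.columns.tolist())     # ['index', 'W', 'X', 'Y', 'Z']
print(reset.index.tolist())       # [0, 1, 2, 3, 4]
print(df.index.tolist())          # ['A', 'B', 'C', 'D', 'E']: df is unchanged

df2 = df.copy()
returned = df2.reset_index(inplace=True)  # modifies df2 itself
print(returned)                   # None
print(df2.index.tolist())         # [0, 1, 2, 3, 4]
```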
Setting a new index
To set a new index, we first need a list of new labels. A quick way to create one is the string method split(), which splits a string on whitespace when called with no arguments:
- newind = 'WB MP KA TN UP'.split()
Now, add this list to the DataFrame as a new column named State, for example df['State'] = newind.
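A sketch of building the label list and attaching it as a column. The column name State follows the text, and the numeric data is hypothetical:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))

newind = "WB MP KA TN UP".split()  # no argument: split on runs of whitespace
print(newind)                      # ['WB', 'MP', 'KA', 'TN', 'UP']
df["State"] = newind               # one label per row, in row order
print(df.columns.tolist())         # ['W', 'X', 'Y', 'Z', 'State']
```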
To use this State column as the index, call df.set_index('State').
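A sketch of set_index() on the same hypothetical data with the State column added. The column moves out of the data and becomes the row labels, and df itself is not modified:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z, plus the State column from the text
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))
df["State"] = "WB MP KA TN UP".split()

by_state = df.set_index("State")   # returns a new DataFrame
print(by_state.index.tolist())     # ['WB', 'MP', 'KA', 'TN', 'UP']
print(by_state.columns.tolist())   # ['W', 'X', 'Y', 'Z']: State left the columns
print(df.index.tolist())           # ['A', 'B', 'C', 'D', 'E']: df is unchanged
```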
Note
set_index() overwrites the old index, so the old labels are discarded rather than kept as a new column, unless we preserve them first. reset_index(), by contrast, keeps the old index as a new column.
So, that's set index versus reset index.
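Here too, inplace=True matters: set_index() returns a new DataFrame and leaves the original unchanged unless you pass inplace=True or assign the result back.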
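A sketch contrasting the two behaviours on the same hypothetical data. Calling reset_index() before set_index() is one way to keep the old A to E labels. The last lines show inplace=True changing df itself:

```python
import pandas as pd

# Hypothetical data: rows A-E, columns W-Z, plus the State column from the text
df = pd.DataFrame([[2.7, 0.6, 0.9, 0.5],
                   [0.7, -0.3, -0.8, 0.6],
                   [-2.0, 0.7, 0.5, -0.6],
                   [0.2, -0.8, -0.9, 1.0],
                   [0.2, 2.0, 2.6, 0.7]],
                  index=list("ABCDE"), columns=list("WXYZ"))
df["State"] = "WB MP KA TN UP".split()

lost = df.set_index("State")                # labels A-E are discarded
kept = df.reset_index().set_index("State")  # labels A-E survive in an 'index' column
print(lost.columns.tolist())  # ['W', 'X', 'Y', 'Z']
print(kept.columns.tolist())  # ['index', 'W', 'X', 'Y', 'Z']

returned = df.set_index("State", inplace=True)  # changes df itself
print(returned)               # None
print(df.index.tolist())      # ['WB', 'MP', 'KA', 'TN', 'UP']
```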
I hope you have enjoyed reading about DataFrames so far. There's more to come in an upcoming article on DataFrames, with something even more interesting.
Happy learning!