There are many kinds of correlation coefficients, but Pearson’s correlation coefficient is the most popular. It is used in linear regression. It is also used to measure the relationship between two variables. The value of a correlation coefficient is always between -1 to 1.
Where
- 1 indicates a strong positive relationship
- -1 indicated a strong negative relationship
- 0 indicates no relationship between two values
The below graph images will help you to understand the positive, negative, and no correlation.
How to calculate the correlation coefficient?
To find the correlation coefficient you need to add a quick measure (see https://www.c-sharpcorner.com/article/what-is-quick-measure-in-power-bi/). First, we need to import the table so that we can add a quick measure to it.
I am going to import data from the SQL server (see here).
Go to the home tab click on the get data drop down and then click on ‘sql server’.
A new window will be opened, provide your server name and database name and connect your SQL server by clicking on ‘OK’.
Once you connect your server a navigator window will be opened. Select your table and click on load. Your table will be loaded.
Loaded table will be shown at the right side of the tool.
Now go to the home tab and click on ‘Quick Measure’. A new quick-measure window will be opened.
Click on the drop-down menu of ‘Select a calculation’ and go to ‘Mathematical Operations’ and click on ‘Correlation coefficient’.
Calculate the correlation coefficient between two values over the category. You have to provide three data fields:
- Category: Category to find the correlation over
- Measure X: The first measure to find the correlation between
- Measure: The second measure to find the correlation between
Click on ‘OK’ to calculate the correlation coefficient. After clicking on ‘Ok’ a new measure ‘OrderQty and UnitPrice correlation for ProductID’ will be created in the table. The background calculation will be shown while you click on this newly added quick measure
OrderQty and UnitPrice correlation for ProductID =
VAR __CORRELATION_TABLE = VALUES('Sales SalesOrderDetail'[ProductID])
VAR __COUNT =
COUNTX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(
SUM('Sales SalesOrderDetail'[OrderQty])
* SUM('Sales SalesOrderDetail'[UnitPrice])
)
)
VAR __SUM_X =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('Sales SalesOrderDetail'[OrderQty]))
)
VAR __SUM_Y =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('Sales SalesOrderDetail'[UnitPrice]))
)
VAR __SUM_XY =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(
SUM('Sales SalesOrderDetail'[OrderQty])
* SUM('Sales SalesOrderDetail'[UnitPrice]) * 1.
)
)
VAR __SUM_X2 =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('Sales SalesOrderDetail'[OrderQty]) ^ 2)
)
VAR __SUM_Y2 =
SUMX(
KEEPFILTERS(__CORRELATION_TABLE),
CALCULATE(SUM('Sales SalesOrderDetail'[UnitPrice]) ^ 2)
)
RETURN
DIVIDE(
__COUNT * __SUM_XY - __SUM_X * __SUM_Y * 1.,
SQRT(
(__COUNT * __SUM_X2 - __SUM_X ^ 2)
* (__COUNT * __SUM_Y2 - __SUM_Y ^ 2)
)
)
To check the value of this correlation coefficient, select a ‘Card’ visual from the visualization panel and select this newly added measure. You can see that the value of the correlation coefficient lies between -1 to 1. See the below image,
Summary
This is how you can calculate the correlation coefficient between two values. I hope you understand how to find the correlation coefficient. This is also very useful for linear regression. I will write a separate article on linear regression. So, stay tuned. Thanks for reading and have a great day.