In statistics, the correlation coefficient is an essential measure used to describe the relationship between two variables. Whether you’re a student of statistics, a data analyst, or a researcher, understanding the correlation coefficient can help you gain insight into how variables interact. In this section, we’ll explore what the correlation coefficient is, how it’s calculated, its types, and how to interpret it.
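A correlation coefficient is a statistical measure that describes the strength and direction of the relationship between two variables. It tells us whether the variables tend to increase or decrease together and the degree of this association. The correlation coefficient can vary from -1 to +1:
r = +1: a perfect positive linear relationship; as one variable increases, so does the other.
r = 0: no linear relationship between the variables.
r = -1: a perfect negative linear relationship; as one variable increases, the other decreases.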
Pearson's correlation coefficient is the most widely used measure of correlation, but there are others, such as Spearman’s rank correlation, which we will also discuss.
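Pearson’s correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. It is calculated using the following formula:
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
Where:
$x_i$ and $y_i$ are the individual values of X and Y
$\bar{x}$ and $\bar{y}$ are the means of X and Y
$n$ is the number of paired observations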
This formula computes the covariance of the variables divided by the product of their standard deviations. The result, r, tells you how strongly the variables X and Y are related.
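The description above (covariance divided by the product of the standard deviations) can be sketched in a few lines of Python. The function name `pearson_r` and the sample values are assumptions made for illustration, not part of the original text.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (sd_x * sd_y)

# Hypothetical data lying exactly on a line, so r should be 1 up to rounding.
print(round(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]), 6))
```

Using population or sample standard deviations gives the same r, as long as the covariance uses the same divisor.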
The coefficient of determination, denoted as $r^2$, is closely related to the correlation coefficient. It represents the proportion of the variance in one variable that can be explained by the other variable. If r = 0.8, then $r^2 = 0.64$, meaning that 64% of the variance in one variable can be explained by the other. The coefficient of determination is particularly useful when analyzing linear regression models.
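To make the "proportion of variance explained" reading concrete, the sketch below fits a least-squares line and computes 1 - SS_res / SS_tot, which for simple linear regression equals the square of Pearson's r. The function name `r_squared_from_fit` is an assumption; the data are the hours-studied values used in this section's example.

```python
def r_squared_from_fit(xs, ys):
    """Fit y = a + b*x by least squares and return 1 - SS_res / SS_tot."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variance left unexplained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variance of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared_from_fit(xs, ys), 3))

# The arithmetic from the text: r = 0.8 gives r squared = 0.64.
print(round(0.8 ** 2, 2))
```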
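The value of the correlation coefficient indicates the strength and direction of the relationship:
A positive value (r > 0) means the variables tend to increase together.
A negative value (r < 0) means one variable tends to decrease as the other increases.
A value of 0 means there is no linear relationship.
In general, the closer |r| is to 1, the stronger the linear relationship; the closer it is to 0, the weaker the relationship.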
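As a rough illustration of reading sign and magnitude together, the helper below labels a value of r. The function name `describe_r` and the cutoffs 0.3 and 0.7 are assumptions chosen for this sketch; conventions for "weak", "moderate", and "strong" differ between fields.

```python
def describe_r(r, weak=0.3, strong=0.7):
    """Label direction and strength of r. The cutoffs are illustrative assumptions."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    # The sign gives the direction, the absolute value gives the strength.
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

for value in (0.994, -0.5, 0.1):
    print(value, "->", describe_r(value))
```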
Let’s say you want to determine the correlation coefficient between the number of hours studied (X) and the scores on a test (Y). Using the data points, you can calculate Pearson’s correlation coefficient to see whether there is a strong or weak relationship between these two variables.
Example data:
X = [1, 2, 3, 4, 5], Y = [2, 4, 5, 4, 5]
Calculating the correlation coefficient for this data gives r ≈ 0.775, a moderately strong positive relationship: test scores tend to rise as hours studied increase.
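For readers who want to see the arithmetic, here is one way to work the example data through the deviation form of Pearson's formula. The layout is a sketch; only the data values come from the example.

```latex
\begin{align*}
\bar{x} &= \tfrac{1+2+3+4+5}{5} = 3, \qquad \bar{y} = \tfrac{2+4+5+4+5}{5} = 4 \\
\sum (x_i - \bar{x})(y_i - \bar{y}) &= (-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1) = 6 \\
\sum (x_i - \bar{x})^2 &= 4 + 1 + 0 + 1 + 4 = 10 \\
\sum (y_i - \bar{y})^2 &= 4 + 0 + 1 + 0 + 1 = 6 \\
r &= \frac{6}{\sqrt{10 \times 6}} = \frac{6}{\sqrt{60}} \approx 0.775
\end{align*}
```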
While Pearson’s correlation measures linear relationships, Spearman’s rank correlation coefficient (ρ) measures the strength and direction of a monotonic relationship between two variables. It is used when the relationship is not linear or when the variables are ordinal (ranked).
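The formula for Spearman’s rank correlation is:
$$ \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$
Where:
$d_i$ is the difference between the ranks of the i-th pair of observations
$n$ is the number of observations
This form of the formula assumes there are no tied ranks.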
Spearman’s rank correlation coefficient also ranges from –1 to +1, with the same interpretation as Pearson’s.
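The rank-based calculation can be sketched as follows, using the standard no-ties formula ρ = 1 - 6Σd² / (n(n² - 1)). The function names `ranks` and `spearman_rho` and the sample data are assumptions for illustration, and the sketch deliberately rejects tied values rather than handling them.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    # d is the difference between the rank of x and the rank of y in each pair.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: y = x**3 is monotonic but not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because only the ordering matters, any strictly increasing transformation of y leaves ρ unchanged, whereas Pearson's r would generally change.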
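There are several types of correlation coefficients, depending on the nature of the data. The two covered in this section are Pearson’s r, for linear relationships between quantitative variables, and Spearman’s ρ, for monotonic relationships or ordinal (ranked) data.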
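To determine the correlation coefficient:
Calculate the necessary sums from the data.
Apply the appropriate formula (Pearson’s or Spearman’s).
Substitute the values into the formula and simplify.
Interpret the sign and magnitude of the result.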
Example 1: Given the following data for two variables, X and Y, calculate the Pearson correlation coefficient.
Solution:
Step 1: Calculate the necessary sums.
Step 2: Apply the Pearson formula.
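The formula for the Pearson correlation coefficient, written in terms of sums, is:
$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} $$
Where:
$n$ is the number of paired observations
$\sum x_i$ and $\sum y_i$ are the sums of the X and Y values
$\sum x_i y_i$ is the sum of the products of the paired values
$\sum x_i^2$ and $\sum y_i^2$ are the sums of the squared X and Y values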
Step 3: Substitute the values into the formula.
Simplify the numerator:
Step 4: Interpretation
The correlation coefficient r ≈ 0.994 indicates a very strong positive linear relationship between variables X and Y.
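The step-by-step procedure (compute the sums, then apply the formula) can be sketched in Python. Example 1's data table is not reproduced here, so this sketch uses the hours-studied values from earlier in the section instead; the function name `pearson_from_sums` is an assumption.

```python
import math

def pearson_from_sums(xs, ys):
    """Pearson's r from raw sums, following the step-by-step procedure."""
    n = len(xs)
    # Step 1: calculate the necessary sums.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Steps 2 and 3: apply the formula and substitute the sums.
    numerator = n * sum_xy - sum_x * sum_y
    under_root = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if under_root == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / math.sqrt(under_root)

xs = [1, 2, 3, 4, 5]  # hours studied
ys = [2, 4, 5, 4, 5]  # test scores
r = pearson_from_sums(xs, ys)
print(round(r, 3))  # 0.775
```

For this data the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so the numerator is 30 and the denominator is √1500.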
Example 2: Calculate the correlation coefficient for the following heights (in inches) of fathers (x) and their sons (y).
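We know that
$$ \operatorname{Cov}(x, y) = \frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y} = 4695 - 4692 = 3 $$
$$ \sigma_x = 2.121 $$
$$ \sigma_y = 2.345 $$
Then
$$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{3}{2.121 \times 2.345} \approx 0.603 $$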
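The final step divides the covariance by the product of the two standard deviations. A minimal check in Python, using the values quoted in this example (covariance 3, standard deviations 2.121 and 2.345); the function name `correlation_from_moments` is an assumption.

```python
def correlation_from_moments(cov_xy, sd_x, sd_y):
    """r = Cov(x, y) / (sd_x * sd_y)."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return cov_xy / (sd_x * sd_y)

# Values quoted in Example 2.
r = correlation_from_moments(3, 2.121, 2.345)
print(round(r, 3))  # 0.603
```

A value near 0.6 indicates a moderate positive relationship between the heights of fathers and sons in this data.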