Correlation is a statistical measure of the degree of relationship between two variables. Karl Pearson’s coefficient of correlation, denoted r, is one of the most widely used measures of the strength and direction of a linear relationship; its value always lies between −1 and +1. It plays an important role in statistics, data science, and competitive exams such as JEE Main, JEE Advanced, and other entrance tests.
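Formula (direct method):
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\;\sqrt{n\sum Y^2 - (\sum Y)^2}}$$
Where,
n = number of pairs of observations,
$\sum X$, $\sum Y$ = sums of the X and Y values,
$\sum XY$ = sum of the products of paired values,
$\sum X^2$, $\sum Y^2$ = sums of the squared values.
This is the standard method using raw data. Use it when the numbers are small and simple.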
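The direct formula can be checked with a short Python sketch. The function name `pearson_raw` is a hypothetical helper, not a library API; it assumes two equal-length lists of numbers and applies the raw-sum formula term by term. The usage line reuses the data of Example 1 (Method 1).

```python
import math

def pearson_raw(xs, ys):
    """Karl Pearson's r from raw sums (direct method)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    sx = sum(xs)                                  # sum of X
    sy = sum(ys)                                  # sum of Y
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of XY
    sxx = sum(x * x for x in xs)                  # sum of X^2
    syy = sum(y * y for y in ys)                  # sum of Y^2
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

# Data of Example 1: (1, 2), (2, 3), (3, 5), (4, 4)
r = pearson_raw([1, 2, 3, 4], [2, 3, 5, 4])
print(round(r, 4))  # 0.8
```

If either variable is constant, the denominator is zero and Python raises `ZeroDivisionError`; r is undefined in that case.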
When data values are large, calculations can be simplified by taking deviations from assumed means.
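Let: $dx = X - A$ and $dy = Y - B$,
where A and B are the assumed means.
Formula (assumed-mean method):
$$r = \frac{n\sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{n\sum dx^2 - (\sum dx)^2}\;\sqrt{n\sum dy^2 - (\sum dy)^2}}$$
This reduces calculation effort, and the value of r does not depend on the choice of A and B.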
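A minimal sketch of the assumed-mean method, using a hypothetical helper `pearson_assumed_mean`. It works only with the deviations dx and dy, and running it with two different pairs (A, B) illustrates that the choice of assumed means does not change r. The data are those of Example 2 (Method 2).

```python
import math

def pearson_assumed_mean(xs, ys, a, b):
    """Pearson's r from deviations dx = X - A and dy = Y - B (A, B are assumed means)."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    sdx, sdy = sum(dx), sum(dy)
    sdxdy = sum(u * v for u, v in zip(dx, dy))
    sdx2 = sum(u * u for u in dx)
    sdy2 = sum(v * v for v in dy)
    num = n * sdxdy - sdx * sdy
    den = math.sqrt(n * sdx2 - sdx ** 2) * math.sqrt(n * sdy2 - sdy ** 2)
    return num / den

xs = [98, 102, 100, 105, 95]
ys = [200, 210, 205, 215, 195]
r1 = pearson_assumed_mean(xs, ys, 100, 205)
r2 = pearson_assumed_mean(xs, ys, 90, 190)   # a different, arbitrary choice of A and B
print(round(r1, 4), round(r2, 4))  # both print 0.9965
```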
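When data is very large or has a common difference (as in grouped data), also divide the deviations by a common factor:
$u = \frac{X - A}{h}$, $v = \frac{Y - B}{k}$, where h and k are positive constants (for example, class widths).
Formula becomes:
$$r = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\;\sqrt{n\sum v^2 - (\sum v)^2}}$$
This method is useful for grouped frequency distributions.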
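The coding step can be sketched as follows, with a hypothetical helper `pearson_step_deviation` that assumes positive h and k. It codes X and Y and then applies the same sum formula; the usage applies it to the data of Example 3 (Method 3) with A = 100, h = 10, B = 45 and k = 5.

```python
import math

def pearson_step_deviation(xs, ys, a, h, b, k):
    """Pearson's r from coded values u = (X - A)/h, v = (Y - B)/k, assuming h, k > 0."""
    u = [(x - a) / h for x in xs]
    v = [(y - b) / k for y in ys]
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    su2 = sum(p * p for p in u)
    sv2 = sum(q * q for q in v)
    num = n * suv - su * sv
    den = math.sqrt(n * su2 - su ** 2) * math.sqrt(n * sv2 - sv ** 2)
    return num / den

# Coded values here are u = 0, 1, 2, 3 and v = 0, 1, 3, 4
r = pearson_step_deviation([100, 110, 120, 130], [45, 50, 60, 65], a=100, h=10, b=45, k=5)
print(round(r, 4))  # 0.9899
```

If h and k had opposite signs, coding would flip the sign of r, which is why the sketch assumes both are positive.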
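Another approach is to use covariance and standard deviations:
$$r = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}, \qquad \operatorname{Cov}(X,Y) = \frac{1}{n}\sum (X-\bar X)(Y-\bar Y)$$
This form is very common in statistics and probability questions.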
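A sketch of the covariance form with a hypothetical helper `pearson_cov`. It uses population moments (dividing by n); dividing by n − 1 in both the covariance and the standard deviations would cancel and give the same r. The usage reuses the data of Example 1.

```python
import math

def pearson_cov(xs, ys):
    """r = Cov(X, Y) / (sigma_X * sigma_Y), using population (divide-by-n) moments."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

r = pearson_cov([1, 2, 3, 4], [2, 3, 5, 4])
print(round(r, 4))  # 0.8
```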
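So, the main methods are:
1. Direct (raw-sum) method
2. Assumed-mean (shortcut) method
3. Step-deviation (coding) method
4. Covariance and standard-deviation method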
Method 1 — Direct (raw-sum) formula
Example 1 (Direct method)
Data: (X,Y) = (1, 2), (2, 3), (3, 5), (4, 4). Compute r.
Solution
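Compute sums: n = 4, $\sum X = 10$, $\sum Y = 14$, $\sum XY = 2 + 6 + 15 + 16 = 39$, $\sum X^2 = 30$, $\sum Y^2 = 54$.
Use the raw-sum formula.
Numerator: $n\sum XY - \sum X \sum Y = 4(39) - 10(14) = 156 - 140 = 16$.
Denominator: $\sqrt{4(30) - 10^2}\;\sqrt{4(54) - 14^2} = \sqrt{20}\,\sqrt{20} = 20$.
So $r = 16/20 = 0.8$.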
Method 2 — Assumed-mean (shortcut) method
Example 2 (Assumed mean)
Data: X = {98, 102, 100, 105, 95}, Y = {200, 210, 205, 215, 195}.
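Take assumed means A = 100 and B = 205.
Solution
Compute deviations $dx = X - 100$ and $dy = Y - 205$:
dx = −2, 2, 0, 5, −5 and dy = −5, 5, 0, 10, −10.
Now $\sum dx = 0$, $\sum dy = 0$, $\sum dx\,dy = 10 + 10 + 0 + 50 + 50 = 120$, $\sum dx^2 = 58$, $\sum dy^2 = 250$, so the numerator is $5(120) - 0 \cdot 0 = 600$.
Compute denominator: $\sqrt{5(58) - 0^2}\;\sqrt{5(250) - 0^2} = \sqrt{290}\,\sqrt{1250} \approx 602.08$. Hence
$$r = \frac{600}{602.08} \approx 0.9965$$
(very strong positive linear association).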
Note: the assumed-mean method reduces arithmetic because you work only with small deviations.
Method 3 — Step-deviation / coding method (useful when data have constant step)
Example 3 (Step-deviation)
Data: X = {100, 110, 120, 130}, Y = {45, 50, 60, 65}.
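Use h = 10 for X and k = 5 for Y. Let $u = (X - 100)/10$ and $v = (Y - 45)/5$.
Solution
Compute coded deviations:
u = 0, 1, 2, 3 and v = 0, 1, 3, 4, so n = 4, $\sum u = 6$, $\sum v = 8$, $\sum uv = 19$, $\sum u^2 = 14$, $\sum v^2 = 26$.
Then
$$r = \frac{4(19) - 6(8)}{\sqrt{4(14) - 6^2}\;\sqrt{4(26) - 8^2}} = \frac{28}{\sqrt{20}\,\sqrt{40}} = \frac{28}{\sqrt{800}} \approx 0.990$$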
Because coding only shifts each variable and rescales it by a positive constant, r is unchanged; coding just simplifies the arithmetic.
Method 4 — Covariance and standard deviations (conceptual form)
Example 4 (Covariance method, same data as Example 1)
Data: (1, 2), (2, 3), (3, 5), (4, 4). We already found r = 0.8. Show via covariance.
Solution
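Means: $\bar X = 10/4 = 2.5$, $\bar Y = 14/4 = 3.5$.
Compute the sums of deviation products and squares:
$\sum (X-\bar X)(Y-\bar Y) = 2.25 + 0.25 + 0.75 + 0.75 = 4$, $\sum (X-\bar X)^2 = 5$, $\sum (Y-\bar Y)^2 = 5$.
Population covariance: $\operatorname{Cov}(X,Y) = 4/4 = 1$.
Population standard deviations: $\sigma_X = \sigma_Y = \sqrt{5/4} \approx 1.118$.
Thus
$$r = \frac{1}{\sqrt{5/4}\,\sqrt{5/4}} = \frac{1}{1.25} = 0.8,$$
the same result as Example 1.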
Example 1: Find Karl Pearson’s coefficient of correlation for the following data:
Solution:
Here, Y = 2X.
Clearly, the relation is perfectly linear.
Thus, r = +1.
Example 2: Given the following data, calculate Karl Pearson’s coefficient of correlation:
Solution:
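Step 1: Compute the means $\bar X$ and $\bar Y$.
Step 2: Find the deviations $x = X - \bar X$ and $y = Y - \bar Y$.
Step 3: Apply the formula:
$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}}$$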
After calculations:
So, there is a strong positive correlation.
Example 3: The correlation coefficient between X and Y is r = 0.6, the standard deviation of X is 3 and the standard deviation of Y is 4. Find Cov(X, Y).
Solution:
We know $\operatorname{Cov}(X, Y) = r\,\sigma_X\,\sigma_Y = 0.6 \times 3 \times 4 = 7.2$.
Example 4: For the data X = {1, 2, 3}, Y = {6, 4, 2}, find r.
Solution:
Here Y = −2X + 8, an exact linear relation with negative slope. Hence the correlation is perfectly negative: r = −1.
Example 5: For data (X, Y): (1, 2), (2, 3), (3, 5), (4, 4). Compute Karl Pearson’s r.
Solution:
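Compute sums (careful arithmetic):
n = 4, $\sum X = 10$, $\sum Y = 14$, $\sum XY = 39$, $\sum X^2 = 30$, $\sum Y^2 = 54$.
Use the formula
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\;\sqrt{n\sum Y^2 - (\sum Y)^2}}$$
Numerator: $4(39) - 10(14) = 156 - 140 = 16$.
Denominator: $\sqrt{120 - 100}\;\sqrt{216 - 196} = \sqrt{20}\,\sqrt{20} = 20$.
So r = 16/20 = 0.8, a strong positive linear correlation.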
Example 6: Paired grouped data (midpoints X and Y used as scores, with frequencies f):
Compute r treating the frequencies f as weights.
Solution:
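First compute weighted sums (N = total frequency = 4 + 6 + 5 = 15).
$\sum fXY = 1000 + 5400 + 8750 = 15150$.
Now use the weighted form:
$$r = \frac{N\sum fXY - (\sum fX)(\sum fY)}{\sqrt{N\sum fX^2 - (\sum fX)^2}\;\sqrt{N\sum fY^2 - (\sum fY)^2}}$$
Numerator: $N\sum fXY - (\sum fX)(\sum fY) = 15(15150) - (\sum fX)(\sum fY) = 13400$.
First variance term: $N\sum fX^2 - (\sum fX)^2$.
Second variance term: $N\sum fY^2 - (\sum fY)^2$.
Denominator: $\sqrt{(\text{first term})(\text{second term})} = 13400$.
Hence r = 13400/13400 = 1, a perfect positive linear correlation (weighted).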
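The weighted (frequency) form can be sketched with a hypothetical helper `pearson_weighted`. The midpoints and frequencies below are invented for illustration and are not the data of Example 6; the cross-check lists each pair f times and uses unit weights, which should give the same r as the weighted formula.

```python
import math

def pearson_weighted(xs, ys, fs):
    """Pearson's r for paired values (x_i, y_i) occurring with frequency f_i."""
    N = sum(fs)                                              # total frequency
    sfx = sum(f * x for x, f in zip(xs, fs))
    sfy = sum(f * y for y, f in zip(ys, fs))
    sfxy = sum(f * x * y for x, y, f in zip(xs, ys, fs))
    sfx2 = sum(f * x * x for x, f in zip(xs, fs))
    sfy2 = sum(f * y * y for y, f in zip(ys, fs))
    num = N * sfxy - sfx * sfy
    den = math.sqrt(N * sfx2 - sfx ** 2) * math.sqrt(N * sfy2 - sfy ** 2)
    return num / den

# Hypothetical midpoints and frequencies (not the data of Example 6)
xs = [10, 20, 30, 40]
ys = [15, 25, 20, 35]
fs = [3, 5, 4, 2]
r_weighted = pearson_weighted(xs, ys, fs)

# Cross-check: repeat each pair f times and give every copy weight 1
ex = [x for x, f in zip(xs, fs) for _ in range(f)]
ey = [y for y, f in zip(ys, fs) for _ in range(f)]
r_expanded = pearson_weighted(ex, ey, [1] * len(ex))

print(round(r_weighted, 4), round(r_expanded, 4))
```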
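Example 7: Given the regression coefficients $b_{yx}$ (regression of Y on X) and $b_{xy}$ (regression of X on Y), find r.
Solution:
Property: the two regression coefficients always have the same sign, and r takes that sign (if both are positive, r > 0).
Compute $r^2 = b_{yx}\,b_{xy} = 0.36$.
So $r = +\sqrt{b_{yx}\,b_{xy}} = \sqrt{0.36}$, that is,
r = 0.6.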
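The relation $r^2 = b_{yx}\,b_{xy}$ can be checked numerically. The helpers `regression_coefficients` and `r_from_regression` are hypothetical names: the first computes both least-squares slopes from data (here the data of Example 1, where r = 0.8), and the second recovers r with the common sign of the two coefficients.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): slope of Y on X and slope of X on Y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

def r_from_regression(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), taking the common sign of the two coefficients."""
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)

b_yx, b_xy = regression_coefficients([1, 2, 3, 4], [2, 3, 5, 4])
print(b_yx, b_xy, r_from_regression(b_yx, b_xy))
```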
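Example 8: If r = -0.5, find Cov(X, Y).
Solution:
$\operatorname{Cov}(X, Y) = r\,\sigma_X\,\sigma_Y = -0.5\,\sigma_X\,\sigma_Y$; substituting the given standard deviations of X and Y gives the (negative) covariance.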
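Example 9: For a sample of n = 10 pairs you observe r = 0.8. Test its significance at the 5% level (two-tailed). Use $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.
Solution:
Degrees of freedom = n − 2 = 8. Compute
$$t = \frac{0.8\sqrt{8}}{\sqrt{1 - 0.64}} = \frac{0.8 \times 2.828}{0.6} \approx 3.77$$
The critical two-tailed value is $t_{0.025,\,8} = 2.306$. Since 3.77 > 2.306, the correlation is statistically significant at the 5% level.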
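A sketch of the significance test for Example 9. The helper name is hypothetical, and the critical value 2.306 (two-tailed, 5%, df = 8) is hard-coded from a standard t-table rather than computed, to keep the example free of third-party libraries.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.8, 10
t = correlation_t_statistic(r, n)
t_critical = 2.306  # two-tailed 5% critical value for df = 8, taken from a t-table
print(round(t, 2), abs(t) > t_critical)  # 3.77 True
```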
Answer hint: use the raw-sum formula.
Answer hint: apply the raw-sum formula.
(Session 2025 - 26)