The Central Limit Theorem (CLT) is one of the most important results in statistics. It states that when we take sufficiently large random samples from any population with a finite mean and variance, the sampling distribution of the sample mean will be approximately normally distributed, regardless of the shape of the original population.
In simple terms, even if your original data are skewed or irregular, the distribution of the sample means tends toward a bell curve as the sample size increases.
Statement:
If a random sample of size $n$ is taken from a population with mean $\mu$ and standard deviation $\sigma$, then as $n$ becomes large, the distribution of the sample mean $\bar{X}$ approaches a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
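The formula for the central limit theorem for the sample mean is:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
Where:
$\bar{X}$ is the sample mean
$\mu$ is the population mean
$\sigma$ is the population standard deviation
$n$ is the sample size
$Z$ is the standardized sample mean, which is approximately standard normal for large $n$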
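For a population with mean $\mu$ and standard deviation $\sigma$, the probability density function for the sampling distribution of the mean (as $n$ grows large) is given by:
$$f(\bar{x}) = \frac{1}{\sqrt{2\pi}\,\sigma/\sqrt{n}} \exp\!\left(-\frac{(\bar{x} - \mu)^2}{2\sigma^2/n}\right)$$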
This is the normal distribution equation adapted for sample means.
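As a quick numerical check of the standard error, the standardized value, and the density described above, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The values mu = 100, sigma = 15, n = 36 and the sample mean 105 are hypothetical, chosen only for illustration; they do not come from the text.

```python
from math import pi, sqrt
from statistics import NormalDist


def sampling_distribution(mu, sigma, n):
    """CLT approximation for the sample mean: normal with sd sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))


def z_score(x_bar, mu, sigma, n):
    """Standardize a sample mean: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


# Hypothetical values, for illustration only.
mu, sigma, n = 100.0, 15.0, 36
dist = sampling_distribution(mu, sigma, n)

standard_error = dist.stdev        # sigma / sqrt(n) = 15 / 6 = 2.5
z = z_score(105.0, mu, sigma, n)   # (105 - 100) / 2.5 = 2.0
density_at_mean = dist.pdf(mu)     # peak of the bell curve
upper_tail = 1 - dist.cdf(105.0)   # P(sample mean > 105), about 0.0228

print(standard_error, z, round(upper_tail, 4))
```

The same two helper functions can be reused for any problem of the form "find the probability that the sample mean exceeds some value": standardize first, then read the tail area from the normal distribution.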
The CLT works because when independent random variables are added, their normalized sum tends to follow a normal distribution, even if the original variables themselves are not normally distributed. This is why the normal distribution appears so often in real-world data analysis.
Example:
Suppose the average height of students in a school is unknown, and the distribution of heights is skewed. You randomly select samples of 50 students at a time and record the average height for each sample. If you repeat this many times, the histogram of those sample averages will form an approximately normal curve, even though the original height distribution was skewed.
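The repeated-sampling experiment described here can be imitated with a short simulation. This is a minimal sketch, not real height data: it assumes a right-skewed stand-in population (exponential with mean 1 and standard deviation 1) and uses only the Python standard library. The number of repetitions (2000) is also an arbitrary choice.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumption: exponential population (mean 1, sd 1) as a skewed stand-in.
POP_MEAN, POP_SD = 1.0, 1.0
SAMPLE_SIZE = 50     # students per sample, as in the example
NUM_SAMPLES = 2000   # how many times the sampling is repeated

# One average per repeated sample of 50 draws.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

center = mean(sample_means)                    # should be close to POP_MEAN
spread = stdev(sample_means)                   # should be close to the value below
predicted_spread = POP_SD / sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

# Share of sample means within one predicted standard error of the mean;
# a normal curve puts roughly 68% of its area there.
within_one_se = sum(
    abs(m - POP_MEAN) <= predicted_spread for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f} (predicted {predicted_spread:.3f})")
print(f"share within one SE:  {within_one_se:.3f}")
```

Plotting a histogram of `sample_means` would show the bell shape directly; the summary numbers are used here so the sketch needs no plotting library.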
The central limit theorem is applied widely across statistics and data science.
Example 1: Population mean , population standard deviation . A random sample of n = 36 is taken. Find .
Solution.
From normal table: .
Answer: 0.0668 (approx.)
Example 2: True proportion p = 0.40. Sample size n = 200. Find $P(0.35 \le \hat{p} \le 0.45)$.
Solution.
Answer: 0.8511 (approx.)
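The stated answer can be checked numerically. The sketch below assumes the probability asked for is that the sample proportion falls between 0.35 and 0.45; that interval is an assumption, chosen because it reproduces 0.8511 under the normal approximation. Only p = 0.40 and n = 200 come from the example.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 200            # values from Example 2
lower, upper = 0.35, 0.45   # ASSUMED interval, not confirmed by the text

# Standard error of the sample proportion: sqrt(p(1 - p) / n).
se = sqrt(p * (1 - p) / n)

# CLT approximation: the sample proportion is roughly normal(p, se).
approx = NormalDist(p, se)
prob = approx.cdf(upper) - approx.cdf(lower)

print(round(se, 4), round(prob, 4))
```

No continuity correction is applied here, which matches the usual textbook treatment of sample proportions.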
Example 3: Population standard deviation . How large a sample n is needed so that ?
Solution.
Answer: n = 139
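Sample-size questions of this kind usually rest on the rule n ≥ (z·σ/E)², where E is the allowed margin of error for the sample mean. A minimal sketch follows; the inputs sigma = 12, margin = 2 and 95% confidence are hypothetical, picked only because they happen to give n = 139. The example's actual values are not shown in the text, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (two-sided)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve z * sigma / sqrt(n) <= margin for n, then round up.
    return ceil((z * sigma / margin) ** 2)


# Hypothetical inputs, for illustration only.
n_needed = required_sample_size(sigma=12, margin=2)
print(n_needed)
```

Rounding up rather than to the nearest integer matters: rounding down would leave the margin of error slightly larger than allowed.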
Example 4: Let be good with mean 22 and variance 99. What is approximately where ?
Solution.
Answer: 0.1587
Answer hint: .
Compute z's, then probabilities.
Final answer: ≈ 0.8186.
Answer hint:
Final answer: ≈ 0.067.
Answer hint: Compute
Final answer: .
Answer hint: Mean of the sum = 400 × 0.5 = 200, and sd = $\sqrt{400 \times 0.5 \times 0.5} = 10$. Then z = (210 - 200)/10 = 1, so the upper-tail probability is ≈ 0.1587 and P(sum < 210) ≈ 0.8413.
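The arithmetic in this hint can be reproduced in a few lines. The sketch assumes the sum is a count of successes in 400 independent trials with success probability 0.5, which is what the numbers in the hint imply, and it skips the continuity correction, as the hint does.

```python
from math import sqrt
from statistics import NormalDist

trials, p = 400, 0.5    # values implied by the hint
threshold = 210

mean_sum = trials * p                  # 400 * 0.5 = 200
sd_sum = sqrt(trials * p * (1 - p))    # sqrt(100) = 10
z = (threshold - mean_sum) / sd_sum    # (210 - 200) / 10 = 1

lower_tail = NormalDist().cdf(z)       # P(sum < 210), about 0.8413
upper_tail = 1 - lower_tail            # P(sum > 210), about 0.1587

print(round(lower_tail, 4), round(upper_tail, 4))
```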
Answer: For n = 10, the approximation may be poor (the skew still shows). For n = 100, the CLT will give a good approximation.
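To see why n = 10 can be too small while n = 100 is usually enough, one can measure how much skew remains in the simulated sample means. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 (the original question's population is not shown), and 4000 repetitions and the seed are arbitrary choices.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Moment-based skewness: mean of cubed standardized values."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return fmean(((v - m) / s) ** 3 for v in values)


def sample_mean_skewness(n, repeats=4000, seed=1):
    """Skewness of simulated sample means for samples of size n."""
    rng = random.Random(seed)
    # Assumption: exponential population (mean 1), which is strongly right-skewed.
    means = [
        fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]
    return skewness(means)


skew_small = sample_mean_skewness(10)    # noticeable right skew remains
skew_large = sample_mean_skewness(100)   # much closer to 0 (symmetric)

print(f"n = 10:  skewness of sample means = {skew_small:.2f}")
print(f"n = 100: skewness of sample means = {skew_large:.2f}")
```

For an exponential population the skewness of the sample mean is 2/√n in theory, so it shrinks only with the square root of the sample size; that is why strongly skewed data need larger samples before the normal approximation is reliable.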