Chi-Square Test
The Chi-Square Test is a statistical technique used to assess whether there is a statistically significant relationship between categorical variables. It compares observed frequencies with the frequencies expected under a hypothesis to test for independence or goodness of fit. Widely used in fields like business, science, and social research, the test helps analyze relationships in contingency tables. The calculated Chi-Square value is compared with a critical value to decide whether to reject the null hypothesis.
1.0 What is the Chi-Square Test?
Chi-Square Test Definition:
The Chi-Square Test is a statistical method to determine whether there is a significant relationship between two categorical variables by comparing observed and expected frequencies.
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
Where:
- $O_i$ = Observed frequency
- $E_i$ = Expected frequency
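The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `chi_square_statistic` and the sample counts are assumptions made for this example and do not come from the article.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, chosen only to exercise the formula.
observed = [18, 22, 20]
expected = [20, 20, 20]
stat = chi_square_statistic(observed, expected)
print(round(stat, 3))
```

Each category contributes its squared deviation scaled by the expected count, so a given deviation weighs more heavily in a category where few observations were expected.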
2.0 About the Chi-Square Test
The chi-square test helps answer questions such as:
- Are two variables independent?
- Is the observed distribution close to the expected one?
It is mainly of two types:
- Chi-Square Test for Independence – Checks if two variables are related.
- Chi-Square Goodness of Fit Test – Checks how well observed data fit a theoretical distribution.
3.0 Calculation of the Chi-Square Test
Step-by-Step Process:
- Step 1: Set up hypotheses.
- H0: Variables are independent.
- H1: Variables are dependent.
- Step 2: Prepare the contingency table (Observed frequencies).
- Step 3: Calculate Expected frequencies using:
$$E_{ij} = \frac{(\text{Row Total}_i)(\text{Column Total}_j)}{\text{Grand Total}}$$
- Step 4: Compute $\chi^2$:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
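- Step 5: Compare the computed $\chi^2$ with the critical value from the chi-square distribution table at the chosen significance level (e.g., 0.05) and df = (rows - 1)(columns - 1) degrees of freedom. Reject H0 if the computed value exceeds the critical value; otherwise, fail to reject H0.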
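The five steps can be sketched as follows for a contingency table of any size. This is a rough outline under stated assumptions: the helper names `expected_frequencies` and `chi_square_independence` are invented for this example, the 2x3 table is hypothetical, and the critical value is passed in by hand from a chi-square table (5.991 for df = 2 at the 0.05 level).

```python
def expected_frequencies(table):
    """Step 3: E_ij = (row total_i * column total_j) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Step 4: return (chi-square statistic, degrees of freedom)."""
    expected = expected_frequencies(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Step 2: hypothetical observed counts (2 rows x 3 columns).
table = [[20, 30, 50],
         [30, 30, 40]]

stat, df = chi_square_independence(table)

# Step 5: critical value looked up by hand for df = 2, alpha = 0.05.
critical = 5.991
reject_h0 = stat > critical
print(round(stat, 4), df, reject_h0)
```

Keeping the critical value as an explicit input mirrors the table lookup in Step 5 and avoids depending on a statistics library.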
4.0 Example of Chi-Square Test Explained
Example: A survey records the choice of drink by 100 people:
Solution:
Step 1: Calculate Expected Frequencies:
$$E(\text{Male}, \text{Tea}) = \frac{50 \times 50}{100} = 25$$
Similarly:
- E(Female,Tea) = 25
- E(Male,Coffee) = 25
- E(Female,Coffee) = 25
Step 2: Compute $\chi^2$:
$$\chi^2 = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25}$$
$$\chi^2 = \frac{25 + 25 + 25 + 25}{25} = 4$$
Step 3: Interpretation:
At 1 degree of freedom (df = (2 - 1)(2 - 1) = 1) and significance level 0.05, the critical value is 3.841.
Since $\chi^2 = 4 > 3.841$, we reject $H_0$.
Conclusion: There is a significant association between gender and drink choice.
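For readers without a chi-square table at hand, the critical value and a p-value for the 1-degree-of-freedom case can be recovered with the Python standard library. The sketch relies on the fact that a chi-square variable with df = 1 is the square of a standard normal variable; the function names are assumptions for this example, and the shortcut does not apply to other degrees of freedom.

```python
import math
from statistics import NormalDist


def p_value_df1(stat):
    """Upper-tail probability for df = 1: P(Z^2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


def critical_value_df1(alpha):
    """Critical value for df = 1: square of the two-sided normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


stat = 4.0    # statistic from the worked example
alpha = 0.05  # significance level used in the example

crit = critical_value_df1(alpha)
p = p_value_df1(stat)
print(round(crit, 3), round(p, 4), p < alpha)
```

Comparing the statistic with the critical value and comparing the p-value with the significance level always lead to the same decision; the p-value additionally shows how close the result is to the threshold.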
5.0 Bayesian Chi-Square Test
An advanced method that incorporates prior beliefs into the chi-square analysis. It is useful when sample sizes are small or data is uncertain. Bayesian methods provide a probabilistic interpretation of the relationship between variables.
6.0 Chi-Square Test in Business: A/B Testing Example
In business, A/B testing uses chi-square to compare two marketing strategies. Example:
Using the chi-square test, businesses can analyze if the difference in performance between Strategy A and B is statistically significant.
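As a rough sketch of how such a comparison might look in code, the snippet below tests two strategies on converted versus non-converted visitors. All numbers are invented for illustration and are not the article's data; `chi_square_2x2` and `ab_test` are hypothetical helper names. For a 2x2 table the statistic can be computed with the shortcut N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), which is algebraically the same as summing (O - E)^2 / E over the four cells.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / denom


def ab_test(conv_a, visitors_a, conv_b, visitors_b, critical=3.841):
    """Return (statistic, significant?) using the df = 1 critical value at the 0.05 level."""
    stat = chi_square_2x2(
        conv_a, visitors_a - conv_a,
        conv_b, visitors_b - conv_b,
    )
    return stat, stat > critical


# Hypothetical campaign results: conversions out of visitors.
stat, significant = ab_test(conv_a=100, visitors_a=1000,
                            conv_b=150, visitors_b=1000)
print(round(stat, 3), significant)
```

A significant result only says the conversion rates differ; which strategy is better still has to be read from the observed rates themselves.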
7.0 Importance of Chi-Square Test
- Helps in decision making using categorical data
- Useful in survey analysis and market research
- Validates hypotheses in experimental research
- Crucial in quality control and business strategies
8.0 Solved Example on Chi-Square Test
Example 1: In a survey, 200 people were asked about their preference for three different products: A, B, and C. The results are shown below:
Assuming all products are equally popular, test the hypothesis at a 5% significance level.
Solution:
Step 1 – Null Hypothesis $H_0$:
All products are equally preferred, so the expected frequency for each product is $E = \frac{200}{3} \approx 66.67$.
Step 2 – Calculate $\chi^2$:
$$\chi^2 = 2.67 + 0.17 + 4.17 = 7.01$$
Step 3 – Degrees of Freedom (df):
df = n - 1 = 3 - 1 = 2
From the chi-square table, the critical value at 5% significance level and df = 2 is 5.991.
Since 7.01 > 5.991, we reject $H_0$.
Conclusion: The product preferences are not equally distributed.
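The goodness-of-fit calculation can be checked with a short script. One loud assumption: the observed counts 80, 70 and 50 used below are back-solved from the three terms 2.67, 0.17 and 4.17 and are not taken from the survey table, so their assignment to products A, B and C is unknown. For df = 2 the upper-tail probability has the closed form exp(-x/2), which also gives the critical value as -2 ln(alpha).

```python
import math

# ASSUMPTION: counts inferred from the chi-square terms, not from the survey table.
observed = [80, 70, 50]

total = sum(observed)
k = len(observed)
expected = [total / k] * k  # equal popularity under H0

terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(terms)

alpha = 0.05
critical = -2 * math.log(alpha)  # valid for df = 2 only
p_value = math.exp(-stat / 2)    # valid for df = 2 only

print([round(t, 2) for t in terms], round(stat, 2))
print(round(critical, 3), p_value < alpha)
```

With these counts the unrounded statistic comes out at 7.00; the 7.01 in the worked solution arises from adding terms that were already rounded to two decimals. The decision is the same either way.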
Example 2: A company studied the relationship between customer gender and their product choice. The data is:
Test whether product choice is independent of gender at 5% significance level.
Solution:
Step 1 – Compute Expected Frequencies:
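Step 2 – Calculate $\chi^2$:
$$\chi^2 = \frac{(40-35)^2}{35} + \frac{(60-65)^2}{65} + \frac{(30-35)^2}{35} + \frac{(70-65)^2}{65}$$
$$\chi^2 = \frac{25}{35} + \frac{25}{65} + \frac{25}{35} + \frac{25}{65} = \frac{50}{35} + \frac{50}{65}$$
$$\chi^2 \approx 1.428 + 0.769 = 2.197$$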
Step 3 – Degrees of Freedom (df):
df = (rows -1)(columns -1) = (2 -1)(2 -1) = 1
Critical chi-square value at 5% significance level and df =1 is 3.841.
Since 2.197 < 3.841, we fail to reject $H_0$.
Conclusion: There is no significant evidence that product choice depends on gender.
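To see where the statistic in Example 2 comes from, it can help to print the expected table and each cell's contribution. The observed values 40, 60, 30 and 70 are the ones that appear in the calculation; arranging them as two rows of two is an assumption about the table layout, although the transposed layout gives the same expected values and the same statistic.

```python
# Observed counts from the calculation; the row/column layout is assumed.
observed = [[40, 60],
            [30, 70]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Contribution of each cell to the statistic: (O - E)^2 / E.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
stat = sum(sum(row) for row in contributions)

for exp_row, con_row in zip(expected, contributions):
    print(exp_row, [round(c, 3) for c in con_row])
print(round(stat, 3))
```

Every cell deviates from its expected count by 5, so the cells with the smaller expected count (35) contribute most of the total.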
Example 3: A manufacturer claims that the proportion of defective items in a batch is 5%. From a sample of 200 items, 15 defective items were found. Test the manufacturer’s claim at a 5% significance level.
Solution:
Step 1 – Null Hypothesis H0:
The proportion of defective items is 5%.
Expected defective items:
E=0.05×200=10
Expected non-defective items:
E' = 200 - 10 = 190
Observed (O):
- Defective: 15
- Non-defective: 185
Step 2 – Calculate $\chi^2$:
$$\chi^2 = \frac{(15-10)^2}{10} + \frac{(185-190)^2}{190} = \frac{25}{10} + \frac{25}{190}$$
$$\chi^2 = 2.5 + 0.1316 = 2.6316$$
Step 3 – Degrees of Freedom (df):
df = number of categories – 1 = 2 – 1 = 1
Critical value at 5% significance level and df = 1 is 3.841.
Since 2.6316 < 3.841, we fail to reject $H_0$.
Conclusion: The sample does not provide significant evidence against the manufacturer's claim.
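With only two categories, the goodness-of-fit statistic equals the square of the familiar one-sample z statistic for a proportion. The sketch below checks this for Example 3; the variable names are chosen for this illustration only.

```python
import math

n = 200         # sample size
defective = 15  # observed defective items
p0 = 0.05       # claimed defect proportion under H0

observed = [defective, n - defective]
expected = [n * p0, n * (1 - p0)]

# Chi-square goodness-of-fit statistic with two categories (df = 1).
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One-sample z statistic for a proportion, using the H0 standard error.
z = (defective / n - p0) / math.sqrt(p0 * (1 - p0) / n)

print(round(chi2, 4), round(z * z, 4))
```

Because the two statistics agree, the chi-square test here is equivalent to a two-sided z-test of the claimed proportion at the same significance level.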
Example 4: A study records the following data regarding study method and exam success:
Is there an association between study method and exam success at a 1% significance level?
Solution:
Step 1 – Expected Frequencies:
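Step 2 – Compute $\chi^2$:
$$\chi^2 = \frac{(80-65)^2}{65} + \frac{(20-35)^2}{35} + \frac{(50-65)^2}{65} + \frac{(50-35)^2}{35}$$
$$\chi^2 = \frac{225}{65} + \frac{225}{35} + \frac{225}{65} + \frac{225}{35}$$
$$\chi^2 = 3.46 + 6.43 + 3.46 + 6.43 = 19.78$$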
Step 3 – Degrees of Freedom (df):
df = (2 -1)(2 -1) =1
Critical chi-square value at 1% significance level and df =1 is 6.635.
Since 19.78 > 6.635, we reject H0.
Conclusion: Study method is significantly associated with exam success.
Example 5: A geneticist expects that a particular plant produces flowers in the ratio of Red:White:Pink as 9:3:4. In an experiment, out of 160 plants, the observed counts were Red – 90, White – 30, Pink – 40. Test the hypothesis at 5% significance level.
Solution:
Step 1 – Expected Frequencies:
Total parts = 9 + 3 + 4 = 16
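Expected Red:
$$E_{\text{Red}} = \frac{9}{16} \times 160 = 90$$
Expected White:
$$E_{\text{White}} = \frac{3}{16} \times 160 = 30$$
Expected Pink:
$$E_{\text{Pink}} = \frac{4}{16} \times 160 = 40$$
Step 2 – Compute $\chi^2$:
$$\chi^2 = \frac{(90-90)^2}{90} + \frac{(30-30)^2}{30} + \frac{(40-40)^2}{40} = 0$$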
Step 3 – Degrees of Freedom (df):
df = 3 -1 = 2
Critical value at 5% significance level and df = 2 is 5.991.
Since $\chi^2 = 0 < 5.991$, we fail to reject $H_0$.
Conclusion: The observed data match the expected ratio exactly.
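Turning a hypothesized ratio into expected counts is the only step that differs from the equal-frequency case, and it can be sketched as below. The helper name `expected_from_ratio` is an assumption for this example; the ratio and counts are those stated in Example 5.

```python
def expected_from_ratio(ratio, total):
    """Split `total` observations across categories in proportion to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


ratio = [9, 3, 4]        # Red : White : Pink
observed = [90, 30, 40]  # counts from the experiment

expected = expected_from_ratio(ratio, sum(observed))
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(expected, stat, df)
```

The same helper works for any ratio, so a different genetic hypothesis only requires changing the `ratio` list.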
9.0 Practice Questions on Chi-Square Test
- In a die-rolling experiment of 120 rolls, the observed frequencies of outcomes 1, 2, 3, 4, 5, and 6 are 15, 20, 18, 22, 25, and 20 respectively. Test at the 5% significance level whether the die is fair.
- A survey of 150 people records the choice of car brand (Brand A or Brand B) and gender (Male or Female):
Test whether car brand preference is independent of gender at 5% significance level.