Linear Regression
Linear Regression is a fundamental concept in statistics and machine learning, widely used for modeling relationships between variables. Whether you're analyzing trends, making predictions, or understanding data patterns, linear regression provides a straightforward approach to uncovering insights.
1.0 What Is Linear Regression?
Linear Regression is a statistical method used to model the relationship between a dependent variable (target) and one or more independent variables (predictors). The goal is to fit a linear equation to observed data, allowing for predictions and trend analysis.
2.0 Basic Linear Regression
In its simplest form, Simple Linear Regression involves two variables:
- Dependent Variable (Y): The outcome you're trying to predict.
- Independent Variable (X): The predictor or feature used to make predictions.
The relationship is modeled by the equation: $Y = b_0 + b_1 X + \epsilon$
Where:
- $b_0$ is the y-intercept,
- $b_1$ is the slope of the line,
- $\epsilon$ is the error term.
This method assumes a straight-line relationship between the variables and is typically used when there's a clear, linear association between them.
3.0 Linear Regression Analysis
Linear Regression Analysis involves several key steps:
- Data Collection: Gather data for the dependent and independent variables.
- Model Fitting: Use statistical methods, such as the least squares method, to estimate the coefficients ($b_0$ and $b_1$).
- Model Evaluation: Assess the model's performance using metrics like R-squared, which indicates the proportion of variance in the dependent variable explained by the independent variable.
- Prediction: Use the fitted model to predict new values of the dependent variable based on new data for the independent variable.
This process helps in understanding the strength and nature of the relationship between variables and in making informed predictions.
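The fitting, evaluation, and prediction steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive implementation: the helper names `fit_line` and `r_squared` are made up for this sketch, and the data are the TV-hours and productivity figures from the solved example later in this article.

```python
# Data collection: TV hours (X) and productivity (Y) from the
# solved example later in this article.
tv_hours = [1, 2, 3, 4, 5]
productivity = [90, 85, 80, 70, 50]


def fit_line(xs, ys):
    """Model fitting: least-squares slope and intercept for one predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Model evaluation: 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(tv_hours, productivity)
r2 = r_squared(tv_hours, productivity, slope, intercept)

# Prediction: apply the fitted line to a new value of X.
prediction = intercept + slope * 6

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.4f}")
print(f"predicted productivity at 6 hours: {prediction:.2f}")
```

Run as written, this should report a slope of -9.50, an intercept of 103.50, an R^2 of 0.9025, and a prediction of 46.50. In other words, the fitted line accounts for roughly 90% of the variance in productivity for this small data set.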
4.0 Multiple Linear Regression
Multiple Linear Regression (MLR) extends simple linear regression by modeling the relationship between a dependent variable and two or more independent variables. The equation for MLR is:
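$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n + \epsilon$
Where:
- $X_1, X_2, \ldots, X_n$ are the independent variables,
- $b_1, b_2, \ldots, b_n$ are their corresponding coefficients,
- $b_0$ is the intercept and $\epsilon$ is the error term.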
MLR allows for a more nuanced understanding of how multiple factors simultaneously influence the dependent variable. For example, predicting house prices might involve variables like square footage, number of bedrooms, and location.
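As a rough sketch of how such a model could be fitted, the snippet below solves the least-squares normal equations in plain Python. Everything in it is an assumption made for illustration: the house data are hypothetical (generated to follow price = 50 + 100 × size + 10 × bedrooms exactly, so the fit is easy to check), location is left out because a categorical variable would need encoding first, and the helper names `solve_linear_system` and `fit_multiple` are invented here.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation also updates the right-hand side.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_multiple(rows, ys):
    """Return [b0, b1, ..., bn] minimising the sum of squared errors."""
    # A leading 1.0 in every row gives the intercept b0 its own column.
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: (size in thousands of sq ft, bedrooms) -> price
features = [(1.0, 2), (1.5, 3), (2.0, 3), (2.5, 4), (1.2, 2)]
prices = [170, 230, 280, 340, 190]

coefficients = fit_multiple(features, prices)
print([round(b, 6) for b in coefficients])
```

Because the hypothetical prices were built from a known linear rule, the recovered coefficients should come out very close to 50, 100, and 10. For real data a tested library routine, such as NumPy's `numpy.linalg.lstsq`, is usually a safer choice than hand-rolled elimination.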
5.0 Solved Example on Linear Regression
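Example 1: A company wants to predict its sales based on the amount spent on advertising. The data collected over 5 months are as follows (both quantities in thousands):
- Advertising Spend (X): 1.0, 2.0, 3.0, 4.0, 5.0
- Sales (Y): 2.0, 2.5, 3.5, 4.0, 5.0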
The task is to find the Linear Regression Equation that models the relationship between Advertising Spend and Sales, and use it to predict sales if the advertising spend is 6.0 thousand.
Solution:
Step 1: Calculate the Mean of X and Y
- Mean of X (Advertising Spend): $\bar{X} = \frac{1.0 + 2.0 + 3.0 + 4.0 + 5.0}{5} = 3.0$
- Mean of Y (Sales): $\bar{Y} = \frac{2.0 + 2.5 + 3.5 + 4.0 + 5.0}{5} = 3.4$
Step 2: Calculate the Slope (m) using the formula
The formula to calculate the slope m is:
$m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
Now, let’s compute the required values:
- Sum of $(X_i - \bar{X})(Y_i - \bar{Y})$: 2.8 + 0.9 + 0.0 + 0.6 + 3.2 = 7.5
- Sum of $(X_i - \bar{X})^2$: 4.0 + 1.0 + 0.0 + 1.0 + 4.0 = 10.0
Now, calculate the slope m: $m = \frac{7.5}{10.0} = 0.75$
Step 3: Calculate the Y-Intercept (c)
The formula for the y-intercept c is:
$c = \bar{Y} - m\bar{X}$
Substitute the known values:
$c = 3.4 - (0.75)(3.0) = 3.4 - 2.25 = 1.15$
Step 4: Write the Linear Regression Equation
The linear regression equation is: y=0.75x+1.15
Where:
- y is the predicted sales (dependent variable).
- x is the advertising spend (independent variable).
Step 5: Predict Sales for an Advertising Spend of 6.0 Thousand
To predict the sales when the advertising spend is 6.0 thousand, substitute x = 6.0 into the regression equation:
y=0.75(6.0)+1.15=4.5+1.15=5.65
Thus, the predicted sales for an advertising spend of 6.0 thousand is 5.65 thousand.
Final Answer:
- Linear Regression Equation: y = 0.75x + 1.15
- Predicted Sales for Advertising Spend of 6.0 Thousand: 5.65 thousand
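The arithmetic in Steps 1 to 3 can be reproduced row by row with a short script. This is only a sketch for checking the hand calculation. The X and Y values are the ones that appear in the mean calculations of Step 1, and the variable names are chosen here for illustration.

```python
spend = [1.0, 2.0, 3.0, 4.0, 5.0]  # X: advertising spend (thousands)
sales = [2.0, 2.5, 3.5, 4.0, 5.0]  # Y: sales (thousands)

# Step 1: means of X and Y.
x_mean = sum(spend) / len(spend)
y_mean = sum(sales) / len(sales)

# Step 2: accumulate the two sums that make up the slope formula,
# printing each row's contribution along the way.
sxy = 0.0  # running sum of (Xi - Xbar)(Yi - Ybar)
sxx = 0.0  # running sum of (Xi - Xbar)^2
print(f"{'X':>4} {'Y':>4} {'X-Xbar':>7} {'Y-Ybar':>7} {'product':>8} {'square':>7}")
for x, y in zip(spend, sales):
    dx = x - x_mean
    dy = y - y_mean
    sxy += dx * dy
    sxx += dx * dx
    print(f"{x:>4.1f} {y:>4.1f} {dx:>7.2f} {dy:>7.2f} {dx * dy:>8.2f} {dx * dx:>7.2f}")

m = sxy / sxx

# Step 3: intercept from the means and the slope.
c = y_mean - m * x_mean

print(f"sum of products = {sxy:.2f}, sum of squares = {sxx:.2f}")
print(f"m = {m:.2f}, c = {c:.2f}")
```

The `product` column should show the terms 2.80, 0.90, 0.00, 0.60, and 3.20 used in Step 2, and the final line should agree with m = 0.75 and c = 1.15.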
Example 2: The following data shows the number of hours studied and the corresponding marks obtained by a student:
- Hours Studied (X): 1, 2, 3, 4, 5
- Marks Obtained (Y): 20, 35, 50, 65, 80
Find the linear regression equation and predict the marks when a student studies for 6 hours.
Solution:
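- Find the mean of X and Y:
- $\bar{X} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3$
- $\bar{Y} = \frac{20 + 35 + 50 + 65 + 80}{5} = 50$
- Calculate the slope (m) using the formula:
$m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
Compute the values:
- Sum of $(X_i - \bar{X})(Y_i - \bar{Y})$ = 150
- Sum of $(X_i - \bar{X})^2$ = 10
Therefore, $m = \frac{150}{10} = 15$
- Calculate the intercept (c):
$c = \bar{Y} - m\bar{X} = 50 - 15 \times 3 = 5$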
- Regression equation: y=15x+5
- Predict marks for 6 hours of study: y=15×6+5=90+5=95
Answer: The predicted marks for 6 hours of study are 95.
Example 3: The following data shows the amount spent on advertising and the corresponding sales:
- Advertising Spend (X): 1, 2, 3, 4, 5
- Sales (Y): 2, 4, 5, 6, 8
Find the linear regression equation and predict sales for an advertising spend of 6.
Solution:
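- Calculate the means:
- $\bar{X} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3$
- $\bar{Y} = \frac{2 + 4 + 5 + 6 + 8}{5} = 5$
- Calculate the slope (m): $m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
- Sum of $(X_i - \bar{X})(Y_i - \bar{Y})$ = 14
- Sum of $(X_i - \bar{X})^2$ = 10
Therefore, $m = \frac{14}{10} = 1.4$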
- Intercept (c): $c = \bar{Y} - m\bar{X} = 5 - 1.4 \times 3 = 5 - 4.2 = 0.8$
- Regression equation: y=1.4x+0.8
- Predict sales for an advertising spend of 6: y=1.4×6+0.8=8.4+0.8=9.2
Answer: Predicted sales for an advertising spend of 6 is 9.2.
Example 4: The following data shows the number of hours a person watches TV and their productivity at work:
- Hours of TV Watched (X): 1, 2, 3, 4, 5
- Productivity (Y): 90, 85, 80, 70, 50
Find the linear regression equation and predict productivity when a person watches 6 hours of TV.
Solution:
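- Calculate the means:
- $\bar{X} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3$
- $\bar{Y} = \frac{90 + 85 + 80 + 70 + 50}{5} = 75$
- Calculate the slope (m): $m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
- Sum of $(X_i - \bar{X})(Y_i - \bar{Y})$ = −95
- Sum of $(X_i - \bar{X})^2$ = 10
Therefore: $m = \frac{-95}{10} = -9.5$
- Intercept (c):
$c = \bar{Y} - m\bar{X} = 75 - (-9.5 \times 3) = 75 + 28.5 = 103.5$
- Regression equation: y=−9.5x+103.5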
- Predict productivity for 6 hours of TV: y=−9.5×6+103.5=−57+103.5=46.5
Answer: Predicted productivity for 6 hours of TV is 46.5.
6.0 Practice Questions on Linear Regression
Question 1: The data below shows the number of hours studied and the marks obtained by a student: Find the Linear Regression Equation (i.e., y = mx + c) and use it to predict the marks if a student studies for 6 hours.
Question 2: A company collects the following data on the amount spent on advertising and the sales generated. Find the Linear Regression Equation for predicting sales based on advertising spend, and predict the sales if the advertising spend is 6.
Question 3: The following data shows the number of years of experience and the corresponding salaries. Find the Linear Regression Equation to predict salary based on years of experience. Then, predict the salary for someone with 6 years of experience.
Question 4: The table below shows the number of books sold by a bookstore in a week and the number of advertisements they ran. Find the Linear Regression Equation for predicting the number of books sold based on the number of advertisements run, and predict the number of books sold if 3 ads are run.
Question 5: The data below shows the number of products produced by a factory and the number of workers employed. Find the Linear Regression Equation for predicting the number of products produced based on the number of workers employed. Predict the number of products produced when 25 workers are employed.
7.0 Sample Questions on Linear Regression
1. How do you calculate the slope and intercept?
Ans: Use the formulas:
- $m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
- $c = \bar{Y} - m\bar{X}$
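For readers wondering where these two formulas come from, the following is a brief sketch of the standard least-squares argument. The symbol $S$ for the sum of squared errors is introduced here for illustration and is not used elsewhere in this article.

```latex
\begin{align*}
S(m, c) &= \sum_{i} \left( Y_i - m X_i - c \right)^2
  && \text{sum of squared errors to minimise} \\
\frac{\partial S}{\partial c} &= -2 \sum_{i} \left( Y_i - m X_i - c \right) = 0
  \;\Longrightarrow\; c = \bar{Y} - m \bar{X} \\
\frac{\partial S}{\partial m} &= -2 \sum_{i} X_i \left( Y_i - m X_i - c \right) = 0 \\
\text{substituting } c: \quad
0 &= \sum_{i} X_i \left[ (Y_i - \bar{Y}) - m (X_i - \bar{X}) \right] \\
\text{since } \sum_{i} \bar{X} \left[ (Y_i - \bar{Y}) - m (X_i - \bar{X}) \right] = 0: \quad
0 &= \sum_{i} (X_i - \bar{X}) \left[ (Y_i - \bar{Y}) - m (X_i - \bar{X}) \right] \\
\Longrightarrow\; m &= \frac{\sum_{i} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i} (X_i - \bar{X})^2}
\end{align*}
```

The intercept formula says the fitted line always passes through the point of means $(\bar{X}, \bar{Y})$, which is a handy sanity check on any hand-computed answer.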