LINEAR REGRESSION
Linear Regression is a supervised machine learning algorithm that deals with continuous data; it comes under the regression branch of supervised machine learning.
(Don't know what continuous data is? Refer to my previous blog or click the link given below.)
- From the graph above, we can see that X is the independent variable and Y is the dependent variable: Y is plotted based on the values of X.
- Apart from X and Y, we also have a regression line, which is also known as the line of linear regression.
- This regression line shows the relationship between the independent variable (X) and the dependent variable (Y).
- The formula for the regression line is y = mx + c, or yhat = β0 + β1*x
- Here,
- β0 = intercept
- β1 = coefficient (slope)
- x = the x values
The formula to find β1 is:
β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
The formula to find β0 is:
β0 = ȳ - β1*x̄
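To make these formulas concrete, here is a minimal sketch in plain Python (the function name and the sample data are my own, just for illustration) that computes β1 and β0 exactly as defined above:

```python
def fit_line(xs, ys):
    """Fit yhat = b0 + b1*x using the least-squares formulas above."""
    n = len(xs)
    x_bar = sum(xs) / n  # x̄, the mean of x
    y_bar = sum(ys) / n  # ȳ, the mean of y
    # β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    b1 = num / den
    # β0 = ȳ - β1*x̄
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up example data, just to show the call:
b0, b1 = fit_line([1, 2, 3, 4, 5], [2.7, 3.4, 3.6, 3.8, 4.5])
print(b0, b1)  # 2.4 0.4 (up to floating-point rounding)
```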
So, for a clearer understanding, let's take a graph.
- The orange dots are actual values.
- The green dots are predicted values.
- The blue line is the regression line, which passes through the predicted values.
- The red lines are the residuals.
- A residual is the distance between an actual value and its predicted value.
- So, the main goal of the Linear Regression algorithm is to reduce these residuals, or in simple terms, to reduce the distance between the actual and predicted values (see the sketch below).
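As a quick sketch of what that means in code (the numbers here are made up, just for illustration):

```python
# Made-up actual and predicted values, just for illustration.
y_actual    = [2.7, 3.4, 3.6, 3.8, 4.5]
y_predicted = [2.8, 3.2, 3.6, 4.0, 4.4]

# Each residual is the gap between an actual value and its prediction.
residuals = [ya - yp for ya, yp in zip(y_actual, y_predicted)]
sse = sum(r ** 2 for r in residuals)  # sum of squared errors
print(residuals)  # [-0.1, 0.2, 0.0, -0.2, 0.1] (up to rounding)
print(sse)        # about 0.1
```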
So, let's apply Linear Regression to a small dataset now.
- Here, Y is the actual value.
- As we know, Y is dependent on X.
Let me show you the graph.
- The graph plotted above is based on the small dataset (X and Y) mentioned above.
- In order to find the predicted values, we have to calculate β0 and β1.
Let's calculate β1
- The formula to calculate β1 is mentioned above.
- So, in order to get the quantities needed to calculate β1, we compute (x - x̄), (y - ȳ), and their products for each row of the table.
- As we know, the formula to calculate β1 is β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)².
- From the table above, we can calculate the value of β1.
- Here,
- Σ(x - x̄)(y - ȳ) = 4
- Σ(x - x̄)² = 10
- Let's apply these to the formula:
- β1 = 4/10 = 0.4
- Now we have the β1 value.
- The formula to calculate β0 is β0 = ȳ - β1*x̄.
- We know the value of β1 now, so let's apply it in the formula.
- Here,
- x̄ = 3
- β1 = 0.4
- ȳ = 3.6
- β0 = 3.6 - 0.4 * 3 = 2.4
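As a quick check, here is the same arithmetic in Python, using only the values computed above:

```python
# Values from the worked example above.
num = 4        # Σ(x - x̄)(y - ȳ)
den = 10       # Σ(x - x̄)²
x_bar, y_bar = 3, 3.6

b1 = num / den           # 4/10 = 0.4
b0 = y_bar - b1 * x_bar  # 3.6 - 0.4*3 = 2.4
print(b1, b0)            # 0.4 2.4 (up to floating-point rounding)
```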
Regression line formula
- We have got β0 and β1.
- Now put those values into the yhat formula to get the predicted values and the regression line:
- yhat = β0 + β1*x
- yhat = 2.4 + 0.4*x
- So, the regression line equation for our graph is yhat = 2.4 + 0.4*x.
- So, this is the dataset we took.
- Now, with yhat = 2.4 + 0.4*x, we have to put each of the x values in the place of x:
- yhat(x1)=2.4+0.4*1=2.8
- yhat(x2)=2.4+0.4*2=3.2
- yhat(x3)=2.4+0.4*3=3.6
- yhat(x4)=2.4+0.4*4=4.0
- yhat(x5)=2.4+0.4*5=4.4
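The same substitution takes a couple of lines of Python, reproducing the values above:

```python
b0, b1 = 2.4, 0.4
xs = [1, 2, 3, 4, 5]

y_hat = [b0 + b1 * x for x in xs]
print([round(v, 1) for v in y_hat])  # [2.8, 3.2, 3.6, 4.0, 4.4]
```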
- So, these are the predicted (yhat) values; let's add them to our table.
- In the table, y is the actual value and yhat is the predicted value.
- In the graph, y (blue line) is the actual value and yhat (orange line) is the predicted value.
- The orange line is known as the regression line.
- The distance between an actual point and its predicted point is known as the residual.
- Our main goal is to reduce the length of the residuals, which are also called errors; errors and residuals are the same thing.
- The line with the least error is the line of Linear Regression, also called the regression line or best-fit line.
- In order to obtain the line with the least error, the system runs N iterations with different values of β1, as sketched below.
- Here β1 = 0.4, but the system might also try β1 = 0.2, 0.3, 0.6, 0.7, and so on.
- If β1 changes, then β0 and the yhat values will also change.
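Here is a minimal sketch of that search idea. The y values are made up (chosen only so the sums match the worked example above); for each candidate β1 we recompute β0 and the yhat values, and keep the slope with the smallest error:

```python
# Made-up data whose sums match the worked example above.
xs = [1, 2, 3, 4, 5]
ys = [2.7, 3.4, 3.6, 3.8, 4.5]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

best_b1, best_sse = None, float("inf")
for b1 in [0.2, 0.3, 0.4, 0.6, 0.7]:  # candidate slopes
    b0 = y_bar - b1 * x_bar           # β0 follows from β1
    y_hat = [b0 + b1 * x for x in xs]
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    if sse < best_sse:
        best_b1, best_sse = b1, sse

print(best_b1)  # 0.4 (the slope with the least error among the candidates)
```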
