
EVALUATION METRICS FOR REGRESSION

Hello everyone! In my previous blog, Mathematical Implementation of Linear Regression, I explained the internal working of the algorithm. In this blog, I will explain the evaluation metrics that are used to evaluate a regression model.

Before going to the evaluation metrics, let me show you a graph.

  • (x, y) is an actual value.
  • (x, ŷ) is a predicted value.
  • y − ŷ is the difference between the actual and the predicted value, also called the residual.
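
The residual can be computed directly. A minimal sketch in Python, using hypothetical actual/predicted values for a single point:

```python
# Hypothetical actual and predicted values for one data point
y = 3.4      # actual value
y_hat = 2.3  # predicted value

residual = y - y_hat          # difference between actual and predicted
print(round(residual, 1))     # 1.1 -> the point sits 1.1 above the line
```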

EVALUATION METRICS

  • MEAN SQUARED ERROR(MSE)
  • MEAN ABSOLUTE ERROR(MAE)
  • R-SQUARED
  • ROOT MEAN SQUARED ERROR (RMSE)

MEAN SQUARED ERROR (MSE)

  • MSE is the average of the squared differences between the actual and the predicted values.
  • Let me take an example.


  • Here y is the actual value and yhat (ŷ) is the predicted value.
  • Let's begin the process:
  • Calculate the difference between the actual (y) and predicted (yhat) values.


  • Here, Error is the difference between y and yhat:
  • Error = y − yhat
  • So, our next step is to square this error.


  • Now that we have the squared errors, we calculate the ∑ of squared errors, which means adding up all the elements in the Squared Error column.
  • Adding up all the elements in the Squared Error column, we get 3.6.
  • The total number of elements in our dataset is 5 (just count the number of rows).


  • Now let's calculate the Mean Squared Error.
  • The formula to calculate Mean Squared Error is:
    • MSE = (1/n) × ∑(y − yhat)²
  • Number of data points (n) = 5
  • Sum of squared errors = 3.6
  • Now that we have the values, let's calculate:
    • MSE = 3.6 / 5 = 0.72
    • MSE = 0.72
  • The smaller the Mean Squared Error (MSE), the closer we are to the best-fit/regression line.
    • This indicates that the model is performing well and the distance between the actual and predicted values is small.
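
The steps above can be sketched in Python. The data table from the post isn't reproduced here, so the y/yhat values below are hypothetical, chosen so that the sums match the walkthrough (∑ squared error = 3.6, n = 5):

```python
# Hypothetical actual (y) and predicted (y_hat) values, chosen so that
# the sum of squared errors is 3.6 and n = 5, as in the walkthrough.
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

errors = [a - p for a, p in zip(y, y_hat)]      # y - yhat for each row
squared_errors = [e ** 2 for e in errors]       # square each error
mse = sum(squared_errors) / len(y)              # sum, then divide by n
print(round(mse, 2))  # 0.72
```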

MEAN ABSOLUTE ERROR (MAE)

  • Mean Absolute Error is another loss function used for regression models.
  • MAE is the average of the absolute differences between our actual (y) and predicted (yhat) values.
    • The absolute difference between the two numbers 4 and 2 is |4 − 2| = 2.
    • The absolute difference between the two numbers −4 and 2 is |−4 − 2| = |−6| = 6.
  • It measures the average magnitude (size) of the errors in a set of predictions.
  • So, let's apply this to our dataset now

  • As I have mentioned above, y is the actual value and yhat is the predicted value.
  • In MAE we take the absolute difference; even though the difference might be negative, the absolute difference will always be positive, because the modulus of a number is always positive.
  • Now let's calculate the ∑ of absolute errors.

  • Sum of absolute errors = 3.2
  • Total number of data points = 5
  • The formula to find Mean Absolute Error is:
    • MAE = (1/n) × ∑|y − yhat|
  • Let's apply the calculated values to the formula:
    • MAE = 3.2 / 5 = 0.64
    • MAE = 0.64
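
The same calculation can be sketched in Python, using the same hypothetical dataset as before (sum of absolute errors = 3.2):

```python
# Same hypothetical dataset used for the MSE example
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

absolute_errors = [abs(a - p) for a, p in zip(y, y_hat)]  # |y - yhat|
mae = sum(absolute_errors) / len(y)                       # sum, then divide by n
print(round(mae, 2))  # 0.64
```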

ROOT MEAN SQUARED ERROR(RMSE)

  • RMSE is the standard deviation of the residuals (prediction errors).
  • RMSE is a measure of how spread out the residuals are.
  • RMSE is calculated by taking the square root of MSE.
  • The formula to calculate RMSE is:
    • RMSE = √((1/n) × ∑(y − yhat)²)
  • The formula inside the square root is the formula of MSE.
  • Taking the square root of it, √MSE, gives us RMSE.
  • MSE = 0.72 (from the calculation above)
  • √0.72 ≈ 0.8485
  • RMSE ≈ 0.8485
  • Lower values of RMSE indicate a better fit.
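
Since RMSE is just the square root of MSE, the sketch only needs one extra step (same hypothetical dataset as above):

```python
import math

# Same hypothetical dataset used for the MSE example
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

mse = sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)  # 0.72
rmse = math.sqrt(mse)                                       # square root of MSE
print(round(rmse, 4))  # 0.8485
```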

R-SQUARED


  • R-squared is a statistical measure of how close the data are to the fitted regression line.
  • R-squared is always between 0% and 100%.
  • An R-squared of 0% tells us that the model explains none of the variability in the data.
  • An R-squared of 100% tells us that the model explains all of the variability in the data.
  • In general, the higher the R-squared, the better the model fits your data.
  • The formula to calculate R-squared is:
    • R² = 1 − (SSRES / SSTOT)
  • Here,
    • SSRES = sum of squares of residuals = ∑(y − yhat)²
    • SSTOT = total sum of squares = ∑(y − ȳ)², where ȳ is the average of the actual values
    • We already calculated SSRES while calculating the MSE; you can go above and refer to it.
    • SSTOT is the sum of the squared differences between each actual value and the average of the actual values.


  • SSRES = ∑(y − yhat)² = 3.6
  • SSTOT = ∑(y − ȳ)² = 5.2


  • So let's apply the calculated values to the formula:
    • R² = 1 − (3.6 / 5.2) ≈ 0.3077
    • R² ≈ 0.3077, i.e. 0.3077 × 100 = 30.77%
  • In most cases, an R-squared threshold of 50% is used.
  • If the calculated R-squared value is less than the threshold (50%), the model is not considered a good fit.
  • If the calculated R-squared value is more than the threshold (50%), the model is considered a good fit.
  • Our model explains only about 30.77% of the variance in the data.
  • So, our model is not a good fit.
  • The R-squared value tells us how well the model fits our data.
  • The higher the R-squared score, the better the model fits your data.
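
The R-squared calculation can be sketched the same way, using the same hypothetical dataset (SSRES = 3.6, SSTOT = 5.2):

```python
# Same hypothetical dataset used in the earlier examples
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

mean_y = sum(y) / len(y)                              # average of actual values
ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))  # 3.6
ss_tot = sum((a - mean_y) ** 2 for a in y)            # 5.2
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # 0.3077
```

In practice, scikit-learn's `sklearn.metrics` module provides `mean_squared_error`, `mean_absolute_error`, and `r2_score`, which compute these metrics directly.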

 THANK YOU


