
EVALUATION METRICS FOR REGRESSION

Hello everyone! In my previous blog, Mathematical Implementation of Linear Regression, I explained the internal working of the algorithm. In this blog, I will explain the evaluation metrics that are used to evaluate a regression model.

Before going to the evaluation metrics, let me show you a graph.

  • (x, y) is an actual value.
  • (x, ŷ) is a predicted value.
  • y − ŷ is the difference between the actual and the predicted value, also called the residual.
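
The residual can be computed directly. A minimal sketch in Python, using hypothetical actual/predicted values for a single point:

```python
# Hypothetical actual and predicted values for one data point
y = 3.4      # actual value
y_hat = 2.3  # predicted value

residual = y - y_hat          # difference between actual and predicted
print(round(residual, 1))     # 1.1 -> the point sits 1.1 above the line
```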

EVALUATION METRICS

  • MEAN SQUARED ERROR(MSE)
  • MEAN ABSOLUTE ERROR(MAE)
  • R-SQUARED
  • ROOT MEAN SQUARED ERROR (RMSE)

MEAN SQUARED ERROR (MSE)

  • MSE is the average of the squared differences between the actual and the predicted values.
  • Let me take an example.


  • Here y is the actual value and yhat (ŷ) is the predicted value.
  • Let's begin the process:
  • Calculate the difference between the actual (y) and predicted (yhat) values.


  • Here, Error is the difference between y and yhat:
  • Error = y − yhat
  • So, our next step is to square this error.


  • Now that we have the squared errors, we calculate the ∑ of squared errors, which means adding up all the elements in the Squared Error column.
  • Adding up all the elements in the Squared Error column, we get 3.6.
  • The total number of elements in our dataset is 5 (just count the number of rows).


  • Now let's calculate the Mean Squared Error.
  • The formula to calculate Mean Squared Error is:
    • MSE = (1/n) × ∑(y − yhat)²
  • Number of data points (n) = 5
  • Sum of squared errors = 3.6
  • Now that we have the values, let's calculate:
    • MSE = 3.6 / 5 = 0.72
    • MSE = 0.72
  • The smaller the Mean Squared Error (MSE), the closer we are to the best-fit/regression line.
    • This indicates that the model is performing well and the distance between the actual and predicted values is small.
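
The steps above can be sketched in Python. The data table from the post isn't reproduced here, so the y/yhat values below are hypothetical, chosen so that the sums match the walkthrough (∑ squared error = 3.6, n = 5):

```python
# Hypothetical actual (y) and predicted (y_hat) values, chosen so that
# the sum of squared errors is 3.6 and n = 5, as in the walkthrough.
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

errors = [a - p for a, p in zip(y, y_hat)]      # y - yhat for each row
squared_errors = [e ** 2 for e in errors]       # square each error
mse = sum(squared_errors) / len(y)              # sum, then divide by n
print(round(mse, 2))  # 0.72
```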

MEAN ABSOLUTE ERROR (MAE)

  • Mean Absolute Error is another loss function used for regression models.
  • MAE is the average of the absolute differences between our actual (y) and predicted (yhat) values.
    • The absolute difference between the two numbers 4 and 2 is |4 − 2| = 2.
    • The absolute difference between the two numbers −4 and 2 is |−4 − 2| = |−6| = 6.
  • It measures the average magnitude (size) of the errors in a set of predictions.
  • So, let's apply this to our dataset now

  • As I have mentioned above, y is the actual value and yhat is the predicted value.
  • In MAE we take the absolute difference; even though the difference might be negative, the absolute difference will always be positive, because the modulus of a number is always positive.
  • Now let's calculate the ∑ of absolute errors.

  • Sum of absolute errors = 3.2
  • Total number of data points = 5
  • The formula to find Mean Absolute Error is:
    • MAE = (1/n) × ∑|y − yhat|
  • Let's apply the calculated values to the formula:
    • MAE = 3.2 / 5 = 0.64
    • MAE = 0.64
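
The same calculation can be sketched in Python, using the same hypothetical dataset as before (sum of absolute errors = 3.2):

```python
# Same hypothetical dataset used for the MSE example
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

absolute_errors = [abs(a - p) for a, p in zip(y, y_hat)]  # |y - yhat|
mae = sum(absolute_errors) / len(y)                       # sum, then divide by n
print(round(mae, 2))  # 0.64
```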

ROOT MEAN SQUARED ERROR(RMSE)

  • RMSE is the standard deviation of the residuals (prediction errors).
  • RMSE is a measure of how spread out the residuals are.
  • RMSE is calculated by taking the square root of MSE.
  • The formula to calculate RMSE is:
    • RMSE = √((1/n) × ∑(y − yhat)²)
  • The formula inside the square root is the formula of MSE.
  • Taking the square root of it, √MSE, gives us RMSE.
  • MSE = 0.72 (from the calculation above)
  • √0.72 ≈ 0.8485
  • RMSE ≈ 0.8485
  • Lower values of RMSE indicate a better fit.
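
Since RMSE is just the square root of MSE, the sketch only needs one extra step (same hypothetical dataset as above):

```python
import math

# Same hypothetical dataset used for the MSE example
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

mse = sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)  # 0.72
rmse = math.sqrt(mse)                                       # square root of MSE
print(round(rmse, 4))  # 0.8485
```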

R-SQUARED


  • R-squared is a statistical measure of how close the data are to the fitted regression line.
  • R-squared is always between 0% and 100%.
  • An R-squared of 0% tells us that the model explains none of the variability in the data.
  • An R-squared of 100% tells us that the model explains all of the variability in the data.
  • In general, the higher the R-squared, the better the model fits your data.
  • The formula to calculate R-squared is:
    • R² = 1 − (SSRES / SSTOT)
  • Here,
    • SSRES = sum of squares of residuals = ∑(y − yhat)²
    • SSTOT = total sum of squares = ∑(y − ȳ)², where ȳ is the average of the actual values
    • We already calculated SSRES while calculating the MSE; you can go above and refer to it.
    • SSTOT is the sum of the squared differences between each actual value and the average of the actual values.


  • SSRES = ∑(y − yhat)² = 3.6
  • SSTOT = ∑(y − ȳ)² = 5.2


  • So let's apply the calculated values to the formula:
    • R² = 1 − (3.6 / 5.2) ≈ 0.3077
    • R² ≈ 0.3077, i.e. 0.3077 × 100 = 30.77%
  • In most cases, an R-squared threshold of 50% is used.
  • If the calculated R-squared value is less than the threshold (50%), the model is not considered a good fit.
  • If the calculated R-squared value is more than the threshold (50%), the model is considered a good fit.
  • Our model explains only about 30.77% of the variance in the data.
  • So, our model is not a good fit.
  • The R-squared value tells us how well the model fits our data.
  • The higher the R-squared score, the better the model fits your data.
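
The R-squared calculation can be sketched the same way, using the same hypothetical dataset (SSRES = 3.6, SSTOT = 5.2):

```python
# Same hypothetical dataset used in the earlier examples
y     = [3.4, 0.8, 3.0, 1.2, 1.6]
y_hat = [2.3, 1.1, 1.5, 1.4, 1.5]

mean_y = sum(y) / len(y)                              # average of actual values
ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))  # 3.6
ss_tot = sum((a - mean_y) ** 2 for a in y)            # 5.2
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # 0.3077
```

In practice, scikit-learn's `sklearn.metrics` module provides `mean_squared_error`, `mean_absolute_error`, and `r2_score`, which compute these metrics directly.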

 THANK YOU


