Relationship of Bias and Variance with Underfitting and Overfitting
In this article, let us understand bias and variance and how they relate to the concepts of underfitting and overfitting. We will also look at what a generalized model, suited for any ML project, means.
The term error rate means the average error between the values predicted by the model and the correct (actual) values.
What is bias?
Let's assume that we have a trained model and training data. We try to predict values with the input X_train, and the predicted values are y_predicted. Here, bias is the error rate between the y_predicted and y_train values.
In simple words, bias can be termed as the error rate on the training data.
We call it high bias when the error rate is high, and low bias when the error rate is low.
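As a minimal sketch of this idea (the data values and names like `y_train` and `y_predicted` are illustrative, not from the article), the training error rate can be computed as a mean absolute error:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

# Hypothetical training labels and the model's predictions on X_train
y_train = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.5, 5.5, 6.5, 9.5]

# In this article's terminology, this training error is the "bias"
bias_estimate = error_rate(y_train, y_predicted)
print(bias_estimate)  # 0.5 -> a low error rate, so low bias
```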
What is Variance?
Now let's assume that we
have trained to model and this time we are trying to predict values. We are
using test data with the input x_test again the predicted values are
y_predicted and y_test.
In simple words, variance
can be said as rate of error in test data.
We call it high variance when error rate is high and call it as low variance when error rate is low.
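The same error-rate calculation applies, just on the test split. In this illustrative sketch (values are made up), the model misses the test labels badly, so the test error is high:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

# Hypothetical test labels and the model's predictions on X_test
y_test = [4.0, 6.0, 8.0]
y_predicted = [5.0, 4.0, 11.0]

# In this article's terminology, this test error is the "variance"
variance_estimate = error_rate(y_test, y_predicted)
print(variance_estimate)  # 2.0 -> a high error rate, so high variance
```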
What is underfitting?
When there is high error on the training data, it means bias is high. When there is also high error on the test data, it means variance is high. This situation is called underfitting.
In simple words, high bias and high variance imply underfitting.
Let's take the example of students who scored low in the mock test because they were not prepared. When the actual test was taken, the scores of the students were still low because the paper was tough. Such a situation can be called underfitting.
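Underfitting can also be demonstrated with a model that is too simple for the data. In this sketch (the toy data and the constant-mean "model" are my own illustration), the data has a clear linear trend, but the model always predicts the training mean, so the error is high on both splits:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

# Toy data with an obvious linear trend (y = 2x)
X_train, y_train = [1, 2, 3, 4], [2, 4, 6, 8]
X_test,  y_test  = [5, 6],       [10, 12]

# An overly simple model: ignore x and always predict the training mean
mean_y = sum(y_train) / len(y_train)  # 5.0
train_pred = [mean_y for _ in X_train]
test_pred  = [mean_y for _ in X_test]

train_error = error_rate(y_train, train_pred)  # 2.0 -> high bias
test_error  = error_rate(y_test, test_pred)    # 6.0 -> high variance (per this article's usage)
```

Both error rates are high relative to the scale of the data, which is exactly the high-bias, high-variance combination described above.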
What is overfitting?
When there is low error in
training data that means when bias is low. And when there is high error in test
data that means when variance is high. This situation is called as overfitting.
In simple words, low bias
and high variance implies overfitting.
Let's take the example of the Indian team going to the World Cup: during the practice matches they played well and scored high, but during the actual matches they played badly and scored very low. Such a situation can be called overfitting.
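An extreme version of overfitting is a model that simply memorises the training data. In this sketch (the lookup-table "model" and data are my own illustration), the training error is zero but the model fails completely on unseen inputs:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

X_train, y_train = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
X_test,  y_test  = [5, 6],       [10.1, 11.9]

# An overfit "model": a lookup table that memorises the training points
# and falls back to 0.0 for anything it has never seen
memory = dict(zip(X_train, y_train))
predict = lambda x: memory.get(x, 0.0)

train_error = error_rate(y_train, [predict(x) for x in X_train])  # 0.0 -> low bias
test_error  = error_rate(y_test,  [predict(x) for x in X_test])   # 11.0 -> high variance
```

Low training error together with high test error is the low-bias, high-variance combination described above.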
Generalized or ideal model
When the error rates on the training data and the test data are both low, meaning we have low bias and low variance, we call the model a generalized or ideal model.
Bias variance trade-off
When we plot error rate against model complexity, we get a graph like the one shown below.
The sweet spot shown in the graph is called the perfect tradeoff, or the bias-variance tradeoff. This is also where the model is generalized, with both bias and variance low at the same time.
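The same curve can be traced numerically. This sketch (using NumPy, with polynomial degree standing in for model complexity; both are my assumptions, not from the article) sweeps complexity levels, records the train and test error at each, and picks the degree where test error bottoms out, i.e. the sweet spot:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 3, 30)
y = np.sin(X) + rng.normal(scale=0.1, size=X.shape)  # noisy target

X_train, y_train = X[::2], y[::2]    # even indices for training
X_test,  y_test  = X[1::2], y[1::2]  # odd indices for testing

def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return float(np.mean(np.abs(y_true - y_predicted)))

results = {}
for degree in range(10):  # model complexity = polynomial degree
    coeffs = np.polyfit(X_train, y_train, degree)
    train_err = error_rate(y_train, np.polyval(coeffs, X_train))
    test_err = error_rate(y_test, np.polyval(coeffs, X_test))
    results[degree] = (train_err, test_err)

# Training error keeps dropping as complexity grows; test error falls at
# first, then rises again. The degree with the lowest test error is the
# sweet spot: low error on both splits, i.e. a generalized model.
sweet_spot = min(results, key=lambda d: results[d][1])
```

Plotting the two error curves from `results` against degree would reproduce the shape the article describes: training error sloping down, test error forming a U, with the tradeoff point at the bottom of the U.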
- Jay Charole
- Mar, 11 2022