Relationship of Bias and Variance with Underfitting and Overfitting
In this article, let us understand bias and variance and how they relate to the concepts of underfitting and overfitting. We will also look at what a generalized model, suited for any ML project, means.
The term error rate means the average error between the values predicted by the model and the correct (actual) values.
What is bias?
Let's assume that we have a trained model and training data. We try to predict values with the input X_train, and the predicted values are y_predicted. Here, bias is the error rate between the y_predicted and y_train values.
In simple words, bias can be termed as the error rate on the training data.
We call it high bias when the error rate is high, and low bias when the error rate is low.
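As a minimal sketch of this idea (the data values and names like `y_train` and `y_predicted` are illustrative, not from the article), the training error rate can be computed as a mean absolute error:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

# Hypothetical training labels and the model's predictions on X_train
y_train = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.5, 5.5, 6.5, 9.5]

# In this article's terminology, this training error is the "bias"
bias_estimate = error_rate(y_train, y_predicted)
print(bias_estimate)  # 0.5 -> a low error rate, so low bias
```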
What is Variance?
Now let's assume that we
have trained to model and this time we are trying to predict values. We are
using test data with the input x_test again the predicted values are
y_predicted and y_test.
In simple words, variance
can be said as rate of error in test data.
We call it high variance when error rate is high and call it as low variance when error rate is low.
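The same error-rate calculation applies, just on the test split. In this illustrative sketch (values are made up), the model misses the test labels badly, so the test error is high:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

# Hypothetical test labels and the model's predictions on X_test
y_test = [4.0, 6.0, 8.0]
y_predicted = [5.0, 4.0, 11.0]

# In this article's terminology, this test error is the "variance"
variance_estimate = error_rate(y_test, y_predicted)
print(variance_estimate)  # 2.0 -> a high error rate, so high variance
```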
What is underfitting?
When there is high error on the training data, it means bias is high. When there is also high error on the test data, it means variance is high. This situation is called underfitting.
In simple words, high bias and high variance imply underfitting.
Let's take the example of students who scored low in the mock test because they were not prepared. When the actual test was taken, the scores of the students were still low because the paper was tough. Such a situation can be called underfitting.
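Underfitting can also be demonstrated with a model that is too simple for the data. In this sketch (the toy data and the constant-mean "model" are my own illustration), the data has a clear linear trend, but the model always predicts the training mean, so the error is high on both splits:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

# Toy data with an obvious linear trend (y = 2x)
X_train, y_train = [1, 2, 3, 4], [2, 4, 6, 8]
X_test,  y_test  = [5, 6],       [10, 12]

# An overly simple model: ignore x and always predict the training mean
mean_y = sum(y_train) / len(y_train)  # 5.0
train_pred = [mean_y for _ in X_train]
test_pred  = [mean_y for _ in X_test]

train_error = error_rate(y_train, train_pred)  # 2.0 -> high bias
test_error  = error_rate(y_test, test_pred)    # 6.0 -> high variance (per this article's usage)
```

Both error rates are high relative to the scale of the data, which is exactly the high-bias, high-variance combination described above.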
What is overfitting?
When there is low error in
training data that means when bias is low. And when there is high error in test
data that means when variance is high. This situation is called as overfitting.
In simple words, low bias
and high variance implies overfitting.
Let's take the example of the Indian team going to the World Cup: during the practice matches they played well and scored high, but during the actual matches they played badly and scored very low. Such a situation can be called overfitting.
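An extreme version of overfitting is a model that simply memorises the training data. In this sketch (the lookup-table "model" and data are my own illustration), the training error is zero but the model fails completely on unseen inputs:

```python
def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predicted)) / len(y_true)

X_train, y_train = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
X_test,  y_test  = [5, 6],       [10.1, 11.9]

# An overfit "model": a lookup table that memorises the training points
# and falls back to 0.0 for anything it has never seen
memory = dict(zip(X_train, y_train))
predict = lambda x: memory.get(x, 0.0)

train_error = error_rate(y_train, [predict(x) for x in X_train])  # 0.0 -> low bias
test_error  = error_rate(y_test,  [predict(x) for x in X_test])   # 11.0 -> high variance
```

Low training error together with high test error is the low-bias, high-variance combination described above.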
Generalized or ideal model
When the error rates on the training data and the test data are both low, meaning we have low bias and low variance, we call the model a generalized or ideal model.
Bias variance trade-off
When we plot error rate against model complexity, we get a graph like the one shown below.
The sweet spot shown in the graph is called the perfect tradeoff, or the bias-variance tradeoff. This is also where the model is generalized, with both bias and variance low at the same time.
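The same curve can be traced numerically. This sketch (using NumPy, with polynomial degree standing in for model complexity; both are my assumptions, not from the article) sweeps complexity levels, records the train and test error at each, and picks the degree where test error bottoms out, i.e. the sweet spot:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0, 3, 30)
y = np.sin(X) + rng.normal(scale=0.1, size=X.shape)  # noisy target

X_train, y_train = X[::2], y[::2]    # even indices for training
X_test,  y_test  = X[1::2], y[1::2]  # odd indices for testing

def error_rate(y_true, y_predicted):
    """Average absolute error between actual and predicted values."""
    return float(np.mean(np.abs(y_true - y_predicted)))

results = {}
for degree in range(10):  # model complexity = polynomial degree
    coeffs = np.polyfit(X_train, y_train, degree)
    train_err = error_rate(y_train, np.polyval(coeffs, X_train))
    test_err = error_rate(y_test, np.polyval(coeffs, X_test))
    results[degree] = (train_err, test_err)

# Training error keeps dropping as complexity grows; test error falls at
# first, then rises again. The degree with the lowest test error is the
# sweet spot: low error on both splits, i.e. a generalized model.
sweet_spot = min(results, key=lambda d: results[d][1])
```

Plotting the two error curves from `results` against degree would reproduce the shape the article describes: training error sloping down, test error forming a U, with the tradeoff point at the bottom of the U.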
- Jay Charole
- Mar, 11 2022