Data Science

Evaluation of Classification Model

Regression problems have their own set of performance metrics, but how do we measure the performance of a classification model? The usual answer is the confusion matrix, which compares the model's predictions against the actual outcomes. The metrics used for this purpose are:

Accuracy
Precision
Recall
F1 Score
Specificity
AUC (Area Under the Curve)
ROC (Receiver Operating Characteristic)

The confusion matrix itself has four cells:

True Positive (TP): A result that the classification model predicted as positive and that is actually positive.

True Negative (TN): A result that the classification model predicted as negative and that is actually negative.

False Positive (FP): A result that the classification model predicted as positive but that is actually negative.

False Negative (FN): A result that the classification model predicted as negative but that is actually positive.
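To make the four definitions concrete, here is a minimal sketch that counts them from two lists of labels. The function name `confusion_counts` and the sample labels are made up for illustration, and 1 is assumed to mean "positive".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted != positive and actual != positive:
            tn += 1  # predicted negative, actually negative
        elif predicted == positive and actual != positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 3 1 1
```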

Accuracy- The total number of correct classifications divided by the total number of classifications.

Accuracy = $\frac{TP + TN}{(TP + TN + FP + FN)}$
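As a quick sketch, accuracy can be computed directly from the four confusion matrix counts. The counts below are hypothetical and only serve to show the arithmetic.

```python
def accuracy(tp, tn, fp, fn):
    """Correct classifications (TP + TN) divided by all classifications."""
    total = tp + tn + fp + fn
    # Guard against an empty confusion matrix
    return (tp + tn) / total if total else 0.0


# Hypothetical counts, chosen only for illustration
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```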

Recall/Sensitivity- Out of all the actual positive results, it measures how many the model correctly predicted as positive.

It shows how well the model finds the positive cases; the negative results play no part in it.

Recall = $\frac{TP}{(TP + FN)}$

Let's suppose a model that screens for cancer correctly identified 50 cancer patients (TP) but failed to identify 200 cancer patients (FN). Recall in that case will be:

Recall = $\frac{50}{(50 + 200)}$ = 0.2 (the model was able to recall only 20% of the cancer patients)
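The same arithmetic in code, as a small sketch. The function name `recall` is just a convenient label here, and the numbers are the ones from the example above.

```python
def recall(tp, fn):
    """Share of actual positives that the model correctly identified."""
    actual_positives = tp + fn
    # Guard against data that contains no positive cases at all
    return tp / actual_positives if actual_positives else 0.0


# 50 cancer patients found (TP), 200 missed (FN)
print(recall(tp=50, fn=200))  # 0.2
```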

Precision- Out of all the positive predictions made by the model, it measures how many were actually positive.

Precision = $\frac{TP}{(TP + FP)}$

Let's suppose, in the same example, the model correctly identified 50 people as cancer patients (TP) but also raised a false alarm for 100 people (FP). Hence,

Precision = $\frac{50}{(50 + 100)}$ = 0.33 (the model has a precision of only 33%)
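A matching sketch for precision, again using the numbers from the example above. The function name is illustrative only.

```python
def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    predicted_positives = tp + fp
    # Guard against a model that never predicts the positive class
    return tp / predicted_positives if predicted_positives else 0.0


# 50 correct positive predictions (TP), 100 false alarms (FP)
print(round(precision(tp=50, fp=100), 2))  # 0.33
```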

F1 Score- It is defined as the harmonic mean of Precision and Recall.

F1 = $\frac{2 \times \text{Precision} \times \text{Recall}}{(\text{Precision} + \text{Recall})}$
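As a rough sketch, the harmonic mean of precision and recall is 2 * P * R / (P + R). Plugging in the cancer example from above (TP = 50, FP = 100, FN = 200) shows how a weak recall pulls the F1 score down. The function name `f1_score` is only a label chosen for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # both are zero, so the harmonic mean is taken as zero
    return 2 * precision * recall / (precision + recall)


# Values from the cancer example: TP = 50, FP = 100, FN = 200
p = 50 / (50 + 100)   # precision, about 0.33
r = 50 / (50 + 200)   # recall, 0.2
print(round(f1_score(p, r), 2))  # 0.25
```

Because it is a harmonic mean, the F1 score stays close to the smaller of the two values, so a model cannot score well by doing well on only one of them.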

Specificity or True Negative Rate- This represents how specific the model is when predicting the true negatives. Out of all the actual negative (non-favorable) outcomes, it is the fraction that the model correctly predicted as negative. Mathematically,

Specificity = $\frac{TN}{(TN + FP)}$

Similarly, the False Positive Rate can be defined as (1 - Specificity), or $\frac{FP}{(TN + FP)}$
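A small sketch of both quantities, taking specificity as TN / (TN + FP) so that it is consistent with the False Positive Rate above. The counts used here are hypothetical.

```python
def specificity(tn, fp):
    """Share of actual negatives that the model correctly predicted as negative."""
    actual_negatives = tn + fp
    return tn / actual_negatives if actual_negatives else 0.0


def false_positive_rate(tn, fp):
    """Share of actual negatives that the model wrongly predicted as positive."""
    actual_negatives = tn + fp
    return fp / actual_negatives if actual_negatives else 0.0


# Hypothetical counts: 650 true negatives, 100 false positives
spec = specificity(tn=650, fp=100)
fpr = false_positive_rate(tn=650, fp=100)
print(round(spec, 3), round(fpr, 3))  # 0.867 0.133
```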

ROC - Classification algorithms work on the probability of occurrence of each possible outcome. A probability value lies between 0 and 1: zero means the outcome cannot occur, and one means the outcome is certain.

But while working with real-world data, we seldom get a perfect 0 or 1. Instead, we get decimal values lying between 0 and 1. Now the question is: if the probabilities are not exactly 0 or 1, how do we actually determine the class in our classification problem?

This is where the concept of a threshold comes in. A threshold is set; any probability value below the threshold is treated as a negative outcome, and any value above it as a positive (favorable) outcome.

For example, if the threshold is 0.5, any probability value below 0.5 means a negative or unfavorable outcome, and any value above 0.5 indicates a positive or favorable outcome.
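Here is a minimal sketch of applying a threshold to predicted probabilities. The probability values are made up, and treating a value exactly equal to the threshold as positive is an assumption of this sketch, not something the rule above specifies.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities into class labels (1 = positive, 0 = negative)."""
    # Assumption: a probability exactly equal to the threshold counts as positive
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities
probabilities = [0.08, 0.42, 0.50, 0.63, 0.97]

labels_default = classify(probabilities)        # threshold of 0.5
labels_strict = classify(probabilities, 0.7)    # a stricter threshold

print(labels_default)  # [0, 0, 1, 1, 1]
print(labels_strict)   # [0, 0, 0, 0, 1]
```

The same probabilities produce different class labels once the threshold moves, which is why the choice of threshold matters.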

The following diagram shows a typical logistic regression curve.

The horizontal lines represent the various values of thresholds ranging from 0 to 1.
Let’s suppose our classification problem was to identify the obese people from the given data.
The green markers represent obese people and the red markers represent the non-obese people.
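Our confusion matrix depends on the threshold we choose, so each threshold gives a different True Positive Rate (recall) and False Positive Rate. Plotting the True Positive Rate against the False Positive Rate for every threshold gives the ROC curve.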
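To see how the confusion matrix moves with the threshold, the sketch below recomputes it for a few thresholds and reports the true positive rate (recall) and false positive rate at each one. These pairs are the points an ROC curve is drawn from. The labels and scores are hypothetical (1 = obese, 0 = non-obese), and the function name is only illustrative.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (TP, FP, TN, FN), TPR and FPR for one threshold."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        # Assumption: a score equal to the threshold counts as positive
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return (tp, fp, tn, fn), tpr, fpr


# Hypothetical data: 1 = obese, 0 = non-obese, with predicted probabilities
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.05, 0.20, 0.35, 0.45, 0.60, 0.70, 0.85, 0.95]

for threshold in (0.3, 0.5, 0.8):
    counts, tpr, fpr = rates_at_threshold(y_true, scores, threshold)
    print(threshold, counts, tpr, fpr)
# 0.3 (4, 2, 2, 0) 1.0 0.5
# 0.5 (3, 1, 3, 1) 0.75 0.25
# 0.8 (2, 0, 4, 2) 0.5 0.0
```

A low threshold catches every positive case at the cost of more false alarms, while a high threshold removes the false alarms but misses positives.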

AUC - Let's suppose that we used different classification algorithms and plotted an ROC curve for each of them. The question is: which algorithm should we choose? The answer is to calculate the area under each ROC curve.

It helps us choose the best model amongst the models for which we have plotted the ROC curves: the model with the larger area under its curve is generally the better one.
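One simple way to approximate the area under an ROC curve is the trapezoidal rule applied to its (false positive rate, true positive rate) points. The sketch below assumes that approach, and the two sets of ROC points are hypothetical, standing in for two different models.

```python
def auc_trapezoid(points):
    """Approximate the area under a curve given as (fpr, tpr) points."""
    pts = sorted(points)  # order the points by false positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Area of the trapezoid between two neighbouring points
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points (fpr, tpr) for two models
model_a = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]
model_b = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]

auc_a = auc_trapezoid(model_a)
auc_b = auc_trapezoid(model_b)
print(auc_a, auc_b)  # 0.875 0.5
```

Here the second set of points lies on the diagonal, which is what random guessing looks like, so its area is 0.5. The first model's larger area suggests it separates the two classes better.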

Ayush Jain
Apr, 01 2022