Data Science

Classification vs Regression

In machine learning, there are 2 ways study data and they are:

Unsupervised learning: the model learns from unlabeled data
Supervised learning: the model learns from labeled data

This blog is divided into 3 parts; they are:

What is classification in machine learning?
What is regression?
Classification vs Regression

What is classification in machine learning?

Classification is a type of supervised learning, its the task of approximation of a mapping function from an input variable (x) to a discrete output variable (y) called a label or a class.

The mapping function predicts the class or label for a given query. for example, a customer can give rating to an item they bought and your models function maps this review to a class of either "positive" or "negative".

Some classification terminologies:

Classifier: An algorithm that is used to assign a label to the input data.

Multi-class classification: There are more than 2 classes but each query is assigned to only one class.

Multi-label classification: Each query can be assigned to more than one class.

Evaluate: Showing the models accuracy.

Examples of classification algorithms:

Naive bayes classifier
K-nearest neighbor
Decision tree
Random forest

What is regression in machine learning?

Regression analysis is a crucial concept in machine learning, it is a type of supervised learning.

It's the task of approximation of a mapping function from input (x) to continuous output (y) , it is used to understand how a dependent variable changes with respect to an independent variable when other independent variables are fixed.

In regression we plot a graph that best-fits the given data points and the model makes predictions based on this graph

Some regression terminologies:

Dependent variable: Main factor that we want to predict.

Predictor: The factors used to predict the dependent variables.

Overfitting: The model fits exactly on the training data, resulting in inaccurate results with unseen data.

Underfitting: The model is too simple or it needs more training time, underfitting is easier to detect because its behavior can be seen using the training data.

Outliers: An observation with very high or very low value compared to other observations, they should be avoided because they can alter with the result.

Examples of regression algorithms:

Linear regression
Random forest regression
Support vector regression
Logistic regression

Linear regression code snippet:

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target)
clf = LinearRegression()
clf.fit(X_train, y_train)
result=clf.predict(X_test)
expected=y_test
plt.figure(figsize=(4, 3))
plt.scatter(expected, result)
plt.plot([0, 50], [0, 50], '--k')
plt.axis('tight')
plt.xlabel('True price ($1000s)')
plt.ylabel('Predicted price ($1000s)')
plt.tight_layout()

output:

Generally, regression algorithms are used to predict continuous values like age, price, temperature, etc.

Classification vs Regression

	Regression	Classification
Data type	predicted data is of continuous value ex: salary, price, age	predicted data is of discrete value ex: either "good" or "bad"
Calculation method	uses root mean square error (RMSE)	calculate accuracy to know the best fit for the dataset
Nature of prediction output	predicted data is ordered	predicted data is unordered
Algorithms	random forest regression linear regression support vector regression	Naive bayes classifier K-nearest neighbor Decision tree

Conclusion

These are some of the differences between regression and classification, in some cases an overlap can happen between the 2, for example:

Classification algorithm may predict a continuous value that is in the form of a probability of a certain label.
Regression algorithm can predict discrete value in the form of integer quantity.

because of this overlap, in some cases its possible to convert a regression problem into a classification problem and vice versa

Summary

In this short blog you learned:

Predictive modeling is learning a mapping function from inputs to outputs
Regression is prediction of continuous output
Classification is prediction of discrete label output

Khaled Gamal
Mar, 20 2022

Add New Comments

Please login in order to make a comment.

Classification vs Regression

What is classification in machine learning?

Some classification terminologies:

Examples of classification algorithms:

What is regression in machine learning?

Some regression terminologies:

Examples of regression algorithms:

Linear regression code snippet:

Classification vs Regression

Conclusion

Summary

Add New Comments

Recent Comments

Terms

About

Important Links