Classification vs Regression

In machine learning, there are two main ways to learn from data:

  • Unsupervised learning: the model learns from unlabeled data
  • Supervised learning: the model learns from labeled data

This post is divided into three parts:

  • What is classification in machine learning?
  • What is regression in machine learning?
  • Classification vs Regression

What is classification in machine learning?

Classification is a type of supervised learning. It is the task of approximating a mapping function from input variables (x) to a discrete output variable (y), called a label or a class.

The mapping function predicts the class or label for a given query. For example, a customer can write a review of an item they bought, and your model's mapping function assigns this review to a class of either "positive" or "negative".
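
As a minimal sketch of such a mapping function, the snippet below trains a Naive Bayes classifier on a handful of made-up reviews. The reviews, the labels, and the choice of `CountVectorizer` with `MultinomialNB` are assumptions made for illustration, not a real dataset or a recommended setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews and their labels (invented for illustration)
reviews = [
    "great product, love it",
    "excellent quality",
    "terrible, broke quickly",
    "awful waste of money",
]
labels = ["positive", "positive", "negative", "negative"]

# Turn each review into word counts, then fit a Naive Bayes classifier
review_classifier = make_pipeline(CountVectorizer(), MultinomialNB())
review_classifier.fit(reviews, labels)

# Map new, unseen reviews to a discrete class
predictions = review_classifier.predict(["great quality", "awful, broke quickly"])
print(predictions)
```

The output is always one of the labels seen during training, which is what makes this a classification task.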

Some classification terminologies:

Classifier: An algorithm that is used to assign a label to the input data.

Multi-class classification: There are more than 2 classes but each query is assigned to only one class.

Multi-label classification: Each query can be assigned to more than one class.

Evaluation: Measuring how well the model performs, for example by its accuracy.
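
The difference between multi-class and multi-label targets defined above is easiest to see in how the labels are stored. The sketch below uses scikit-learn's `MultiLabelBinarizer`; the article tags are hypothetical and only there for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Multi-class: each query gets exactly one label out of several classes
multiclass_targets = ["sports", "politics", "tech"]

# Multi-label: each query may carry several labels at once
# (hypothetical article tags, invented for illustration)
multilabel_targets = [["sports", "politics"], ["tech"], ["politics", "tech"]]

# Encode the multi-label targets as one 0/1 column per class
binarizer = MultiLabelBinarizer()
indicator_matrix = binarizer.fit_transform(multilabel_targets)

print(binarizer.classes_)
print(indicator_matrix)
```

Each row of the indicator matrix can contain more than one 1, whereas a multi-class target always has a single value per query.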

Examples of classification algorithms:

  • Naive Bayes classifier
  • K-nearest neighbors
  • Decision tree
  • Random forest
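
The four algorithms listed above can be tried side by side with scikit-learn. This is only a rough sketch: the iris dataset, the default hyperparameters, and the fixed `random_state` are assumptions chosen to keep the example short, and the scores will differ on other data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset with three flower classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "K-nearest neighbors": KNeighborsClassifier(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, classifier in classifiers.items():
    classifier.fit(X_train, y_train)
    # score() returns the accuracy on the held-out test set
    scores[name] = classifier.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```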

What is regression in machine learning?

Regression analysis is a crucial concept in machine learning and, like classification, a type of supervised learning.

It is the task of approximating a mapping function from input variables (x) to a continuous output variable (y). It is also used to understand how a dependent variable changes with respect to one independent variable when the other independent variables are held fixed.
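
To make the "other variables held fixed" idea concrete, here is a small sketch on synthetic data. The generating formula y = 3*x1 + 2*x2 + 1 is an assumption made up for this example; a linear model fitted on it should recover those numbers as its coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3*x1 + 2*x2 + 1 (no noise)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 3 * X[:, 0] + 2 * X[:, 1] + 1

model = LinearRegression().fit(X, y)

# Each coefficient is the change in y for a one-unit change in that
# input while the other input is held fixed
print(model.coef_, model.intercept_)
```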

In regression, we fit a line or curve that best fits the given data points, and the model makes predictions based on this fitted curve.
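
A minimal sketch of fitting such a best-fit line, using NumPy's least-squares `polyfit` on five made-up points (the data values and the `predict` helper are assumptions for illustration):

```python
import numpy as np

# Five hypothetical observations that roughly follow a straight line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Least-squares fit of a degree-1 polynomial: y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

def predict(new_x):
    """Predict y for a new x using the fitted line."""
    return slope * new_x + intercept

print(slope, intercept, predict(5.0))
```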


Some regression terminologies:

Dependent variable: The main factor that we want to predict.

Predictor: A factor used to predict the dependent variable, also called an independent variable.

Overfitting: The model fits the training data too closely, including its noise, which results in inaccurate predictions on unseen data.

Underfitting: The model is too simple or has not been trained long enough to capture the pattern in the data. Underfitting is easier to detect because it already shows up as poor performance on the training data.
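
Both effects can be observed by fitting polynomials of different degrees to the same noisy data. The sketch below is built on assumptions chosen for illustration: a sine curve as the true function, Gaussian noise, and degrees 1, 3, and 9.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_function(x):
    return np.sin(2 * np.pi * x)

# Small noisy training set and a dense, noise-free test grid
x_train = np.linspace(0, 1, 12)
y_train = true_function(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = true_function(x_test)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

train_error, test_error = {}, {}
for degree in (1, 3, 9):
    coefficients = np.polyfit(x_train, y_train, deg=degree)
    train_error[degree] = mse(y_train, np.polyval(coefficients, x_train))
    test_error[degree] = mse(y_test, np.polyval(coefficients, x_test))
    print(degree, train_error[degree], test_error[degree])
```

The straight line (degree 1) has a high error even on the training data, which is the sign of underfitting. The training error keeps shrinking as the degree grows, but a very flexible model such as degree 9 tends to chase the noise, so its low training error is not a reliable indicator of how it does on unseen points.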

Outliers: Observations with a very high or very low value compared to the other observations. They should be handled carefully because they can distort the result.
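
A quick sketch of how a single outlier can pull a fitted line away from the rest of the data. The points and the outlier value of 100 are made up for this example.

```python
import numpy as np

x = np.arange(10, dtype=float)
y_clean = 2 * x                 # points lying exactly on y = 2x

y_outlier = y_clean.copy()
y_outlier[-1] = 100.0           # one extreme value replaces 18.0

# Fit a straight line to each version of the data
clean_slope, _ = np.polyfit(x, y_clean, deg=1)
outlier_slope, _ = np.polyfit(x, y_outlier, deg=1)
print(clean_slope, outlier_slope)
```

With one changed point out of ten, the fitted slope moves from 2 to more than 6.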

Examples of regression algorithms:

  • Linear regression
  • Random forest regression
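  • Support vector regression

Note: logistic regression, despite its name, is normally used for classification, because it predicts the probability of a class label.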
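
Linear regression, random forest regression, and support vector regression can be compared on the same data with scikit-learn. The synthetic dataset, the linear kernel for `SVR`, and the default settings below are assumptions for this sketch; real results depend heavily on the data and on tuning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data: a linear signal plus a little noise (made up for this sketch)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 4 * X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=1.0, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regressors = {
    "Linear regression": LinearRegression(),
    "Random forest regression": RandomForestRegressor(random_state=0),
    "Support vector regression": SVR(kernel="linear"),
}

scores = {}
for name, regressor in regressors.items():
    regressor.fit(X_train, y_train)
    # score() returns the R^2 coefficient on the held-out data
    scores[name] = regressor.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```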

Linear regression code snippet:

# Fit a linear regression model on the California housing dataset
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# The target is the median house value in units of $100,000
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

predicted = model.predict(X_test)
expected = y_test

# Plot predicted against true values; the dashed line marks a perfect prediction
plt.figure(figsize=(4, 3))
plt.scatter(expected, predicted)
plt.plot([0, 5], [0, 5], '--k')
plt.axis('tight')
plt.xlabel('True price ($100,000s)')
plt.ylabel('Predicted price ($100,000s)')
plt.tight_layout()
plt.show()


Output: a scatter plot of predicted price against true price.



Generally, regression algorithms are used to predict continuous values like age, price, temperature, etc.

Classification vs Regression 



| | Regression | Classification |
|---|---|---|
| Data type | Predicted data is a continuous value (e.g. salary, price, age) | Predicted data is a discrete value (e.g. either "good" or "bad") |
| Calculation method | Uses root mean square error (RMSE) | Calculates accuracy to find the best fit for the dataset |
| Nature of prediction output | Predicted data is ordered | Predicted data is unordered |
| Algorithms | Random forest regression, linear regression, support vector regression | Naive Bayes classifier, K-nearest neighbors, decision tree |
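
The two calculation methods can be sketched in a few lines of NumPy. All values below are made up for illustration.

```python
import numpy as np

# Regression: compare continuous predictions with RMSE
true_prices = np.array([3.0, 5.0, 2.0, 7.0])
predicted_prices = np.array([2.0, 5.0, 4.0, 7.0])
rmse = float(np.sqrt(np.mean((true_prices - predicted_prices) ** 2)))

# Classification: compare discrete labels with accuracy
true_labels = np.array(["good", "bad", "good", "bad"])
predicted_labels = np.array(["good", "bad", "bad", "bad"])
accuracy = float(np.mean(true_labels == predicted_labels))

print(rmse, accuracy)
```

RMSE measures how far off the predictions are, so lower is better. Accuracy is the fraction of labels predicted exactly right, so higher is better.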

Conclusion

These are some of the differences between regression and classification. In some cases the two can overlap, for example:

  • A classification algorithm may predict a continuous value in the form of the probability of a certain label.
  • A regression algorithm can predict a discrete value in the form of an integer quantity.

Because of this overlap, it is sometimes possible to convert a regression problem into a classification problem and vice versa.
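
Both directions of this overlap can be sketched briefly. In the first part, a classifier (logistic regression, chosen here as an assumption) returns continuous probabilities. In the second part, a continuous price is turned into a class by binning. The data, the bin edges, and the class names are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Classification producing a continuous value: class probabilities
hours_studied = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
passed = np.array([0, 0, 0, 1, 1, 1])
classifier = LogisticRegression().fit(hours_studied, passed)
probabilities = classifier.predict_proba([[2.5], [7.5]])
print(probabilities)

# Regression target turned into a classification target by binning
prices = np.array([50.0, 150.0, 320.0])
bin_edges = [100.0, 300.0]
class_names = np.array(["low", "medium", "high"])
price_classes = class_names[np.digitize(prices, bin_edges)]
print(price_classes)
```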

Summary

In this short post, you learned:

  • Predictive modeling is learning a mapping function from inputs to outputs
  • Regression is the prediction of a continuous output
  • Classification is the prediction of a discrete label







  • Khaled Gamal
  • Mar, 20 2022
