Data Science

Ensemble Method in Machine Learning - An Overview and Facts

An ensemble method is a machine learning technique that combines the predictions of several separate models to produce a final prediction that is usually more accurate than that of any single model.

The concept and motivation behind the ensemble method are based on the belief that a committee of experts working together can perform better than a single expert.

Ensemble methods are sometimes referred to as “community methods”, because the core idea is that a community of experts working as a team is a better option than any individual expert.

Facts: The models in an ensemble usually belong to the same family of machine learning models, for example all decision trees or all regression models, but they don’t have to. The main idea is to pick a number of independent models and combine them to get a more accurate and robust prediction.

How Ensemble Technique Works

* Create n different models, for example using decision trees

Essentially we start with training data and create multiple models, all different from each other.

* Use test data for prediction

The test data is fed to each of the n models. Each model produces one prediction, giving n predictions in total, which are then combined into the final prediction.

In a classification problem, the final prediction is made by voting. Say we have to predict default versus no default: the combined vote counts the number of models predicting default against the number predicting no default, and the class with more votes wins.

In a regression problem, where the dependent variable is continuous (say, predicting sales), the final prediction is a simple or weighted average of the individual models' predictions.
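The two combination rules described above can be sketched in a few lines of Python. This is only an illustration: the function names, the example predictions, and the weights are all hypothetical and not taken from any particular library.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by the most models.
    On a tie, the label that appears first in the list wins."""
    return Counter(labels).most_common(1)[0][0]

def weighted_average(values, weights=None):
    """Combine numeric predictions; equal weights give a plain average."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical predictions from five classifiers for one customer.
class_predictions = ["default", "no default", "default", "default", "no default"]
final_label = majority_vote(class_predictions)

# Hypothetical sales forecasts from three regression models.
sales_predictions = [100.0, 110.0, 120.0]
final_sales = weighted_average(sales_predictions)
# Trust the third model twice as much as the others (assumed weights).
weighted_sales = weighted_average(sales_predictions, weights=[1, 1, 2])

print(final_label, final_sales, weighted_sales)
```

Here three of the five classifiers vote for default, so that class wins. The plain average of the forecasts is 110.0, and the weighted average shifts toward the third model's forecast.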

Facts: With n different models, taking the average reduces the variation in the prediction. The larger the number of models being averaged, the lower the variance of the final prediction.
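A small simulation can illustrate this effect. It is a sketch under a strong assumption: each model's prediction is the true value plus independent random noise. The constants below are hypothetical. Real models trained on the same data tend to make correlated errors, so the reduction in practice is usually smaller.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

TRUE_VALUE = 50.0  # hypothetical quantity every model tries to predict
NOISE_SD = 10.0    # hypothetical spread of a single model's error

def ensemble_prediction(n_models):
    """Average n independent noisy predictions of TRUE_VALUE."""
    return mean(random.gauss(TRUE_VALUE, NOISE_SD) for _ in range(n_models))

def prediction_variance(n_models, trials=2000):
    """Estimate the variance of the ensemble prediction over many trials."""
    return pvariance([ensemble_prediction(n_models) for _ in range(trials)])

# Variance of the final prediction for ensembles of 1, 10 and 100 models.
variances = {n: prediction_variance(n) for n in (1, 10, 100)}
for n, var in variances.items():
    print(n, round(var, 2))
```

Under the independence assumption, the variance shrinks roughly in proportion to 1/n as more models are averaged.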

How the Models are constructed in the Ensemble Method

There are two ways to construct models in an ensemble method:

1. Parallel (Bagging)

2. Sequential (Boosting)

In the parallel approach, the models are constructed independently and can be built at the same time, whereas in the sequential approach the models are constructed one after another.

The sequential approach is more time-consuming and costlier, because each model is based on the predictions of the previous one, so we have to wait for the previous model to finish before building the next.

Bagging Method

Bagging is an ensemble technique based on the parallel construction of models. Instead of building every model from the same single dataset, each model is built from a different subset drawn from the original dataset.

The subsets are created by random sampling with replacement, so some rows of the original dataset may appear multiple times in a subset.
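Sampling with replacement can be sketched with Python's standard `random.choices`. The dataset here is just a hypothetical list of row ids, and drawing subsets of the same size as the original is a common choice rather than something the text above specifies.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

def bootstrap_sample(rows):
    """Draw len(rows) rows at random, with replacement."""
    return random.choices(rows, k=len(rows))

rows = list(range(10))  # hypothetical dataset: row ids 0..9
subsets = [bootstrap_sample(rows) for _ in range(3)]

for subset in subsets:
    # Repeated ids are rows drawn more than once; missing ids were never drawn.
    print(sorted(subset))
```

Each subset will typically contain some row ids more than once while leaving others out entirely, which is what makes the resulting models differ from one another.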

In the Bagging method, any over-fitting happens with respect to the sample data, not the original dataset.

Say multiple subsets have been created from the original dataset, and one model has been built from each subset. Each model may then overfit its own subset, but when the models are combined the chances of overfitting decrease.
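The idea can be sketched with a deliberately overfitting model. The example below assumes a 1-nearest-neighbour regressor, which simply memorises its training rows, as the base model; this choice, the synthetic data, and all function names are hypothetical and used only for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical noisy training data: y is roughly 2 * x.
xs = [i / 10 for i in range(50)]
data = [(x, 2 * x + random.gauss(0, 1)) for x in xs]

def fit_1nn(sample):
    """Return a predictor that memorises its sample (prone to overfitting)."""
    def predict(x):
        nearest = min(sample, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict

def fit_bagging(rows, n_models=25):
    """Train one 1-NN model per bootstrap sample and average their outputs."""
    models = [fit_1nn(random.choices(rows, k=len(rows))) for _ in range(n_models)]
    def predict(x):
        return mean(m(x) for m in models)
    return predict

single_model = fit_1nn(data)      # overfits the noise in the full dataset
bagged_model = fit_bagging(data)  # each member overfits only its own subset

x_new = 2.55
print(single_model(x_new), bagged_model(x_new))
```

The single model returns the noisy target of whichever row is nearest, while the bagged prediction averages over models that memorised different subsets, so the noise of any one row has less influence.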

Boosting Method

Boosting is an ensemble technique that constructs models sequentially. It is based on the idea that every model depends on the predictions of the previous one.

To understand it, say we have built model 1 from the dataset. This model will not predict everything correctly: it gets some data points right and some wrong. In boosting, we then construct a new training set that learns from model 1. We take the initial data, but every data point that model 1 predicted incorrectly is given a larger weight. Model 2 is built on this reweighted data, so it differs from model 1: it pays more attention to the overweighted instances, with the objective of correctly predicting the points model 1 got wrong. Model 3 is then built by the same process, and so on. It is an iterative process that continues until the error falls to an acceptable threshold.
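The reweighting step can be sketched as follows. This uses a simplified rule chosen for illustration: the weight of every wrongly predicted row is multiplied by a fixed factor and the weights are then rescaled to sum to 1. Real boosting algorithms such as AdaBoost derive the factor from the model's error rate; the function name and the factor of 2 here are assumptions.

```python
def reweight(weights, is_wrong, factor=2.0):
    """Increase the weight of wrongly predicted rows, then normalise to sum to 1."""
    updated = [w * factor if wrong else w for w, wrong in zip(weights, is_wrong)]
    total = sum(updated)
    return [w / total for w in updated]

# Five rows start with equal weight.
weights = [0.2] * 5

# Hypothetical outcome: model 1 predicts only row 2 incorrectly.
round_1 = reweight(weights, [False, False, True, False, False])

# Hypothetical outcome: model 2 gets row 2 wrong again.
round_2 = reweight(round_1, [False, False, True, False, False])

print([round(w, 3) for w in round_1])
print([round(w, 3) for w in round_2])
```

Row 2 goes from a weight of 0.2 to about 0.333 and then to 0.5, so each following model is pushed harder to get it right.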

Types of Boosting Methods

1. Adaptive Boosting (AdaBoost)

* In Adaptive Boosting, successive learners are created with a focus on the data that the previous learner fitted poorly.

* Each successive learner focuses more and more on the harder-to-fit data, that is, the points with the largest errors under the previous model.

2. Gradient Boosting

* Each learner is fit on a modified version of the original data: the x values are kept, and the target is replaced with the residuals from the previous learner.

* By fitting new models to the residuals, the overall learner gradually improves in areas where the residuals were initially high.
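The residual-fitting loop of gradient boosting can be sketched in plain Python. This is a minimal illustration under stated assumptions: squared-error loss, one-split regression stumps as the learners, a hypothetical learning rate of 0.5, and a toy dataset with distinct x values. Production libraries add many refinements that are not shown here.

```python
from statistics import mean

def fit_stump(xs, targets):
    """Find the one-split regression stump with the lowest squared error.
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = mean(ys)
    stumps = []
    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)
    for _ in range(n_rounds):
        # The x values are kept; the target becomes the current residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data: y = x squared.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]

# Training error after 0, 1 and 20 boosting rounds.
errors = {n: mse(fit_gradient_boosting(xs, ys, n_rounds=n), xs, ys) for n in (0, 1, 20)}
print(errors)
```

The training error falls as rounds are added, because each new stump corrects part of what the previous learners left unexplained.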

Facts: In boosting, if a row is predicted wrongly again and again, its weight keeps increasing.

Kamal Prakash
Mar 28, 2022