Back Propagation explained in easy words !!

What is Back Propagation in Machine Learning?

Back propagation is an algorithm used in the supervised training of neural networks. It is the procedure Stochastic Gradient Descent relies on to calculate the gradient of the loss function with respect to the network's weights, which in turn allows the loss to be reduced during training.

What is Stochastic Gradient Descent?

Before getting into the details of how back propagation works, let us first talk about Stochastic Gradient Descent, or SGD for short. SGD is the most commonly used optimization algorithm, and it does exactly what the name suggests: at every epoch it adjusts the weights of the neural network so as to minimize the loss function during training of the model.
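
To make that concrete, here is a minimal sketch in Python of a single SGD weight update. The learning rate, weights and gradients below are made-up values used only to illustrate the idea: SGD moves each weight a small step against its gradient.

import numpy as np

# Hypothetical weights for one layer and the gradient of the loss
# with respect to each of them (values are made up for illustration).
weights = np.array([0.5, -1.2, 0.3])
gradients = np.array([0.1, -0.4, 0.05])
learning_rate = 0.01

# One SGD step: nudge every weight in the direction that lowers the loss.
weights = weights - learning_rate * gradients
print(weights)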

Some ML Terminologies:

Model: an algorithm that is trained to recognize specific patterns, whether in images, voice, text, etc.


Epoch: one complete pass of the data set through the model during training.


Input Layer: the layer that receives the raw data and passes it to the first hidden layer, where the data is multiplied by that layer's weights and then run through an activation function.


Hidden Layers: the layers between the input and output layers; each one applies its weights to the incoming data and passes the result through an activation function.


Output Layer: the last layer in a neural network, where the output of the model is produced (a short sketch after this list shows how data flows through these layers).
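
To tie these terms together, here is a rough Python (NumPy) sketch of data flowing from an input layer through one hidden layer to an output layer. The layer sizes, the ReLU activation and the random values are assumptions made only for illustration.

import numpy as np

# Hypothetical toy network: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)

x = rng.normal(size=3)            # input layer: raw data fed into the network
W1 = rng.normal(size=(4, 3))      # weights of the hidden layer
W2 = rng.normal(size=(1, 4))      # weights of the output layer

def relu(z):
    return np.maximum(0.0, z)     # a common activation function

hidden = relu(W1 @ x)             # hidden layer: weighted sum, then activation
output = W2 @ hidden              # output layer: produces the model's prediction
print(output)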

How does Back Propagation work?

Returning to back propagation, imagine a neural network with two hidden layers. The data is first fed in at the input layer and propagates forward through the network, each layer receiving the output of the previous layer, until it reaches the output layer. The loss for the resulting prediction is then calculated according to the chosen loss function.

To minimize the loss, the derivative of the loss function is calculated with respect to each weight in the model, and it is back propagation that the SGD algorithm uses to compute this gradient. As the name implies, after forward propagation back propagation passes through the network backward, updating the weights of each layer to reduce the loss.

SGD starts by looking at the output produced by the output layer and, based on this calculation, decides which weights should increase and which should decrease to reduce the error. The output values themselves cannot be changed directly, because they are computed from the previous layer's activations multiplied by the output layer's weights. The only way to influence them is to change the activations coming from the preceding layer or to update the weights connected to the output layer. The same reasoning is then applied one layer further back, and so on until the input layer is reached. Repeating this for every epoch updates each layer's weights and lowers the loss until training is complete.
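
The sketch below walks through one forward pass and one backward pass for a tiny network with a single hidden layer, so you can see the chain rule moving backward from the loss toward the input. The network size, the sigmoid activation and the squared-error loss are all assumptions made just for this example.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x, y = rng.normal(size=2), 1.0      # one training example and its target
W1 = rng.normal(size=(3, 2))        # hidden layer weights
W2 = rng.normal(size=(1, 3))        # output layer weights
lr = 0.1

# Forward pass: data propagates layer by layer to the output.
a1 = sigmoid(W1 @ x)                # hidden layer activation
y_hat = sigmoid(W2 @ a1)[0]         # network output
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: start from the loss at the output and apply the
# chain rule layer by layer, moving backward toward the input.
d_yhat = y_hat - y                          # dLoss/dy_hat
d_z2 = d_yhat * y_hat * (1 - y_hat)         # through the output sigmoid
d_W2 = np.outer([d_z2], a1)                 # gradient for output-layer weights
d_a1 = W2.T @ np.array([d_z2])              # error pushed back to the hidden layer
d_z1 = d_a1 * a1 * (1 - a1)                 # through the hidden sigmoid
d_W1 = np.outer(d_z1, x)                    # gradient for hidden-layer weights

# SGD uses these gradients to nudge every weight downhill on the loss.
W2 -= lr * d_W2
W1 -= lr * d_W1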

Summary

To recap, when training an artificial neural network, data is passed into the model through a process called forward propagation: at each layer the weighted sum of the previous layer's activations and their respective weights is calculated and fed into that layer's activation function, and this repeats until the output layer is reached. The loss is then calculated for the output, and the Stochastic Gradient Descent algorithm works to diminish it. SGD does this by using back propagation to calculate the derivative of the loss function with respect to the weights, and then updating the weights in the network appropriately.
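
Putting the whole loop together, here is one possible version written with PyTorch, chosen only as a convenient example since the post itself is framework-agnostic. The data, layer sizes, learning rate and number of epochs are made up for illustration.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 4)                      # 64 samples with 4 features each
y = torch.randn(64, 1)                      # made-up regression targets

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):                     # each epoch passes the data through the model
    y_hat = model(X)                        # forward propagation
    loss = loss_fn(y_hat, y)                # loss calculated at the output layer
    optimizer.zero_grad()
    loss.backward()                         # back propagation: gradient of the loss w.r.t. every weight
    optimizer.step()                        # SGD update: adjust weights to lower the loss
    print(epoch, loss.item())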
 

 

  • Yamen Aly
  • Mar 27, 2022
