Back Propagation explained in easy words!!
What is Back Propagation in Machine Learning?
Back propagation is an algorithm used in the supervised learning of neural networks. It is the method Stochastic Gradient Descent uses to calculate the gradient of the loss function with respect to the network's weights, and therefore to decrease the loss during the training of a neural network.
What is Stochastic Gradient Descent?
Before getting into the details of how back propagation works, let us first talk about Stochastic Gradient Descent, or SGD for short. SGD is the most commonly used optimization algorithm, and it does exactly what the name suggests: it adjusts (optimizes) the weights of the neural network, epoch after epoch, to minimize the loss function during training of the model.
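To make that concrete, here is a rough sketch in Python of a single SGD weight update. The weights, gradients, and learning rate below are made-up numbers purely for illustration, and the gradient values are assumed to already be available (back propagation, covered later, is what actually computes them):

```python
# A minimal sketch of one SGD weight update. The numbers are illustrative only,
# and the gradients are assumed to have already been computed by back propagation.
learning_rate = 0.1

weights = [0.5, -0.3, 0.8]        # current weights of some layer
gradients = [0.2, -0.1, 0.05]     # dLoss/dWeight for each weight

# Step each weight a small amount against its gradient to reduce the loss.
weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
print(weights)                    # roughly [0.48, -0.29, 0.795]
```

The learning rate controls how big a step each update takes; the gradient tells SGD in which direction each weight should move to lower the loss.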
Some ML Terminology:
Model: An algorithm that is trained to recognize specific patterns in data, whether images, voice, text, etc.
Epoch: One complete pass of the entire training data set through the model during training.
Input Layer: The layer that receives the raw data and passes it to the first hidden layer, where it is multiplied by that layer's weights and then fed through an activation function.
Hidden Layers: The layers between the input and output layers; each one applies its weights to the data it receives and passes the result through an activation function.
Output Layer: The last layer in a neural net, where the output of the model is produced.
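To make the layer terms above concrete, here is a minimal Python sketch of data flowing from an input layer through one hidden layer to an output layer. The layer sizes, weight values, and the choice of a sigmoid activation are assumptions made purely for illustration:

```python
# A minimal sketch of forward propagation through one hidden layer.
# All numbers and the sigmoid activation are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2])               # input layer: two input features

W_hidden = np.array([[0.1, 0.4],       # hidden-layer weights
                     [0.3, 0.2]])
W_output = np.array([[0.6, 0.9]])      # output-layer weights

# Hidden layer: weighted sum of the inputs, then the activation function.
hidden = sigmoid(W_hidden @ x)

# Output layer: weighted sum of the hidden activations, then the activation.
output = sigmoid(W_output @ hidden)
print(output)
```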
How does Back Propagation work?
Returning to back propagation, imagine a neural net with two hidden layers. The data is first fed in at the input layer and propagates forward through the network (each layer receives its input from the previous layer) until it reaches the output layer. Then the loss is calculated for that result using the loss function.
To minimize the loss, the derivative of the loss function with respect to each weight in the model must be calculated, and back propagation is the method SGD uses to compute this gradient. As the name implies, after forward propagation back propagation passes through the network backward, updating the weights of each layer to reduce the loss.
The Stochastic Gradient Descent algorithm starts by looking at the output produced by the output layer and, based on the gradient, decides which weights should increase and which should decrease to minimize the error. The output value is the weighted sum of the previous layer's output passed through an activation function, so it cannot be changed directly (it is entirely determined by the weights and the outputs of the previous layers). The only way to change it is indirectly: update the weights connected to the output layer, or change the activation output of the preceding layer, which in turn means updating that layer's weights. Back propagation therefore jumps backward, layer by layer, until the input layer is reached, updating each layer's weights. Repeating this every epoch lowers the loss until training is complete.
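Here is a minimal sketch, in Python, of one forward pass followed by one backward pass and weight update for a tiny network with a single hidden layer, sigmoid activations, and a squared-error loss. All numbers, layer sizes, and names are illustrative assumptions rather than a definitive implementation:

```python
# A minimal sketch of back propagation on a tiny one-hidden-layer network.
# Weights, inputs, and the squared-error loss are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2])                     # input features
y = np.array([1.0])                          # target output

W1 = np.array([[0.1, 0.4],                   # hidden-layer weights
               [0.3, 0.2]])
W2 = np.array([[0.6, 0.9]])                  # output-layer weights

# Forward propagation: weighted sum, then activation, layer by layer.
a1 = sigmoid(W1 @ x)                         # hidden-layer activations
a2 = sigmoid(W2 @ a1)                        # network output
loss = 0.5 * np.sum((a2 - y) ** 2)           # squared-error loss

# Backward pass: apply the chain rule from the output layer back toward the input.
delta2 = (a2 - y) * a2 * (1 - a2)            # error at the output layer
grad_W2 = np.outer(delta2, a1)               # dLoss/dW2

delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # error pushed back to the hidden layer
grad_W1 = np.outer(delta1, x)                # dLoss/dW1

# One SGD step: nudge every weight against its gradient to lower the loss.
lr = 0.1
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```

The two delta terms are the heart of back propagation: the error at the output layer is computed first, then pushed backward through the output-layer weights to get the error at the hidden layer, and each layer's gradient is built from its error and the activations feeding into it.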
Summary
To recap, when training an artificial neural network, data is passed into the model. The data moves through the model in a process called forward propagation: each layer computes a weighted sum of the previous layer's activations using its own weights, and that sum is fed into the next layer's activation function. This repeats until the output layer is finally reached. After that, the loss is calculated for the output and the Stochastic Gradient Descent algorithm works to diminish it. SGD minimizes the loss by using back propagation to calculate the derivative of the loss function with respect to the weights, and then updating the weights in the network accordingly.
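As a closing sketch, the snippet below ties the recap together into a small training loop: forward propagation, loss, back propagation, and an SGD weight update, repeated over many epochs. The XOR data set, layer sizes, learning rate, and epoch count are all assumptions chosen purely for illustration:

```python
# A minimal sketch of a full training loop: forward pass, back propagation,
# and SGD updates. The data set and hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: a classic tiny data set that needs a hidden layer to be learned.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(size=(4, 2))                 # hidden layer: 4 units
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))                 # output layer: 1 unit
b2 = np.zeros(1)
lr = 0.5

for epoch in range(5000):                    # one epoch = one full pass over the data
    for x, y in zip(X, Y):
        # Forward propagation.
        a1 = sigmoid(W1 @ x + b1)
        a2 = sigmoid(W2 @ a1 + b2)
        # Back propagation of the squared-error loss.
        delta2 = (a2 - y) * a2 * (1 - a2)
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)
        # SGD weight update.
        W2 -= lr * np.outer(delta2, a1)
        b2 -= lr * delta2
        W1 -= lr * np.outer(delta1, x)
        b1 -= lr * delta1

# After training, the predictions should usually be close to the XOR targets.
for x in X:
    print(x, sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2))
```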
- Yamen Aly
- Mar, 27 2022