Understanding Back Propagation Math: Manas Nand Mohan | Aug, 2024


Understanding Derivatives in Calculus

Derivative – In simple terms, it signifies the rate of change.

Key Symbols:

  • d – Indicates “a little bit of.” For example, dx means a small change in x, or du represents a slight variation in u.

So, what exactly does dy/dx convey?

dy/dx tells us how much y changes with respect to x. It conveys both the magnitude and the direction of that change.

Derivative Image

A fancier word for the derivative is the gradient (for functions of several variables, the gradient is the vector of partial derivatives). The derivative at any point gives the slope at that point.
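As a quick illustration, here is a minimal sketch (the function and names are illustrative) that estimates the slope of f(x) = x² at x = 3 by nudging x by a little bit (dx) and measuring how much f changes:

    # Minimal sketch: estimate the derivative (slope) of f(x) = x**2 numerically.
    def f(x):
        return x ** 2

    def derivative(f, x, dx=1e-6):
        # dy/dx ≈ (f(x + dx) - f(x)) / dx: a little change in y over a little change in x
        return (f(x + dx) - f(x)) / dx

    print(derivative(f, 3.0))  # ≈ 6.0, i.e. the slope 2x at x = 3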

In back-propagation, these derivatives drive what is known as the Weight Update Rule:

Weight Update Rule Image
Weight Update Rule in Back-Propagation

The weights are then updated in the direction that reduces the error.
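For reference, the update being applied here is typically written as

    w_new = w_old − η · ∂Loss/∂w

where η is the learning rate, which controls the size of each step.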

Explanation:

∂Loss/∂w > 0 (positive): increasing the weight increases the loss, so to reduce the loss we subtract from the weight.

∂Loss/∂w < 0 (negative): increasing the weight decreases the loss, so to reduce the loss we add to the previous weight. Either way, subtracting η · ∂Loss/∂w moves the weight in the loss-reducing direction, as the sketch below illustrates.

Gradient Descent Image
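To see both cases in action, here is a minimal sketch assuming the toy loss Loss(w) = (w − 3)², whose gradient is 2(w − 3) and whose minimum sits at w = 3:

    # Toy loss for illustration: Loss(w) = (w - 3)**2, minimized at w = 3.
    def gradient(w):
        return 2 * (w - 3)

    learning_rate = 0.1

    w = 5.0                              # gradient is +4 (positive)
    w = w - learning_rate * gradient(w)  # subtracting moves w down to 4.6, toward 3

    w = 1.0                              # gradient is -4 (negative)
    w = w - learning_rate * gradient(w)  # subtracting a negative moves w up to 1.4, toward 3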

In the process of back-propagation, we follow the steps outlined below, sketched here as a minimal runnable example assuming a simple linear model with squared-error loss:

    # Illustrative assumptions: a linear model y_pred = w . x + b with
    # squared-error loss, trained by stochastic gradient descent on toy data.
    import numpy as np

    # Toy data: 100 samples with 3 features each
    X = np.random.randn(100, 3)
    Y = X @ np.array([1.5, -2.0, 0.5]) + 0.7

    # Initialize parameters and hyperparameters
    weights = np.zeros(X.shape[1])
    bias = 0.0
    learning_rate = 0.01
    epochs = 5

    # Loop over each epoch
    for i in range(epochs):
        epoch_losses = []

        # Loop over each training sample, in a random order
        for j in np.random.permutation(X.shape[0]):

            # Step 1: Select 1 row (randomly)
            x = X[j]
            y = Y[j]

            # Step 2: Predict (using forward propagation)
            y_pred = np.dot(weights, x) + bias

            # Step 3: Calculate loss (squared error for this sample)
            loss = (y_pred - y) ** 2
            epoch_losses.append(loss)

            # Step 4: Update weights & bias using gradient descent
            gradient_w = 2 * (y_pred - y) * x   # dLoss/dw
            gradient_b = 2 * (y_pred - y)       # dLoss/db

            weights = weights - learning_rate * gradient_w
            bias = bias - learning_rate * gradient_b

        # Average loss for the epoch (computed after the inner loop)
        avg_loss = np.mean(epoch_losses)
        print(f"Epoch {i + 1}: average loss = {avg_loss:.4f}")
Gradient Descent Steps Image
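The gradients computed in Step 4 are where the "back-propagation" happens: by the chain rule, the loss's sensitivity to a weight is the product of the loss's sensitivity to the prediction and the prediction's sensitivity to that weight:

    ∂Loss/∂w = ∂Loss/∂y_pred × ∂y_pred/∂w

For the linear model with squared-error loss assumed in the sketch above, this gives ∂Loss/∂w = 2(y_pred − y) · x and ∂Loss/∂b = 2(y_pred − y), which are exactly the gradients used in the update.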

For a more in-depth understanding, explore Back-Propagation: The Science of it.
