Training & Backpropagation
How a network learns from mistakes
Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.
Pause and experiment as you go.
Before We Begin
What we are learning today
Backpropagation is the bookkeeping system that makes neural-network training practical. After a prediction is made, we measure the error, calculate how much each weight contributed to that error, and then nudge the weights in the direction that should reduce future mistakes.
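The measure-then-nudge loop can be sketched with a single weight. This is an illustrative toy, not the lesson's demo: the model `y = w * x`, the target, and the learning rate are all assumptions made up for the example.

```python
# Minimal sketch of one training update for a single weight, assuming a
# toy model y = w * x with squared-error loss L = (y - target)^2.
def train_step(w, x, target, lr=0.1):
    y = w * x                      # forward pass: make a prediction
    grad = 2 * (y - target) * x    # measure how w contributed to the error
    return w - lr * grad           # nudge w in the direction that reduces loss

w = 0.0
for _ in range(50):
    w = train_step(w, x=1.0, target=2.0)
# w converges toward 2.0, the value that drives the loss to zero
```

Each step repeats the same three moves from the paragraph above: predict, measure the error's sensitivity to the weight, and nudge.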
How this lesson fits
This module introduces the core architecture behind much of modern AI. Students follow information as it moves through layers, is transformed by weights and activations, and eventually becomes a prediction that can be improved through feedback.
The big question
How do large collections of simple numerical operations combine into a model that can recognize patterns humans struggle to hand-code?
Why You Should Care
Students often hear backpropagation described as if it were a mysterious black box. This lesson turns it into a comprehensible process: compute loss, trace responsibility backward, and update parameters gradually until performance improves.
Where this is used today
- Training the neural networks behind modern language, vision, and speech systems
- Optimization problems where a differentiable model must improve through repeated feedback
- Scientific and business models that tune parameters by following gradients
Think of it like this
It is like reviewing a team's performance after the game. You do not just say "we lost"; you identify which decisions mattered, how strongly they mattered, and what each player should change next time.
Easy mistake to make
Backpropagation is not magic learning dust. It is a structured accounting method for assigning credit and blame across many connected weights.
By the end, you should be able to:
- Explain why gradients indicate how a small parameter change should affect the loss
- Describe how errors are propagated backward through successive layers using the chain rule
- Connect learning rate, gradients, and convergence to stable or unstable training behavior
Think about this first
If a model consistently predicts values that are too high, what kind of information would you need in order to decide which weights should decrease and by how much?
Backpropagation: How the Network Learns from Mistakes
Backpropagation sounds intimidating, but it's really just a "blame game." When the network makes a mistake, we trace the error backward through the connections to find out which weights were responsible. Then we nudge them to do better next time.
The chain rule is just a way of tracing influence: the output depends on the hidden units, the hidden units depend on the weights, and so the error can be followed all the way back to each parameter.
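That link-by-link tracing can be written out by hand for a tiny network. This is a hedged sketch, not the lesson's live demo: the architecture (`x -> h = sigmoid(w1*x) -> y = w2*h`), the squared-error loss, and all the specific values are assumptions chosen for illustration.

```python
import math

# Tiny 2-layer network: x -> h = sigmoid(w1*x) -> y = w2*h, loss L = (y - t)^2.
def forward_backward(w1, w2, x, t):
    z = w1 * x
    h = 1.0 / (1.0 + math.exp(-z))   # hidden activation
    y = w2 * h                       # prediction
    loss = (y - t) ** 2

    # Chain rule, applied one link at a time, from the loss back to each weight:
    dL_dy = 2 * (y - t)              # how the loss depends on the output
    dL_dw2 = dL_dy * h               # the output depends on w2 through h
    dL_dh = dL_dy * w2               # ...and on the hidden unit through w2
    dh_dz = h * (1 - h)              # derivative of the sigmoid
    dL_dw1 = dL_dh * dh_dz * x       # error traced all the way back to w1
    return loss, dL_dw1, dL_dw2

loss, g1, g2 = forward_backward(w1=0.5, w2=-0.3, x=1.0, t=1.0)
```

Each `dL_d...` line is one application of the chain rule; multiplying them together is exactly "following the error backward" through the connections.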
Live 2-Layer Network — Watch Weights Update
Interactive demo: node colour shows activation (green = high, red = low); blue edges carry positive weights, red edges negative; the dashed node is the true label y. A loss curve and the current weight values update as the network trains; click "100 Steps" to watch it learn.
Vanishing & Exploding Gradients
In very deep networks, the gradient can shrink until learning becomes painfully slow, or grow until training becomes unstable. That is why modern architectures use tools like ReLU, normalization, and residual connections to keep learning healthy.
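The shrinking effect is easy to see numerically. This illustrative sketch (an assumption-laden toy, not a real network) multiplies the per-layer derivative factors that a backward pass would accumulate: a sigmoid layer contributes at most 0.25, while a ReLU layer on its active side contributes exactly 1.

```python
# Why gradients vanish: each sigmoid layer scales the backward signal by
# sigmoid'(z) <= 0.25, while an active ReLU layer scales it by exactly 1.
sigmoid_grad = 1.0
relu_grad = 1.0
for _ in range(20):          # 20 stacked layers
    sigmoid_grad *= 0.25     # best case for sigmoid: derivative at z = 0
    relu_grad *= 1.0         # ReLU passes the gradient through unchanged

print(f"after 20 sigmoid layers: {sigmoid_grad:.2e}")  # shrinks toward zero
print(f"after 20 ReLU layers:    {relu_grad:.1f}")
```

After twenty sigmoid layers the signal has shrunk by a factor of roughly a trillion, which is why deep architectures lean on ReLU, normalization, and residual connections.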