Neural Networks · Intermediate

🧠 Feedforward Neural Networks

From neurons to layered predictions

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

35 min · Explore at your own pace

Before We Begin

What we are learning today

A feedforward neural network is a layered function builder. Each neuron computes a weighted combination of its inputs, applies an activation, and passes the result forward. When enough of these transformations are stacked together, the network can capture patterns that a simple linear model would miss entirely.

How this lesson fits

This module introduces the core architecture behind much of modern AI. Students follow information as it moves through layers, is transformed by weights and activations, and eventually becomes a prediction that can be improved through feedback.

The big question

How do large collections of simple numerical operations combine into a model that can recognize patterns humans struggle to hand-code?

  • Trace a forward pass through a network and explain what each layer contributes
  • Explain why nonlinear activations and gradients make learning possible
  • Relate abstract neural-network mechanics to practical perception tasks

Why You Should Care

Modern vision, speech, and language systems all depend on the basic idea that useful internal representations can be built layer by layer. Students who understand the forward pass will have a much easier time understanding later deep-learning architectures.

Where this is used today

  • Digit and character recognition tasks where simple visual patterns must be mapped to labels
  • Function approximation problems where the relationship between input and output is highly nonlinear
  • Basic control and prediction systems in robotics, forecasting, and sensor processing

Think of it like this

Imagine an assembly line where each station refines the material a little further. The early stations do simple transformations, but the later ones combine those partial results into something much more meaningful.

Easy mistake to make

Neural networks borrow vocabulary from biology, but they are still mathematical models, not faithful simulations of real brains.

By the end, you should be able to:

  • Identify inputs, hidden layers, weights, biases, activations, and outputs in a simple network
  • Explain why activation functions are necessary if we want networks to learn nonlinear relationships
  • Trace a small forward pass numerically or conceptually from input to prediction

Think about this first

Why might several simple transformations stacked in sequence describe a pattern better than one single straight-line rule? Give a real-world example if you can.

Words we will keep using

neuron · layer · weight · bias · activation

Feedforward Neural Networks

A feedforward neural network is a bucket brigade of information. Each layer takes the data, mixes it up, transforms it, and hands it to the next layer. If you understand this forward flow, you understand the skeleton of deep learning.

h^{(l)} = \sigma\!\left(W^{(l)}\, h^{(l-1)} + b^{(l)}\right)
Why non-linearity?
If you don't add a non-linear activation, the whole network collapses into a single straight-line rule. No matter how deep you make it, it can't learn curves.

Why layers help
Extra layers let the network build up complexity step by step: finding edges, then shapes, then objects.

What gets learned
The network doesn't change the math. It changes the weights, tuning the connections until the output looks right.
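The layer rule above can be sketched in a few lines of NumPy. The weights and inputs here are arbitrary illustrative numbers, not values from the demo network:

```python
import numpy as np

def layer_forward(h_prev, W, b, sigma):
    """One layer of a feedforward network: h = sigma(W @ h_prev + b)."""
    return sigma(W @ h_prev + b)

relu = lambda z: np.maximum(0.0, z)

# An illustrative 2-input, 3-neuron layer (weights chosen arbitrarily)
W = np.array([[ 0.5, -1.0],
              [ 0.2,  0.4],
              [-0.3,  0.8]])
b = np.array([0.1, 0.0, -0.2])
x = np.array([1.0, 2.0])

h = layer_forward(x, W, b, relu)  # weighted sum, plus bias, through ReLU
```

Stacking the network is just calling `layer_forward` repeatedly, feeding each layer's output into the next.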

Activation Functions

The interactive lets you switch between five activations: ReLU, sigmoid, tanh, GELU, and linear.

ReLU: max(0, x). Used in: ResNets, most modern CNNs.

Different activations change how flexible the network can be. Modern language models often use GELU because it behaves smoothly and trains well at scale.
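All five activations are one-liners. Here is a sketch of each; the GELU uses the common tanh approximation rather than the exact Gaussian form:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # zero out negatives, pass positives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                  # squashes to (-1, 1)

def gelu(z):
    # tanh approximation of GELU, as used in many transformer implementations
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))

def linear(z):
    return z                           # no non-linearity at all
```

Plotting these over a range like `np.linspace(-4, 4, 100)` makes the differences obvious: ReLU has a hard corner at zero, GELU rounds that corner off, and sigmoid/tanh saturate at the extremes.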

Interactive Forward Pass

Node colour: Green = active (firing), Red = inactive (suppressed). Values shown inside.

Architecture: 2 → 4 → 3 → 1, with ReLU activations. Adjust the input values and watch the activations and output change.

Output: 0.495
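The same 2 → 4 → 3 → 1 architecture can be run end to end in code. The demo's actual weights aren't listed here, so this sketch uses random weights; its output will not match the 0.495 shown above, but the flow of computation is identical:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [2, 4, 3, 1]  # same shape as the demo: 2 → 4 → 3 → 1

# Random weights and zero biases stand in for the demo's (unpublished) values
weights = [rng.normal(0.0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ h + b
        # ReLU on hidden layers; the final layer is left linear here
        h = np.maximum(0.0, z) if i < len(weights) - 1 else z
    return h

y = forward(np.array([0.8, -0.3]))  # a single scalar prediction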

Decision Boundary

The decision boundary is the line where the network changes its mind. On one side, it says "Yes"; on the other, "No". This is the best place to see why non-linearity matters—try switching to Linear and see how the boundary gets stuck as a straight line.

Linear (no activation)
The model can only draw straight boundaries, no matter how many layers you stack.
ReLU / GELU / Tanh
These activations bend the model away from a straight line, which is why the network can handle richer patterns.
Key insight
Depth alone is not enough. You need depth and non-linearity together.
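The "straight boundaries only" claim is easy to verify numerically: two stacked linear layers are always equivalent to a single linear layer with combined weights. This is a small sketch with arbitrary random weights:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

x = rng.normal(size=2)

# Two stacked *linear* layers (no activation between them)...
deep = W2 @ (W1 @ x + b1) + b2

# ...collapse into one linear layer with merged weights and bias
W = W2 @ W1
b = W2 @ b1 + b2
shallow = W @ x + b

assert np.allclose(deep, shallow)  # depth without non-linearity buys nothing
```

Insert a ReLU between the two layers and the equivalence breaks, which is exactly why the non-linear boundaries in the demo can bend.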

Layer Computation Trace

This table shows the first hidden layer in slow motion. Each neuron multiplies the inputs by weights, adds them up, adds a bias, and then sends the result through the activation function.

z_j = \sum_i w_{ji}\, x_i + b_j \qquad h_j = \sigma(z_j)
Neuron | w1·x1      | w2·x2       | + bias | = z
h1     | 0.30·0.80  | -0.90·-0.30 | 0.14   | 0.645
h2     | -0.68·0.80 | -0.25·-0.30 | -0.01  | -0.477
h3     | -0.80·0.80 | 0.71·-0.30  | -0.15  | -0.998

The shading shows how much each piece contributes. This is the arithmetic hidden inside the network diagram above.
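You can redo this trace yourself. Using the weights, inputs, and biases exactly as displayed in the table (which are rounded for display, so the resulting z values land within about 0.01 of the table's):

```python
import numpy as np

x = np.array([0.80, -0.30])            # the two input values
W = np.array([[ 0.30, -0.90],          # weights into h1
              [-0.68, -0.25],          # weights into h2
              [-0.80,  0.71]])         # weights into h3
b = np.array([0.14, -0.01, -0.15])     # one bias per hidden neuron

z = W @ x + b                # pre-activations: roughly [0.65, -0.479, -1.003]
h = np.maximum(0.0, z)       # ReLU, matching the demo's activation
```

With ReLU, only h1 survives (it "fires"), while h2 and h3 are suppressed to zero: the green/red colouring in the diagram above is exactly this.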