Machine Learning · Intermediate

🔭Dimensionality Reduction

Keeping the important information

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

25 min · Explore at your own pace

Before We Begin

What we are learning today

High-dimensional data can be difficult to visualize, noisy to model, and expensive to work with. Dimensionality reduction asks a disciplined question: can we compress the data into fewer directions while keeping the structure that matters most?

How this lesson fits

This module is where the course shifts from explicit rules to learned patterns. Instead of telling the machine exactly what to do in every case, we give it examples, define success, and let it infer a decision rule from the data.

The big question

How can a machine study examples, extract useful patterns, and make predictions on cases it has never seen before?

  • Distinguish supervised, unsupervised, and reward-driven learning setups
  • Interpret the output of common models in plain English instead of opaque jargon
  • Compare the tradeoffs between accuracy, interpretability, flexibility, and speed

Why You Should Care

Students often assume that more columns automatically mean more intelligence. This lesson corrects that instinct by showing that redundant or noisy dimensions can actually make patterns harder to see and models harder to train.

Where this is used today

  • Visualization tools such as PCA, t-SNE, and UMAP that project complex datasets into interpretable views
  • Compression workflows that reduce storage while preserving salient information
  • Preprocessing pipelines that simplify features before training downstream models

Think of it like this

Think of shining a light on a 3D object to create a 2D shadow. You lose some detail, but if you choose the viewing angle well, the shadow still preserves the most informative shape.

Easy mistake to make

Dimensionality reduction is not random feature deletion. Methods such as PCA create new combined dimensions that are chosen specifically to preserve as much informative structure as possible.
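To make that distinction concrete, here is a minimal numpy sketch on an assumed synthetic dataset of two strongly correlated features. Deleting one column throws away about half the variance; projecting onto the first principal component, a learned combination of both columns, keeps almost all of it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly correlated features: a shared signal plus small noise.
signal = rng.normal(size=500)
X = np.column_stack([signal + 0.1 * rng.normal(size=500),
                     signal + 0.1 * rng.normal(size=500)])
X = X - X.mean(axis=0)

total_var = X.var(axis=0).sum()

# Option 1: delete a feature -- keeps only the remaining column's variance.
kept_by_deletion = X[:, 0].var() / total_var

# Option 2: project onto the top eigenvector of the covariance matrix
# (the first principal component), a *combination* of both features.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigvecs[:, -1]                     # eigenvector with largest eigenvalue
kept_by_pca = (X @ pc1).var() / total_var

print(f"variance kept by deleting a column: {kept_by_deletion:.2f}")
print(f"variance kept by PC1:               {kept_by_pca:.2f}")
```

Because the two columns move together, PC1 essentially averages them, canceling some noise while keeping the shared signal.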

By the end, you should be able to say:

  • Explain the curse of dimensionality in plain language and why sparse spaces create problems
  • Describe PCA as finding the directions that capture the most variation in the data
  • Connect lower-dimensional representations to visualization, compression, and model preparation
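The "sparse spaces" problem from the first goal above can be seen numerically. In this small sketch (a toy setup, not a formal proof), we measure how different the nearest and farthest neighbors of a query point look as the number of dimensions grows: in high dimensions, all distances start to look the same.

```python
import numpy as np

rng = np.random.default_rng(1)

def distance_contrast(dim, n_points=200):
    """Ratio (farthest - nearest) / nearest from one query point to a
    uniform random cloud -- a rough 'contrast' measure of distances."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

for dim in [2, 10, 100, 1000]:
    print(f"{dim:>4} dims: contrast = {distance_contrast(dim):.2f}")
```

As the contrast collapses, "nearest neighbor" stops being a meaningful notion, which is one reason models struggle in very high-dimensional spaces.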

Think about this first

If you had to summarize a student's academic profile with only two numbers, which ones would you choose, and what important information would be lost in the compression?

Words we will keep using

dimension · feature · variance · principal component · projection

Why We Shrink the Number of Features

Imagine taking a photo of a 3D statue. The photo is 2D, but if you pick the right angle, you can still recognize the shape. Dimensionality reduction is the art of finding that perfect angle—simplifying the data without destroying the meaning.

  • PCA: squashes the data flat, keeping the widest (most varied) view.
  • t-SNE: keeps neighbors together; great for visualizing clusters.
  • Autoencoders: neural networks that learn to zip and unzip data.

PCA — Principal Component Analysis

PCA asks a very practical question: if I had to redraw this dataset using fewer axes, which new directions would keep the most useful information? The first principal component follows the strongest spread in the data, the second follows the next strongest spread, and so on.

$$\text{PC}_1 = \arg\max_{\|v\|=1} \text{Var}(Xv)$$
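You can check this maximization claim numerically. In the sketch below (an assumed 2D toy dataset), we take the top eigenvector of the covariance matrix as PC1 and confirm that no random unit vector $v$ gives a projection with higher variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy 2D dataset stretched 3x along one axis (an assumed example).
X = rng.normal(size=(1000, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])
X = X - X.mean(axis=0)

# PC1 = top eigenvector of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = eigvecs[:, -1]
var_pc1 = (X @ pc1).var()

# Compare against many random unit vectors v: none should beat PC1.
angles = rng.uniform(0, np.pi, size=1000)
best_random = max(
    (X @ np.array([np.cos(a), np.sin(a)])).var() for a in angles
)
print(f"Var along PC1:           {var_pc1:.2f}")
print(f"best random unit vector: {best_random:.2f}")
```

The top eigenvector is the exact solution to this argmax, so the random search can only get close to it, never beyond.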

Left: rotating 3D view. Right: PCA projection to 2D (always same orientation).

What PCA is trying to do:

  1. Shift the data so the cloud is centered around the origin
  2. Measure which features tend to vary together using the covariance matrix
  3. Find the directions where the data spreads out the most
  4. Project the data onto the top directions you want to keep
Scree Plot — the bars show how much variation each principal component explains, and the line shows how quickly those pieces add up.
Why people use it: to visualize embeddings, remove noise, speed up later models, and summarize messy datasets in a cleaner way.
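The four steps above can be sketched end to end in a few lines of numpy. This is a minimal illustration, not a production implementation (real libraries such as scikit-learn use an SVD-based routine instead); the explained-variance ratios it returns are exactly the bars of the scree plot described earlier.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA following the four steps above (numpy-only sketch)."""
    # 1. Shift the data so the cloud is centered at the origin.
    Xc = X - X.mean(axis=0)
    # 2. Covariance matrix: which features tend to vary together.
    cov = np.cov(Xc, rowvar=False)
    # 3. Directions of greatest spread = eigenvectors, sorted by eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Project the centered data onto the top-k directions.
    Z = Xc @ eigvecs[:, :k]
    # Explained-variance ratios: the bars of a scree plot.
    explained = eigvals / eigvals.sum()
    return Z, explained

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
Z, explained = pca(X, k=2)
print("reduced shape:", Z.shape)
print("scree ratios: ", np.round(explained, 3))
```

Plotting `explained` as bars, and its cumulative sum as a line, reproduces the scree plot: it shows how few components are needed before the remaining ones add almost nothing.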