🔭 Dimensionality Reduction
Keeping the important information
Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.
Pause and experiment as you go.
Before We Begin
What we are learning today
High-dimensional data can be difficult to visualize, noisy to model, and expensive to work with. Dimensionality reduction asks a disciplined question: can we compress the data into fewer directions while keeping the structure that matters most?
How this lesson fits
This module is where the course shifts from explicit rules to learned patterns. Instead of telling the machine exactly what to do in every case, we give it examples, define success, and let it infer a decision rule from the data.
The big question
How can a machine study examples, extract useful patterns, and make predictions on cases it has never seen before?
Why You Should Care
Students often assume that more columns automatically mean more intelligence. This lesson corrects that instinct by showing that redundant or noisy dimensions can actually make patterns harder to see and models harder to train.
Where this is used today
- ✓ Visualization tools such as PCA, t-SNE, and UMAP that project complex datasets into interpretable views
- ✓ Compression workflows that reduce storage while preserving salient information
- ✓ Preprocessing pipelines that simplify features before training downstream models
Think of it like this
Think of shining a light on a 3D object to create a 2D shadow. You lose some detail, but if you choose the viewing angle well, the shadow still preserves the most informative shape.
Easy mistake to make
Dimensionality reduction is not random feature deletion. Methods such as PCA create new combined dimensions that are chosen specifically to preserve as much informative structure as possible.
By the end, you should be able to:
- Explain the curse of dimensionality in plain language and why sparse spaces create problems
- Describe PCA as finding the directions that capture the most variation in the data
- Connect lower-dimensional representations to visualization, compression, and model preparation
Think about this first
If you had to summarize a student's academic profile with only two numbers, which ones would you choose, and what important information would be lost in the compression?
Why We Shrink the Number of Features
Imagine taking a photo of a 3D statue. The photo is 2D, but if you pick the right angle, you can still recognize the shape. Dimensionality reduction is the art of finding that perfect angle—simplifying the data without destroying the meaning.
PCA — Principal Component Analysis
PCA asks a very practical question: if I had to redraw this dataset using fewer axes, which new directions would keep the most useful information? The first principal component follows the strongest spread in the data, the second follows the next strongest spread at right angles to the first, and so on.
Left: rotating 3D view. Right: PCA projection to 2D (fixed orientation, so you can compare as the left view rotates).
What PCA is trying to do:
- Shift the data so the cloud is centered around the origin
- Measure which features tend to vary together using the covariance matrix
- Find the directions where the data spreads out the most
- Project the data onto the top directions you want to keep
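The four steps above can be sketched directly in NumPy. This is a minimal illustration, not the lesson's own implementation: the function name `pca_project`, the choice of `k`, and the toy dataset are all made up for the example.

```python
import numpy as np

def pca_project(X, k=2):
    """Project data X (n_samples x n_features) onto its top-k principal components."""
    # 1. Shift the data so the cloud is centered around the origin
    X_centered = X - X.mean(axis=0)

    # 2. Measure which features tend to vary together: the covariance matrix
    cov = np.cov(X_centered, rowvar=False)

    # 3. Find the directions where the data spreads out the most:
    #    eigenvectors of the covariance matrix, sorted by eigenvalue
    #    (i.e. by variance captured), largest first
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:k]]

    # 4. Project the centered data onto the top-k directions
    return X_centered @ components

# Toy example: 3D points that mostly vary along a single diagonal line,
# plus a little noise — so most of the structure fits in fewer dimensions.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 0.5 * t, -t]) + 0.05 * rng.normal(size=(200, 3))

Z = pca_project(X, k=2)
print(Z.shape)  # (200, 2)
```

Because the components are sorted by how much variance they capture, the first column of `Z` always spreads out at least as much as the second — exactly the "strongest spread first" ordering described above.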