Machine Learning · Beginner

Clustering & K-Means

Finding groups without labels

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

20 min · Explore at your own pace

Before We Begin

What we are learning today

Clustering is about finding structure when no answer key has been provided. Instead of predicting a known label, the model searches for natural groupings based on similarity. K-means does this with a simple loop: assign points to the nearest center, move the centers, and repeat.

How this lesson fits

This module is where the course shifts from explicit rules to learned patterns. Instead of telling the machine exactly what to do in every case, we give it examples, define success, and let it infer a decision rule from the data.

The big question

How can a machine study examples, extract useful patterns, and make predictions on cases it has never seen before?

  • Distinguish supervised, unsupervised, and reward-driven learning setups
  • Interpret the output of common models in plain English instead of opaque jargon
  • Compare the tradeoffs between accuracy, interpretability, flexibility, and speed

Why You Should Care

Students often assume all machine learning depends on labeled data. Clustering breaks that assumption and shows that one major use of ML is exploratory: revealing patterns, segments, or hidden organization that humans did not label ahead of time.

Where this is used today

  • Customer segmentation, where companies group users with similar behaviors or needs
  • Color quantization in image compression, where many shades are grouped into a smaller palette
  • Exploratory analysis that groups documents, search results, or biological samples by similarity

Think of it like this

Imagine sorting a mixed box of LEGO bricks without instructions. You could organize them by color, by size, or by shape. There may be several reasonable groupings, and the point is to choose one that reveals useful structure.

Easy mistake to make

K-means does not uncover one final, objective truth hiding in the data. Different values of K and different definitions of similarity can produce different but still useful groupings.

By the end, you should be able to say:

  • Explain why clustering is unsupervised and what information is missing compared with labeled training
  • Describe the alternating assignment and centroid-update steps in K-means
  • Interpret the elbow method as a practical but imperfect way to choose K

Think about this first

If you had to sort a pile of mixed objects with no labels, what clues would you rely on first, and how would you decide whether two objects belong together?

Words we will keep using

cluster, centroid, assignment, inertia, unsupervised

Clustering: Finding Hidden Groups

Clustering is like sorting a bucket of mixed LEGOs when you've lost the instruction manual. You don't know what the groups are supposed to be, so you organize them by what looks similar—color, size, or shape.

K-Means Algorithm

  1. Guess: Drop K center points (centroids) randomly on the map.
  2. Assign: Every data point joins the team of the closest centroid.
  3. Update: Each team finds its new center of gravity and moves the centroid there.
  4. Repeat until nothing moves anymore.
Inertia (WCSS): This is a score for how tightly each group holds together. Lower means the points sit closer to their cluster center.
Choosing K: The tricky part is deciding how many groups really make sense. The elbow method helps you notice when adding more clusters stops buying you much.
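The four steps above can be sketched directly in NumPy. This is a minimal illustration (random initialization, Euclidean distance, and a simple guard for empty clusters), not a production implementation:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Guess: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # 2. Assign: each point joins the team of the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: move each centroid to its team's center of gravity
        #    (if a team ends up empty, leave its centroid where it was)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # 4. Repeat until nothing moves anymore
        if np.allclose(new, centroids):
            break
        centroids = new
    inertia = ((X - centroids[labels]) ** 2).sum()  # WCSS score
    return labels, centroids, inertia
```

Real libraries add smarter initialization (k-means++) and multiple restarts, because different random starting centroids can settle into different final groupings.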

Step-by-Step K-Means

Press start and watch the two repeating moves: assign points, then move centroids.


Choosing K — The Elbow Method

How many clusters should you use? The "Elbow Method" is a rule of thumb: keep adding clusters until the improvement slows down. It's like eating pizza—the first slice is amazing, the fifth one is just okay.

In the chart, the red dot marks the elbow at K=3. Adding more clusters beyond this gives diminishing returns.

Inertia = WCSS: This is the total squared distance from each point to its assigned centroid. It always goes down as K increases, so you should not blindly chase the smallest possible value.
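In code, the elbow method is just a sweep over K. The sketch below uses scikit-learn's KMeans on made-up three-blob data (the blob locations and the range of K are illustrative choices, not a recipe):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: three well-separated blobs, so we expect the elbow near K=3
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Fit K-means for each K and record the inertia (WCSS)
inertias = {}
for k in range(1, 8):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

# Inertia always decreases as K grows; the question is where the gains shrink
for k, v in inertias.items():
    print(k, round(v, 1))
```

Plotting these values against K would reproduce the elbow chart above: a steep drop up to K=3, then a nearly flat tail.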
Other methods for choosing K
  • Silhouette score: asks whether points are close to their own cluster and far from other clusters
  • Gap statistic: compares your clustering result to what random data would look like
  • Domain knowledge: sometimes you already know how many groups make sense

Other Clustering Methods

DBSCAN: Useful when the groups have messy shapes and some points should really be treated as noise instead of forced into a cluster.
Hierarchical: Builds clusters inside bigger clusters, almost like a family tree showing which groups sit inside others.
GMM: Allows softer membership, so a point can partly belong to more than one cluster instead of getting a strict yes-or-no label.
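All three are available in scikit-learn with a similar fit/predict interface. A quick sketch on toy two-blob data (the eps, min_samples, and cluster counts here are illustrative guesses, not tuned values):

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, (40, 2)),
               rng.normal([4, 4], 0.3, (40, 2))])

# DBSCAN: density-based; points outside any dense region get label -1 (noise)
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Hierarchical: merges clusters bottom-up; here we cut the tree at 2 clusters
hc_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# GMM: soft membership; each row of probs sums to 1 across the components
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gmm.predict_proba(X)
```

Notice that only the GMM returns probabilities: instead of a hard label, each point gets a share of membership in every component.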