Machine Learning · Beginner

🌳Decision Trees & Random Forests

Learning by asking better questions

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

25 min · Explore at your own pace

Before We Begin

What we are learning today

Decision trees work by repeatedly asking the question that best separates the data at the current step. Each split reduces uncertainty, and the final leaf gives a prediction. Random forests improve on this by averaging across many slightly different trees so one unlucky split does not dominate the result.

How this lesson fits

This module is where the course shifts from explicit rules to learned patterns. Instead of telling the machine exactly what to do in every case, we give it examples, define success, and let it infer a decision rule from the data.

The big question

How can a machine study examples, extract useful patterns, and make predictions on cases it has never seen before?

  • Distinguish supervised, unsupervised, and reward-driven learning setups
  • Interpret the output of common models in plain English instead of opaque jargon
  • Compare the tradeoffs between accuracy, interpretability, flexibility, and speed

Why You Should Care

Trees are one of the best bridges between human reasoning and machine learning. Students can see the decision process, critique it, and understand why ensembling often beats a single overconfident model.

Where this is used today

  • Loan and risk systems that split applicants by measurable financial factors
  • Clinical triage workflows that route patients based on symptoms and severity
  • Business analytics models where stakeholders need a relatively interpretable decision path

Think of it like this

Think of a triage nurse narrowing possibilities: Do you have a fever? Has it lasted more than two days? Are you having trouble breathing? Each answer rules some outcomes in and others out.

Easy mistake to make

A deeper tree is not automatically a better tree. If it keeps splitting until every edge case is memorized, it may fit the training data beautifully and still fail on new examples.
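To see why memorization fails, picture the extreme case: a tree that splits until every training example sits alone in its own leaf behaves like a lookup table. Here is a toy sketch in plain Python (the applicant data and the name `memorizing_model` are made up for illustration):

```python
# A maximally deep tree acts like a lookup table over (age, income) pairs:
# perfect on training data, clueless on anything it has not seen.
train = {(25, 30_000): "deny", (40, 80_000): "approve", (33, 55_000): "approve"}

def memorizing_model(applicant):
    # Exact match or nothing -- no general rule was ever learned
    return train.get(applicant, "no idea")

print(memorizing_model((25, 30_000)))  # → deny     (seen in training: 100% accurate)
print(memorizing_model((26, 31_000)))  # → no idea  (one year older: total failure)
```

A shallower tree that learned a rule like "approve if income is well above the loan size" would get the second applicant right, even though it might make a few more mistakes on the training set.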

By the end, you should be able to:

  • Explain how a tree chooses a split using impurity reduction or information gain
  • Interpret branches, leaves, and terminal predictions with confidence
  • Explain why averaging many trees can reduce variance and overfitting

Think about this first

If you were deciding whether to approve a loan, what first question would you ask, and why would that question separate applicants better than others?

Words we will keep using

split · node · leaf · impurity · random forest

How Decision Trees Work

A decision tree is just a game of "20 Questions." The computer learns which questions to ask to split the data into clean groups. It is one of the few AI models you can print out and read like a manual.

Gini Impurity — A fancy name for "messiness." The goal is to make groups that are pure (all Yes or all No).
Splitting — The tree tries many possible questions and keeps the one that best separates the classes.
Overfitting — If you ask too many questions, you memorize the specific examples instead of learning the general rule.
Gini impurity is written as G = 1 - \sum_k p_k^2, where p_k is the fraction of examples in the node that belong to class k. You do not need to calculate it by hand right now. Just remember: the smaller it is, the “cleaner” the node is.
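If you do want to see the formula in action, here is a minimal sketch in plain Python. The helper names (`gini`, `split_impurity`) and the yes/no labels are made up for illustration:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node scores 0; a 50/50 node scores 0.5 (the worst case for two classes)
print(gini(["yes", "yes", "yes", "yes"]))  # → 0.0
print(gini(["yes", "yes", "no", "no"]))    # → 0.5

def split_impurity(left, right):
    """Score a candidate split by the size-weighted impurity of its children."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# The tree tries many questions and keeps the one with the lowest score:
print(split_impurity(["yes", "yes"], ["no", "no"]))  # → 0.0 (perfect split)
print(split_impurity(["yes", "no"], ["yes", "no"]))  # → 0.5 (useless split)
```

This weighted-child score is exactly what "the question that best separates the data" means: the split that drives it lowest wins.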

Loan Approval Tree — Walk-through

Move the sliders and follow the highlighted path. You can literally watch the model reason its way to a decision.

[Interactive tree diagram: the first split checks Age; one follow-up question is used only if Age < 30, another only if Age ≥ 30, and the highlighted path ends in a decision such as ✅ Approve.]

Random Forests

A single tree can be shaky—change one data point, and the whole structure might flip. A Random Forest solves this by training hundreds of different trees and letting them vote.

# Train N trees, each on its own bootstrap sample and random feature subset
for i in 1..N:
  sample_i   = bootstrap(data)            # draw rows with replacement
  features_i = random_subset(features)    # each tree sees only some features
  tree_i     = DecisionTree(sample_i, features_i)

# To classify a new example x, every tree votes and the majority wins
predict(x) = majority_vote(tree_1(x), ..., tree_N(x))
Why it helps: A single deep tree can change a lot if the training data changes a little. Averaging many trees makes the final model more stable.
Feature importance: Forests also give a rough sense of which features matter most, which is useful when you want an interpretable summary.
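To make the pseudocode concrete, here is a minimal self-contained sketch in plain Python. To keep it short, the base learners are decision stumps (one-question trees) instead of full trees, and the loan-style dataset and all names (`train_stump`, `forest_predict`, …) are made up for illustration:

```python
import random

random.seed(0)

# Toy applicants: (age, income in $1000s) -> 1 = approve, 0 = deny (hypothetical)
X = [(22, 20), (25, 35), (30, 40), (35, 60), (45, 80), (50, 95), (28, 25), (41, 70)]
y = [0, 0, 0, 1, 1, 1, 0, 1]

def train_stump(X, y):
    """One-question tree: pick the (feature, threshold, polarity) with fewest errors."""
    best, best_err = None, len(y) + 1
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (0, 1):  # pol is the label predicted when row[f] > t
                preds = [pol if row[f] > t else 1 - pol for row in X]
                err = sum(p != label for p, label in zip(preds, y))
                if err < best_err:
                    best_err, best = err, (f, t, pol)
    return best

def stump_predict(stump, row):
    f, t, pol = stump
    return pol if row[f] > t else 1 - pol

def train_forest(X, y, n_trees=25):
    """Bootstrap the rows for each tree, exactly as in the pseudocode above."""
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def forest_predict(forest, row):
    votes = sum(stump_predict(s, row) for s in forest)
    return int(votes * 2 >= len(forest))  # majority vote (ties go to 1)

forest = train_forest(X, y)
print(forest_predict(forest, (48, 90)))  # high age and income → approve
print(forest_predict(forest, (23, 22)))  # low age and income → deny
```

In real projects you would reach for a library implementation such as scikit-learn's `RandomForestClassifier`, which also handles the per-split feature subsampling this sketch omits, but the bootstrap-then-vote loop is the same idea.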