Foundations · Beginner

🎲Probability & Distributions

How AI talks about uncertainty

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

25 min · Explore at your own pace

Before We Begin

What we are learning today

Most real-world AI systems are not deciding between certainty and uncertainty. They are operating inside uncertainty all the time. Probability gives us a disciplined way to talk about what is likely, what is rare, and how new evidence should change our confidence.

How this lesson fits

This module builds the mental model underneath everything else in the curriculum. We start with explicit rules, then add uncertainty, then explore search, so students can see AI as a chain of concrete decisions rather than a pile of mysterious buzzwords.

The big question

How can a machine move from rigid step-by-step instructions to making sensible choices in a messy, uncertain world?

  • Trace a computation step by step and explain why each move happens
  • Use probability to talk about uncertainty instead of pretending outcomes are guaranteed
  • Describe how search algorithms compare options and settle on a good path forward

Why You Should Care

As soon as a model predicts spam, disease risk, weather, or customer behavior, it is making claims under uncertainty. Students need to understand that a useful prediction is often not a certain one, and that confidence itself is part of the output.

Where this is used today

  • Weather forecasting, where probabilities communicate risk better than yes-or-no statements
  • Medical testing, where prior likelihood and new evidence combine to change confidence
  • Spam filtering and other Bayesian systems that rank outcomes by likelihood

Think of it like this

Think about leaving home on a cloudy morning. You do not know for certain whether it will rain, but the sky, forecast, and season all shift your confidence. Probability is the language for making that uncertainty explicit instead of pretending the answer is all-or-nothing.

Easy mistake to make

Probability does not promise the outcome of one single event. It describes uncertainty across possibilities and becomes most meaningful when interpreted over many similar situations.

By the end, you should be able to say:

  • Explain probability as a numerical way to describe uncertainty between impossible and certain
  • Compare Bernoulli, binomial, and normal distributions and identify when each is useful
  • Use Bayes’ theorem as a framework for updating beliefs when new evidence arrives

Think about this first

Why is saying there is a 70% chance of rain more useful than saying simply 'rain' or 'no rain'? What different decisions might you make with that extra nuance?

Words we will keep using

probability · distribution · mean · variance · evidence

The Language of Uncertainty

Life is random. Models almost never know the future for sure. Instead of saying "It will rain," they say "There is a 92% chance of rain." Probability is the tool we use to measure that uncertainty.

Event: The thing we are watching. A coin landing on heads, or an email being spam.
Distribution: The shape of luck. It shows every possible outcome and how likely it is.
Law of Large Numbers: Luck is wild in the short run but predictable in the long run.

Part 1: The Coin Flip (Bernoulli)

🪙 Coin Flip Simulator


This is the simplest random experiment in the world. Flip a coin. One trial, two choices: Success or Failure. In math, we call this a Bernoulli trial.

P(X=1) = p, \quad P(X=0) = 1-p

Mean: E[X] = p   Variance: \text{Var}(X) = p(1-p). Don't worry about the formulas yet. Just see that even a random coin flip has exact rules governing it.

Click Flip ×100. See how the bars jump around? Now keep clicking. The more you flip, the closer you get to 50/50. That is the Law of Large Numbers in action.
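If you would rather flip coins in code than by clicking, here is a minimal simulation of the same experiment. The function name `flip_many` and the fixed seed are just choices for this sketch; any source of fair random bits works.

```python
import random

def flip_many(n, p=0.5, seed=0):
    """Simulate n Bernoulli trials and return the fraction of successes (heads)."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

# The fraction of heads wanders a lot for small n, then settles near p
# as n grows -- the Law of Large Numbers in code form.
for n in (10, 100, 10_000):
    print(n, round(flip_many(n), 3))
```

Run it a few times with different seeds: the 10-flip line jumps around, while the 10,000-flip line barely moves from 0.5.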

Part 2: The Bell Curve (Normal / Gaussian)

🔔 Normal (Bell Curve)

The Bell Curve (Normal distribution) is everywhere. Height, shoe size, test scores—whenever you add up lots of little random factors, you get this shape.

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
  • \mu moves the center left or right
  • \sigma controls whether the curve is tight or spread out
  • About 68% of values fall within \pm 1\sigma of the mean
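You can watch the bell shape emerge yourself. The sketch below adds up 50 small random nudges per sample (one way to mimic "lots of little random factors"; the counts 50 and 20,000 are arbitrary choices) and then checks the 68% rule empirically.

```python
import math
import random

rng = random.Random(1)

# Each sample is the sum of many small random nudges; the totals pile up
# into a bell shape (the central limit theorem at work).
samples = [sum(rng.uniform(-1, 1) for _ in range(50)) for _ in range(20_000)]

# Estimate the mean and standard deviation of the samples.
mu = sum(samples) / len(samples)
sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))

# Empirical check of the 68% rule: fraction within one sigma of the mean.
within = sum(abs(x - mu) <= sigma for x in samples) / len(samples)
print(round(within, 2))
```

Even though no single nudge is bell-shaped (each is uniform), the sums are, and roughly 68% of them land within one standard deviation of the mean.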

Part 3: Counting Successes (Binomial)

📊 Binomial Distribution

Mean = np = 5.00  |  Std = √(np(1-p)) = 1.58  (shown for n = 10, p = 0.5)

Now repeat that simple yes/no experiment n times. Instead of asking what happens once, we ask: how many successes do we get in total? That count follows a binomial distribution.

P(K=k) = \binom{n}{k} p^k (1-p)^{n-k}

Set p = 0.5 and make n bigger. You will see the bars begin to look more and more like a bell curve.

Real uses: How many emails get opened, how many basketball shots go in, or how many patients respond to a treatment.
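The binomial formula above is short enough to compute directly. This sketch evaluates it for n = 10, p = 0.5 (the same settings as the readout above); `binom_pmf` is just a name chosen for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

# The probabilities sum to 1, and k = 5 (the mean, np) is the most likely count.
print(round(probs[5], 3))  # 0.246
```

Note the peak sits at the mean np = 5, and the probabilities taper symmetrically on either side, which is exactly the bell shape the interactive bars show.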

Part 4: Bayes' Theorem — Updating Beliefs

🔄 Bayes Theorem Calculator

P(H|E) = P(E|H)·P(H) / [P(E|H)·P(H) + P(E|¬H)·P(¬H)]

Bayes' Theorem is the math of changing your mind. It tells you exactly how to update your beliefs when you see new evidence.

P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}
  • P(H) is your starting belief
  • P(E|H) asks how likely the evidence would be if the hypothesis were true
  • P(E) is the overall chance of seeing that evidence
  • P(H|E) is your new belief after taking the evidence into account
Classic example: A disease is rare, but the test is fairly good. Even then, a positive result may still mean the disease is unlikely, because false positives add up. P(disease|+) ≈ 4.3% in this example.
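The calculator's formula is easy to reproduce. The numbers below are assumed for illustration, since the lesson's exact inputs are not shown: a 0.5% prevalence, 90% sensitivity, and 10% false-positive rate happen to give roughly the 4.3% figure quoted above.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(H|E) for a positive test, via Bayes' theorem.

    P(E) expands over both hypotheses:
    P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    """
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

# Assumed illustrative inputs (not necessarily the lesson's):
# 0.5% prior, 90% sensitivity, 10% false-positive rate.
print(round(posterior(0.005, 0.90, 0.10), 3))  # 0.043
```

The surprise is in the denominator: with a rare disease, the false-positive term P(E|¬H)·P(¬H) dwarfs the true-positive term, so most positive results come from healthy people.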

This idea shows up everywhere in AI, from spam filters to medical decision systems.

Key Takeaways

  • Probability helps you talk about uncertainty instead of pretending every answer is exact.
  • A distribution describes the full range of outcomes, not just one guess.
  • The bell curve appears naturally when many small factors combine.
  • These ideas are basic tools for later topics such as HMMs, classifiers, and neural network outputs.