Sequence Models · Intermediate

Hidden Markov Models

When the real state is hidden from view

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

30 min · Explore at your own pace

Before We Begin

What we are learning today

Hidden Markov models separate the world into what we can observe and what we believe is happening underneath. The hidden state evolves over time, emits visible evidence, and must be inferred from those observations using probability.

How this lesson fits

Some data points only make sense when you know what came before them. This module studies models built for ordered information such as language, audio, weather, and time series, where sequence and memory matter as much as the current input.

The big question

How can a model represent the past well enough to make a strong decision about what is happening now or what should happen next?

  • Explain why sequence order changes meaning even when the same items are present
  • Compare probabilistic sequence models with neural sequence models
  • Track hidden state, memory, and context as they move across time steps

Why You Should Care

Many important variables in AI are not directly visible: intent, disease status, topic, emotion, part-of-speech, or system mode. HMMs give you a formal way to reason about hidden causes from observable evidence.

Where this is used today

  • Earlier speech-recognition systems that inferred hidden phonemes from audio signals
  • Gesture and activity recognition where underlying states must be inferred from motion data
  • Bioinformatics workflows that model hidden structure in DNA or protein sequences

Think of it like this

It is like trying to infer tomorrow's weather pattern from cloud cover, temperature, and wind. You never see the abstract weather 'state' directly, but the clues it leaves behind make some hidden explanations more likely than others.

Easy mistake to make

A hidden state is not mystical or unknowable. It is simply a variable we do not observe directly and must infer from noisy evidence.

By the end, you should be able to:

  • Distinguish clearly between hidden states and observed evidence in a sequential setting
  • Explain transition and emission probabilities and what each contributes to the model
  • Describe the Viterbi algorithm as an efficient way to decode the most likely hidden path

Think about this first

Name a situation where you cannot directly observe the true state of something but can still make a strong guess from clues. What evidence would increase or reduce your confidence?

Words we will keep using

hidden state · observation · transition · emission · Viterbi

Hidden Markov Models

Think of this as the "Sherlock Holmes" model. You never see the crime (hidden state), only the clues left behind (observations). The HMM is a mathematical tool for working backwards from the clues to the likely truth.

  • Hidden States: The truth. The actual weather, or someone's true health. We never see this directly.
  • Observations: The evidence. An umbrella, a cough, or a credit card transaction.
  • Parameters: The rules. How likely is rain? How likely is an umbrella if it rains?

The Weather / Activity HMM

Transition Matrix A (State → State)

From \ To    Sunny   Rainy
Sunny        0.7     0.3
Rainy        0.4     0.6

Emission Matrix B (State → Observation)

State \ Obs   🚶 Walk   🛍️ Shop   🧹 Clean
Sunny         0.6       0.3       0.1
Rainy         0.1       0.4       0.5
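The two tables above can be written down directly as matrices. A quick sketch in Python (NumPy), using the lesson's exact numbers, with a sanity check that every row is a valid probability distribution:

```python
import numpy as np

# Hidden states and observation symbols from the lesson's tables
states = ["Sunny", "Rainy"]
obs_symbols = ["Walk", "Shop", "Clean"]

# Transition matrix A: A[i, j] = P(next state j | current state i)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Emission matrix B: B[i, k] = P(observation k | state i)
B = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.4, 0.5]])

# Each row is a probability distribution over outcomes, so it must sum to 1
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```

Keeping the matrices in this orientation (rows = current state) makes the later forward and Viterbi updates simple matrix-vector products.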

Example Observation Sequence

For the observation sequence 🚶 Walk, 🚶 Walk, 🛍️ Shop, with initial state distribution π = (Sunny: 0.6, Rainy: 0.4), the total probability of seeing this sequence under the model is P(O|λ) = 0.057816.
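That probability comes from the forward algorithm: sum over every possible hidden path, reusing partial sums instead of enumerating paths. A minimal sketch, assuming the initial distribution π = (0.6, 0.4) that the t=0 values in the Viterbi table imply:

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])            # transitions
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]])  # emissions
pi = np.array([0.6, 0.4])                         # initial state distribution

def forward(obs):
    """Return P(O | lambda): total probability of the observations,
    summed over all hidden state paths."""
    alpha = pi * B[:, obs[0]]          # alpha[i] = P(o_1, state_1 = i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step, weight by emission
    return alpha.sum()

walk, shop, clean = 0, 1, 2
p = forward([walk, walk, shop])
print(round(p, 6))  # 0.057816
```

Each loop iteration is O(N²) for N hidden states, so the whole computation is O(N²T) rather than the O(Nᵀ) cost of enumerating every path.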

Viterbi Algorithm — Most Likely State Sequence

The Viterbi algorithm asks: "What is the single most likely story that explains these clues?" It finds the best path through the possibilities without getting lost in the details.

Trellis diagram: columns = time steps, rows = hidden states. Highlighted nodes/edges = Viterbi decoded path. Numbers inside nodes = Viterbi probability.

Step   Observation   P(Sunny)   P(Rainy)   Most Likely State
t=0    🚶 Walk       0.36000    0.04000    Sunny
t=1    🚶 Walk       0.15120    0.01080    Sunny
t=2    🛍️ Shop       0.03175    0.01814    Sunny

Decoded weather: Sunny → Sunny → Sunny
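The table above can be reproduced in a few lines. Where the forward algorithm sums over paths, Viterbi takes the max at each step and remembers which predecessor won, so the best path can be traced back at the end. A sketch using the same matrices and the implied π = (0.6, 0.4):

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]])
pi = np.array([0.6, 0.4])
states = ["Sunny", "Rainy"]

def viterbi(obs):
    """Return (most likely hidden state sequence, its probability)."""
    delta = pi * B[:, obs[0]]              # best path probability ending in each state
    back = []                              # backpointers: which predecessor was best
    for o in obs[1:]:
        trans = delta[:, None] * A         # trans[i, j] = delta[i] * A[i, j]
        back.append(trans.argmax(axis=0))  # best previous state for each current state
        delta = trans.max(axis=0) * B[:, o]
    path = [int(delta.argmax())]           # best final state, then trace back
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    path.reverse()
    return [states[s] for s in path], delta.max()

walk, shop = 0, 1
path, p = viterbi([walk, walk, shop])
print(path)         # ['Sunny', 'Sunny', 'Sunny']
print(round(p, 5))  # 0.03175
```

The intermediate `delta` values match the trellis table exactly: (0.36, 0.04), then (0.1512, 0.0108), then (0.03175, 0.01814).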

Three Classic HMM Problems

  • Evaluation (Forward): How likely is this observed sequence under the current model?
  • Decoding (Viterbi): What hidden sequence is the best explanation for what we saw?
  • Learning (Baum-Welch): How should the probabilities be adjusted so the model matches the data better?
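The learning problem can also be sketched. One Baum-Welch (EM) iteration computes how often each state and transition was probably used (the E-step, via forward-backward) and re-estimates the parameters from those counts (the M-step). The observation sequence below is an illustrative assumption, not from the lesson:

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]])
pi = np.array([0.6, 0.4])
obs = [0, 0, 1, 2, 0, 1]  # Walk, Walk, Shop, Clean, Walk, Shop (made up for the demo)

def forward_backward(A, B, pi, obs):
    T, N = len(obs), len(pi)
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch_step(A, B, pi, obs):
    """One EM iteration: posteriors over states/transitions, then re-estimation."""
    alpha, beta = forward_backward(A, B, pi, obs)
    likelihood = alpha[-1].sum()                 # P(O | current parameters)
    gamma = alpha * beta / likelihood            # gamma[t, i] = P(state i at t | O)
    xi = np.zeros_like(A)                        # expected transition counts
    for t in range(len(obs) - 1):
        xi += (alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]) / likelihood
    new_pi = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        mask = np.array(obs) == k
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, likelihood

A1, B1, pi1, ll0 = baum_welch_step(A, B, pi, obs)
_, _, _, ll1 = baum_welch_step(A1, B1, pi1, obs)
print(ll1 >= ll0)  # EM never decreases the data likelihood: True
```

Running the step repeatedly climbs toward a local maximum of the likelihood, which is why Baum-Welch needs reasonable starting parameters and may not find the global best model.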

Applications: speech recognition, gesture recognition, biological sequence analysis, and any situation where an invisible process leaves visible traces behind.