Machine Learning · Beginner

📈 Regression & Classification

Predicting numbers and choosing categories

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

30 min · Explore at your own pace

Before We Begin

What we are learning today

A large share of introductory machine learning reduces to two questions. Regression asks 'how much?' and classification asks 'which category?'. Even when later models become more complex, they often still boil down to one of these two prediction styles.

How this lesson fits

This module is where the course shifts from explicit rules to learned patterns. Instead of telling the machine exactly what to do in every case, we give it examples, define success, and let it infer a decision rule from the data.

The big question

How can a machine study examples, extract useful patterns, and make predictions on cases it has never seen before?

  • Distinguish supervised, unsupervised, and reward-driven learning setups
  • Interpret the output of common models in plain English instead of opaque jargon
  • Compare the tradeoffs between accuracy, interpretability, flexibility, and speed

Why You Should Care

Students who can clearly separate these two task types are much less likely to misuse models or misread outputs. It also creates a sturdy foundation for understanding loss functions, metrics, and later neural-network examples.

Where this is used today

  • ✓ Predicting prices, temperatures, wait times, or energy usage as numerical outputs
  • ✓ Classifying tumors, emails, images, or transactions into discrete categories
  • ✓ Estimating probabilities for decision support before a final yes-no action is taken

Think of it like this

If you are estimating the selling price of a house, you are doing regression. If you are deciding whether an email is spam or not spam, you are doing classification. One output is continuous, the other is categorical.

Easy mistake to make

Logistic regression is confusingly named. In most practical settings it is used as a classification model because it estimates class probabilities rather than arbitrary numeric values.

By the end, you should be able to say:

  • Tell the difference between regression targets and classification labels without hesitation
  • Interpret a fitted line, a score, and a decision boundary at a conceptual level
  • Relate different output types to common evaluation metrics such as error, accuracy, and probability

Think about this first

Which task is regression and which is classification: predicting a student's exact exam score, or predicting whether they will pass the course? Explain the difference in the expected output.

Words we will keep using

regression · classification · decision boundary · error · probability

Linear Regression

Linear regression asks: "What is the number?" (e.g., price, temperature). It tries to draw a straight line that passes as close as possible to all your data points.

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2

To find the best line, the computer plays a game of "hot or cold." It nudges the line slightly, checks if the error gets smaller, and repeats. This process is called gradient descent.

m \leftarrow m - \alpha \frac{\partial \text{MSE}}{\partial m}, \quad b \leftarrow b - \alpha \frac{\partial \text{MSE}}{\partial b}

Gradient Descent on MSE Loss


Try a large learning rate (α ≈ 0.04) and watch the loss. Too large → oscillation; too small → slow convergence.
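The "hot or cold" loop above can be sketched in a few lines of plain Python. The data points and learning rate below are invented for illustration:

```python
# Gradient descent on MSE for a line y = m*x + b.
# Toy data, roughly following y = 2x + 1 (values are made up).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]
n = len(xs)

m, b = 0.0, 0.0
alpha = 0.01  # learning rate: the size of each "nudge"

for _ in range(5000):
    # Partial derivatives of MSE = (1/n) * sum((m*x + b - y)**2)
    grad_m = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
    m -= alpha * grad_m  # step each parameter downhill
    b -= alpha * grad_b

print(round(m, 2), round(b, 2))  # converges near m = 2, b = 1
```

Increase `alpha` past the stability limit and the loss oscillates instead of shrinking, exactly as in the interactive demo.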

Logistic Regression (Classification)

Logistic regression asks: "Yes or No?" (e.g., Spam or Not Spam). Instead of a raw number, it gives you a probability between 0% and 100%.

\hat{p} = \sigma(w_1 x_1 + w_2 x_2 + b) = \frac{1}{1+e^{-(w_1 x_1 + w_2 x_2 + b)}}

Logistic Regression — Decision Boundary

Drag the sliders and watch the decision boundary move. That boundary is the place where the model is exactly undecided, with \hat{p} = 0.5.

  • Blue points belong to one class, red points to the other.
  • The background color shows what the model currently believes.
  • The live score changes as soon as your boundary moves.

Notice the limitation: logistic regression can only draw a straight dividing line. If the pattern is curved, we need a more flexible model.
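A minimal sketch of how such a model turns two features into a probability. The weights here are assumptions picked for illustration, not fitted values:

```python
import math

# Hypothetical weights for a two-feature logistic model (made up).
w1, w2, b = 1.5, -2.0, 0.5

def predict_proba(x1, x2):
    """sigma(w1*x1 + w2*x2 + b): probability of the positive class."""
    z = w1 * x1 + w2 * x2 + b
    return 1.0 / (1.0 + math.exp(-z))

# The decision boundary is the straight line where z = 0, i.e. p = 0.5:
#   w1*x1 + w2*x2 + b = 0   ->   x2 = -(w1*x1 + b) / w2
print(predict_proba(1.0, 1.0))  # z = 1.5 - 2.0 + 0.5 = 0, so p = 0.5
print(predict_proba(2.0, 0.0))  # z = 3.5, well on the positive side
```

Because `z` is a linear function of the inputs, the boundary is always a straight line, which is exactly the limitation noted above.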

Model Evaluation Metrics

Accuracy is a trap. If 99% of emails are safe, a model that says "Safe" every time is 99% accurate but 100% useless at catching spam. We need better scoreboards.

The four cells

TP (True Positive) — correctly predicted positive

FP (False Positive) — predicted positive, actually negative (Type I error)

FN (False Negative) — predicted negative, actually positive (Type II error)

TN (True Negative) — correctly predicted negative

Accuracy = (TP+TN) / N. Fine when the classes are balanced, but risky when one class is rare.

Precision = TP / (TP+FP). When you say “positive,” how often are you right?

Recall (TPR) = TP / (TP+FN). Of the real positives, how many did you actually catch?

F1 combines precision and recall into one score when both matter.

ROC-AUC measures ranking quality across many thresholds, not just one fixed cutoff.

Drag threshold — watch the orange dot move along the curve

Confusion Matrix

             Pred +     Pred −
  Actual +   TP = 17    FN = 23
  Actual −   FP = 20    TN = 20

Live metrics at t = 0.50: Accuracy 46% · Precision 46% · Recall 43% · F1 44%
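Those percentages are nothing mysterious; the cell counts can be plugged straight into the formulas:

```python
# Metrics from a confusion matrix with TP=17, FN=23, FP=20, TN=20.
tp, fn, fp, tn = 17, 23, 20, 20
n = tp + fn + fp + tn

accuracy = (tp + tn) / n      # 37/80 = 0.4625
precision = tp / (tp + fp)    # 17/37, about 0.46
recall = tp / (tp + fn)       # 17/40 = 0.425
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.0%} prec={precision:.0%} rec={recall:.0%} f1={f1:.0%}")
```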

When classes are imbalanced

If one class is rare, accuracy can hide failure. In those cases, precision, recall, F1, and PR-AUC usually tell a more honest story.

Threshold trade-off

If you lower the threshold, the model says “positive” more often. That usually helps recall but hurts precision. You are trading one kind of mistake against another.
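A small sketch of that trade-off. The scores and labels below are invented; sweeping the threshold down raises recall while precision falls:

```python
# Model scores sorted high to low, with their true labels (1 = positive).
# Both lists are made-up toy data for illustration.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0,    0]

def precision_recall(threshold):
    """Classify as positive when score >= threshold, then score the result."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn)
    return prec, rec

for t in (0.7, 0.5, 0.3):
    print(t, precision_recall(t))
```

At threshold 0.7 the model is cautious (perfect precision, half the positives missed); at 0.3 it catches every positive but lets more false alarms through.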

Beyond binary classification

Different tasks need different scoreboards. There is no single metric that is best for every problem.

Regression Evaluation Metrics

When the output is a number, the question becomes: how far off were we? That is why regression uses error-based metrics instead of a confusion matrix.

\text{MAE} = \frac{1}{n}\sum|y_i - \hat{y}_i|

Mean Absolute Error — robust to outliers, interpretable in original units

\text{MSE} = \frac{1}{n}\sum(y_i - \hat{y}_i)^2

Mean Squared Error — penalises large errors heavily; used as training loss

\text{RMSE} = \sqrt{\text{MSE}}

Root MSE — same units as target, more interpretable than MSE

R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}}

R² (coefficient of determination) — proportion of variance explained. 1.0 = perfect, 0 = no better than predicting the mean
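All four metrics can be computed by hand. The predictions and targets below are invented toy values, chosen only to exercise the formulas:

```python
import math

# Hypothetical actuals vs. model predictions (made-up numbers).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.0, 9.5]
n = len(y_true)

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)  # back in the target's original units

mean = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot  # share of variance explained vs. predicting the mean

print(mae, rmse, r2)  # 0.625, ~0.661, 0.9125
```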

Linear vs Logistic — Key Differences

Linear Regression — use this when the answer should be a number: house price, height, temperature, and so on.
Logistic Regression — use this when the answer should be a class or label, usually by predicting a probability first.