Hi Pythonistas!
In the last post, we learned about activation functions. Now comes the most important part:
how does a neural network know whether it's doing a good job or failing miserably? That's where Loss Functions come in.
What is a Loss Function?
A loss function is like a teacher’s red pen. After the network makes a prediction, the loss function measures:
- How close was the prediction to the truth?
- By how much should we punish the network?
The smaller the loss, the better the model is learning.
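To make that concrete, here's a tiny made-up example of the "red pen" at work on a single prediction (the numbers are invented purely for illustration):

```python
# One made-up prediction, just to show what "measuring the error" means.
truth = 200.0       # the real house price (in thousands)
prediction = 180.0  # what the network guessed

squared_error = (prediction - truth) ** 2  # 400.0 -> big misses hurt a lot
absolute_error = abs(prediction - truth)   # 20.0  -> every unit of error counts once
```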
Types of Loss Functions
1. Regression Losses (for predicting numbers)
Mean Squared Error (MSE): Squares the difference. Big mistakes hurt more.
Mean Absolute Error (MAE): Uses absolute difference. Treats all errors equally.
Huber Loss: A mix of the two, less sensitive to outliers than MSE and smoother than MAE near zero.
Example: Predicting house prices.
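Here's a minimal NumPy sketch of these three losses; the house prices are made up just to show how differently each loss reacts to one big mistake:

```python
import numpy as np

# Made-up house prices (in thousands); the last prediction is a big miss.
y_true = np.array([250.0, 300.0, 180.0, 420.0])
y_pred = np.array([245.0, 310.0, 200.0, 300.0])

def mse(y_true, y_pred):
    # squares the error, so the 120k miss dominates the loss
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # absolute error treats every unit of error the same
    return np.mean(np.abs(y_true - y_pred))

def huber(y_true, y_pred, delta=1.0):
    # quadratic for small errors, linear for large ones
    error = y_true - y_pred
    small = np.abs(error) <= delta
    return np.mean(np.where(small,
                            0.5 * error ** 2,
                            delta * (np.abs(error) - 0.5 * delta)))

print(mse(y_true, y_pred))    # huge: the outlier gets squared
print(mae(y_true, y_pred))    # much smaller: the outlier counts linearly
print(huber(y_true, y_pred))  # close to MAE here: robust to the outlier
```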
2. Classification Losses (for categories)
Binary Cross-Entropy: Used when deciding between 2 classes (spam vs not spam).
Categorical Cross-Entropy: For multi-class problems (cat vs dog vs horse).
Sparse Categorical Cross-Entropy: Same as above but uses integer labels.
Hinge Loss: Inspired by SVMs. Works for "margin-based" classification.
Example: Identifying handwritten digits (0–9).
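A minimal NumPy sketch of the two main cross-entropy flavours (the probabilities are invented; in a real network they would come from a sigmoid or softmax output layer):

```python
import numpy as np

eps = 1e-12  # avoid log(0)

def binary_cross_entropy(y_true, y_pred):
    # y_true in {0, 1}; y_pred is the predicted probability of class 1
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):
    # y_true is one-hot; y_pred holds softmax probabilities per class
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Spam vs not spam: one confident correct prediction, one confidently wrong one
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.8])))

# A 3-class problem with one-hot labels (true class is index 1)
labels = np.array([[0, 1, 0]])
probs  = np.array([[0.2, 0.7, 0.1]])  # the model's softmax output
print(categorical_cross_entropy(labels, probs))  # = -log(0.7)
```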
3. Specialized Losses
Focal Loss: Focuses on hard-to-classify examples → great for imbalanced datasets.
Dice Loss / IoU Loss: Popular in image segmentation → handles class imbalance well.
Contrastive Loss / Triplet Loss: Used for similarity tasks (like face verification).
KL Divergence: Measures difference between two probability distributions → used in VAEs and knowledge distillation.
CTC Loss: For sequence tasks where alignment is unknown (speech recognition, OCR).
Example: Detecting tumors in medical scans (Dice Loss).
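Two of these are simple enough to sketch by hand. Here's a rough NumPy version of KL divergence and Dice loss with invented inputs; library implementations handle batching and smoothing more carefully:

```python
import numpy as np

eps = 1e-12

def kl_divergence(p, q):
    # how much distribution q diverges from the "true" distribution p
    p, q = np.clip(p, eps, 1), np.clip(q, eps, 1)
    return np.sum(p * np.log(p / q))

def dice_loss(y_true, y_pred):
    # 1 - Dice coefficient; inputs are flattened segmentation masks
    intersection = np.sum(y_true * y_pred)
    return 1 - (2 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

print(kl_divergence(np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])))
print(dice_loss(np.array([1, 1, 0, 0]), np.array([0.9, 0.6, 0.2, 0.1])))
```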
How to Choose?
Regression tasks → MSE, MAE, or Huber.
Binary classification → Binary Cross-Entropy.
Multi-class classification → Categorical Cross-Entropy (with softmax).
Imbalanced data → Focal Loss, Dice Loss.
Similarity learning → Contrastive / Triplet Loss.
Probabilistic models → KL Divergence.
Speech / OCR → CTC Loss.
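If you're working in Keras, the choice usually comes down to a single string passed to `compile`. A rough sketch (the tiny model below is just a placeholder so the snippet runs; the loss names are standard Keras identifiers):

```python
import tensorflow as tf

# Placeholder model, only here so compile() has something to work with.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])

# Multi-class classification with integer labels:
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# The same pattern covers the other tasks, e.g.:
# model.compile(optimizer="adam", loss="mse")                  # regression
# model.compile(optimizer="adam", loss="binary_crossentropy")  # 2 classes
```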
What I Learned
Loss functions are the compass of a neural network. Without them, the model wouldn’t know where to go.
Regression → MSE, MAE, Huber.
Classification → Cross-Entropy family.
Special tasks → Dice, Focal, Triplet, KL, CTC.
What’s Next
Next week, we’ll talk about Optimizers: the "drivers" that actually move the network in the right direction by minimizing this loss.