Hi Pythonistas!
Last time, we explored the layers of a neural network. But here’s the twist: without a little magic, all those layers would just be stacking linear equations. And linear equations alone can’t capture the real-world messiness of images, speech, or text.
So, what’s the magic? Activation functions.
What is an Activation Function?
Think of a neuron as a switch. It takes input, applies some math, and then decides how much of that signal should pass forward. The activation function is that decision-maker. It introduces non-linearity, which lets neural networks learn curves, edges, speech tones, and all the complicated stuff life throws at us.
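Here's a tiny NumPy sketch of that idea. The numbers are made up, and ReLU is used just as an example activation:

```python
import numpy as np

def neuron(x, w, b, activation):
    """Weighted sum of the inputs, then an activation decides what passes forward."""
    z = np.dot(w, x) + b      # linear part
    return activation(z)      # non-linear part

relu = lambda z: np.maximum(0, z)

x = np.array([0.5, -1.2, 3.0])   # toy input
w = np.array([0.4, 0.7, -0.2])   # toy weights
b = 0.1
print(neuron(x, w, b, relu))     # the weighted sum is negative here, so ReLU outputs 0.0
```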
Common Activation Functions
1. Sigmoid
The formula f(x) = 1 / (1 + e^(-x)) squashes any number into a range between 0 and 1.
Great for probabilities.
Problem: for very large or small numbers, it “saturates” → gradients vanish.
Use: binary classification output.
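A quick NumPy sketch, just to see the squashing and the saturation in action:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([-10, -1, 0, 1, 10])))
# ~[0.00005, 0.27, 0.5, 0.73, 0.99995] -- at the extremes the output barely changes,
# which is exactly where the gradient vanishes
```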
2. Tanh (Hyperbolic Tangent)
Similar to sigmoid, but outputs between -1 and 1.
Centered at 0, so training is usually faster.
Still suffers from vanishing gradients.
Use: sometimes in hidden layers, older RNNs.
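NumPy ships tanh out of the box, so a one-liner shows the zero-centered squashing:

```python
import numpy as np

print(np.tanh(np.array([-3, -1, 0, 1, 3])))
# ~[-0.995, -0.76, 0.0, 0.76, 0.995] -- zero-centered, but still flat at the extremes
```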
3. ReLU (Rectified Linear Unit)
Super simple: f(x) = max(0, x).
Fast, efficient, doesn’t saturate on positive side.
Problem: “dying ReLUs” (neuron stuck at 0 forever).
Use: the default for most hidden layers today.
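The whole thing fits in one line of NumPy:

```python
import numpy as np

def relu(x):
    # Passes positives through untouched, zeroes out negatives
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# negatives become 0, positives pass through: 0, 0, 0, 0.5, 2
```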
4. Leaky ReLU
Fixes the dying ReLU by allowing a tiny slope for negative numbers.
Use: when you want ReLU’s speed but fewer dead neurons.
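A small sketch, with the slope alpha = 0.01 chosen as a typical illustrative value:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Negative inputs keep a small slope (alpha) instead of dying at 0
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# -> [-0.02, -0.005, 0.0, 0.5, 2.0] -- negatives are scaled down, not killed
```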
5. Softmax
Converts raw scores into probabilities that sum to 1.
Use: final layer in multi-class classification (cat vs dog vs horse).
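A minimal NumPy sketch. The cat/dog/horse scores are made up, and subtracting the max is a common numerical-stability trick:

```python
import numpy as np

def softmax(scores):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores for cat, dog, horse
print(softmax(logits))               # ~[0.66, 0.24, 0.10], and they sum to 1
```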
6. GELU (Gaussian Error Linear Unit)
A smoother cousin of ReLU.
Used in Transformers (BERT, GPT).
Use: modern NLP and vision models.
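Here's the widely used tanh approximation of GELU, sketched in NumPy:

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU, common in Transformer implementations
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

print(gelu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# Bends smoothly through zero instead of ReLU's hard corner
```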
7. Swish
Proposed by Google. Smooth like GELU, sometimes performs better than ReLU.
Use: deep networks where smooth gradients help.
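A small NumPy sketch of Swish, f(x) = x * sigmoid(beta * x), with beta = 1 (this variant is also known as SiLU):

```python
import numpy as np

def swish(x, beta=1.0):
    # x * sigmoid(beta * x); with beta = 1 this is also called SiLU
    return x / (1 + np.exp(-beta * x))

print(swish(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# Small negative inputs still pass a little signal through, unlike ReLU
```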
How to Choose?
Binary classification output → Sigmoid.
Multi-class output → Softmax.
Hidden layers (general) → ReLU (default).
Hidden layers (if worried about dead neurons) → Leaky ReLU.
Fancy state-of-the-art models → GELU or Swish.
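To make the cheat sheet concrete, here's a minimal Keras sketch. It assumes TensorFlow is installed, and the 784-feature input, 10-class output, and layer sizes are made up purely for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),     # hidden layers: ReLU as the default
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # multi-class output: Softmax
])

# For a binary problem you would swap the last layer for:
# layers.Dense(1, activation="sigmoid")
model.summary()
```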
What We Have Learned
Activation functions are the spark plugs of neural networks.
Without them, your network would just be a boring linear machine.
Sigmoid & Tanh → old classics, still useful for outputs.
ReLU → the workhorse.
Leaky ReLU, GELU, Swish → modern refinements.
Softmax → king of classification outputs.
What’s Next?
Next week, we’ll meet Loss Functions, the mentor that tells our network how wrong it is and how to improve.