From Basics to Bots: My Weekly AI Engineering Adventure-13

Activation Functions - The Spark That Brings Neurons to Life

Posted by Afsal on 26-Sep-2025

Hi Pythonistas!

Last time, we explored the layers of a neural network. But here’s the twist: without a little magic, all those layers would just be stacking linear equations. And linear equations alone can’t capture the real-world messiness of images, speech, or text.

So, what’s the magic? Activation functions.

What is an Activation Function?

Think of a neuron as a switch. It takes input, applies some math, and then decides how much of that signal should pass forward. The activation function is that decision-maker. It introduces non-linearity, which lets neural networks learn curves, edges, speech tones, and all the complicated stuff life throws at us.
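Here's a tiny NumPy sketch of that idea, using one popular activation (ReLU, covered below). The weights, bias, and inputs are made-up numbers purely for illustration:

```python
import numpy as np

def relu(z):
    # The activation: decides how much of the summed signal passes forward
    return np.maximum(0.0, z)

x = np.array([0.5, -1.2, 3.0])   # inputs arriving at the neuron (made up)
w = np.array([0.8, 0.1, -0.4])   # the neuron's weights (made up)
b = 0.2                          # bias

z = np.dot(w, x) + b             # linear step: weighted sum plus bias
a = relu(z)                      # non-linear step: the activation decides
print(z, a)                      # -0.72  0.0
```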

Common Activation Functions

1. Sigmoid

The formula, σ(x) = 1 / (1 + e^(-x)), squashes any number into a range between 0 and 1.

Great for probabilities.

Problem: for very large or small numbers, it “saturates” → gradients vanish.

Use: binary classification output.
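Here's a quick NumPy sketch of sigmoid, plus its gradient so you can see the saturation problem in action (the inputs are arbitrary test values):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(x)
print(s)            # ~[0.00005, 0.27, 0.5, 0.73, 0.99995]

# Gradient of sigmoid is s * (1 - s): nearly 0 at the extremes,
# which is exactly the "vanishing gradient" problem.
print(s * (1 - s))
```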

2. Tanh (Hyperbolic Tangent)

Similar to sigmoid, but outputs between -1 and 1.

Centered at 0, so training is usually faster.

Still suffers from vanishing gradients.

Use: sometimes in hidden layers, older RNNs.
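NumPy ships tanh, so a sketch is one line; the second print shows how the gradient (1 − tanh²(x)) collapses for large inputs:

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

# tanh squashes inputs into (-1, 1), centered at 0
print(np.tanh(x))            # ~[-0.995, -0.762, 0.0, 0.762, 0.995]

# Gradient is 1 - tanh(x)**2: again close to 0 for large |x|
print(1 - np.tanh(x) ** 2)
```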

3. ReLU (Rectified Linear Unit)

Super simple: f(x) = max(0, x).

Fast, efficient, and doesn’t saturate on the positive side.

Problem: “dying ReLUs” (a neuron that gets stuck outputting 0 forever).

Use: the default for most hidden layers today.
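A minimal sketch of ReLU in NumPy (the test values are arbitrary):

```python
import numpy as np

def relu(x):
    # Passes positives through unchanged, zeroes out negatives
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))               # [0.  0.  0.  0.5 2. ]

# If a neuron's inputs always land in the negative region, its output
# (and its gradient) stays 0 -- that's a "dying ReLU".
```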

4. Leaky ReLU

Fixes the dying ReLU by allowing a tiny slope for negative numbers.

Use: when you want ReLU’s speed but fewer dead neurons.
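A sketch of Leaky ReLU in NumPy; the slope alpha = 0.01 below is a common default, not a fixed rule:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs keep a small slope (alpha)
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky_relu(x))         # [-0.02  -0.005  0.  0.5  2. ]
```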

5. Softmax

Converts raw scores into probabilities that sum to 1.

Use: final layer in multi-class classification (cat vs dog vs horse).
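A NumPy sketch of softmax; the raw scores for cat, dog, and horse are made up for illustration:

```python
import numpy as np

def softmax(scores):
    # Subtracting the max keeps np.exp from overflowing; the result is unchanged
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw scores for cat, dog, horse (made up)
probs = softmax(scores)
print(probs, probs.sum())            # ~[0.66 0.24 0.10], sums to 1.0
```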

6. GELU (Gaussian Error Linear Unit)

A smoother cousin of ReLU.

Used in Transformers (BERT, GPT).

Use: modern NLP and vision models.
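Here's a sketch of the tanh-based approximation of GELU that many Transformer implementations use (the exact version uses the Gaussian CDF instead):

```python
import numpy as np

def gelu(x):
    # tanh approximation of x * Phi(x), where Phi is the standard Gaussian CDF
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(gelu(x))   # smooth curve: slightly negative for small negatives, ~x for large positives
```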

7. Swish

Proposed by Google. Smooth like GELU, sometimes performs better than ReLU.

Use: deep networks where smooth gradients help.
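A sketch of Swish in NumPy; with beta = 1 it's the same function also known as SiLU:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x)
    return x * sigmoid(beta * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(swish(x))  # smooth dip for small negatives, ~x for large positives
```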

How to Choose?

Binary classification output → Sigmoid.

Multi-class output → Softmax.

Hidden layers (general) → ReLU (default).

Hidden layers (if worried about dead neurons) → Leaky ReLU.

Fancy state-of-the-art models → GELU or Swish.
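To make these choices concrete, here's a minimal PyTorch sketch. The framework, layer sizes, and three-class example are my own assumptions, not something from this series so far:

```python
import torch
import torch.nn as nn

# Hidden layers: ReLU as the default; swap in nn.LeakyReLU() or nn.GELU() if you like
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 3),    # 3 raw scores, e.g. cat vs dog vs horse
)

# Multi-class output: softmax turns the raw scores into probabilities.
# (During training, nn.CrossEntropyLoss applies it internally for you.)
scores = model(torch.randn(1, 4))
probs = torch.softmax(scores, dim=1)
print(probs, probs.sum())
```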

What I have learned

Activation functions are the spark plugs of neural networks.
Without them, your network would just be a boring linear machine.

Sigmoid & Tanh → old classics, still useful for outputs.

ReLU → the workhorse.

Leaky ReLU, GELU, Swish → modern refinements.

Softmax → king of classification outputs.

What’s Next?

Next week, we’ll meet Loss Functions, the mentor that tells our network how wrong it is and how to improve.