From Basics to Bots: My Weekly AI Engineering Adventure-22

Convolutional Neural Networks (CNNs) - Learning by Looking

Posted by Afsal on 16-Jan-2026

Hi Pythonistas!

In the last post, we learned about fully connected layers. Fully connected layers are powerful.
But when it comes to images, they struggle.

Why?

  • Images are huge
  • Dense layers ignore spatial structure
  • Too many parameters, too fast

This is where Convolutional Neural Networks (CNNs) come in.

Why Dense Layers Fail for Images

Take a small image: 224 × 224 × 3
Flatten it → 150,528 inputs

Now connect that to just 1,000 neurons?

  • Millions of parameters
  • Easy overfitting
  • Slow training
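The arithmetic behind that "millions of parameters" claim is easy to check, using the numbers from the example above:

```python
# Parameters for one dense layer connecting a flattened 224x224x3
# image to 1,000 neurons.
inputs = 224 * 224 * 3                 # 150,528 flattened pixel values
neurons = 1_000
params = inputs * neurons + neurons    # one weight per connection, plus biases
print(f"{params:,} parameters")        # over 150 million
```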

Worse:

A pixel’s position matters in an image, but dense layers treat every pixel as unrelated. CNNs fix this.

CNNs are built on three simple ideas:

  • Local connections
  • Shared weights
  • Spatial awareness

Instead of looking at the whole image at once, CNNs:

  • Look at small patches
  • Slide across the image
  • Learn patterns like edges, corners, texture

Just like how humans scan images.

Convolution: The Star of the Show

A convolution uses a small filter (kernel), like:

3×3
5×5

This filter:

  • Slides over the image
  • Performs a weighted sum
  • Produces a feature map

Each filter learns one kind of pattern.

One filter → edges
Another → curves
Another → textures
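That sliding weighted sum can be sketched in a few lines of NumPy. This is a minimal "valid" convolution with stride 1; the kernel here is a standard Sobel-style edge detector, used only as an illustration, since real CNN filters are learned:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image`, taking a weighted sum at each
    position ('valid' convolution, stride 1, no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)
    return out

# A vertical-edge detector (Sobel-style kernel)
edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])

# Toy image: dark left half, bright right half
image = np.zeros((5, 5))
image[:, 3:] = 1.0

feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # strong responses where brightness changes
```

The output feature map lights up exactly where the dark-to-bright edge sits, which is the "one filter, one pattern" idea in action.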

Weight Sharing: Less Is More

The same filter is reused across the entire image.

Why this matters:

  • Far fewer parameters
  • Detects patterns anywhere in the image
  • Faster and more efficient

A cat is still a cat, whether it’s on the left or right.
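To see how dramatic the saving is, compare one shared 3×3 filter with one dense neuron over the same 224×224×3 image (illustrative arithmetic, not any specific network):

```python
image_pixels = 224 * 224 * 3

# One 3x3 conv filter over 3 input channels: the same weights are
# reused at every position in the image.
conv_params = 3 * 3 * 3 + 1        # kernel weights + one bias

# One dense neuron connected to every pixel: one weight per pixel.
dense_params = image_pixels + 1

print(conv_params)    # 28
print(dense_params)   # 150,529
```

28 parameters that work anywhere in the image, versus 150,529 that only ever see one fixed layout.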

Pooling: Shrinking Smartly

Pooling layers reduce spatial size.

Common types:

  • Max Pooling
  • Average Pooling

Benefits:

  • Less computation
  • More robustness
  • Focus on what exists, not exactly where

Pooling is like zooming out without losing meaning.
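Max pooling is simple enough to sketch directly: a 2×2 window with stride 2, keeping only the strongest activation in each window (this minimal version assumes the dimensions divide evenly):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """2x2 max pooling with stride 2: keep the max of each window."""
    h, w = feature_map.shape
    windows = feature_map.reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 2],
               [2, 3, 1, 4]])

pooled = max_pool(fm)
print(pooled)  # 4x4 -> 2x2: the max of each 2x2 window survives
```

The 5 in the bottom-right region survives pooling no matter which of its four cells it sat in: "what exists, not exactly where".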

CNNs Learn Hierarchies

CNNs don’t learn everything at once.

Early layers:

Edges
Corners

Middle layers:

Shapes
Textures

Deep layers:

Objects
Faces
Concepts

Simple → Complex
Pixels → Meaning
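Stacking convolutions is what builds this hierarchy: each layer's output feeds the next, so values deeper in the network summarize ever-larger regions of the original image. A small sketch (the filters here are arbitrary smoothing kernels just to show the shapes; real filters are learned):

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' convolution, stride 1 (minimal sketch)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

layer1 = np.ones((3, 3)) / 9   # placeholder filter; learned in practice
layer2 = np.ones((3, 3)) / 9

image = np.random.rand(10, 10)
h1 = convolve2d(image, layer1)   # 8x8: each value sees a 3x3 patch
h2 = convolve2d(h1, layer2)      # 6x6: each value now sees a 5x5 patch
print(h1.shape, h2.shape)
```

Two stacked 3×3 layers already cover a 5×5 region of the input, which is how simple local patterns compose into larger structures.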

Where Do Dense Layers Fit Now?

CNNs usually end with:

Flattening
One or more dense layers

CNNs: extract features
Dense layers: make decisions

Best of both worlds.
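Shape-wise, that hand-off from feature extraction to decision making looks like this (all sizes are illustrative, and the feature maps are random stand-ins for real conv outputs):

```python
import numpy as np

# Pretend a conv stage with 8 filters produced 8 feature maps
# (26x26 each, as a 3x3 'valid' convolution of a 28x28 image would).
features = np.random.rand(8, 26, 26)

# 2x2 max pooling -> 8 x 13 x 13
pooled = features.reshape(8, 13, 2, 13, 2).max(axis=(2, 4))

# Flatten, then one dense layer maps the features to 10 class scores.
flat = pooled.reshape(-1)             # 8 * 13 * 13 = 1,352 values
W = np.random.rand(10, flat.size)     # dense weights (random stand-in)
b = np.random.rand(10)
scores = W @ flat + b
print(scores.shape)                   # (10,)
```

The conv and pooling stages do the looking; the final dense layer just has to decide between 10 classes from 1,352 distilled features instead of 150,528 raw pixels.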

Why CNNs Changed Everything

CNNs made it possible to:

  • Recognize faces
  • Detect objects
  • Power self-driving cars
  • Win ImageNet challenges

They’re the reason deep learning clicked for vision.

What I Learned This Week

  • Dense layers don’t scale well for images
  • CNNs exploit spatial structure
  • Convolutions learn local patterns
  • Weight sharing saves parameters
  • Pooling adds robustness
  • CNNs don’t just see pixels; they learn how to look

What's next

Next week, we will learn about RNNs (Recurrent Neural Networks).