Hi Pythonistas!
In the last post, we learned about fully connected layers. Fully connected layers are powerful.
But when it comes to images, they struggle.
Why?
- Images are huge
- Dense layers ignore spatial structure
- Too many parameters, too fast
This is where Convolutional Neural Networks (CNNs) come in.
Why Dense Layers Fail for Images
Take a small image: 224 × 224 pixels with 3 color channels
Flatten it → 150,528 inputs
Now connect that to just 1,000 neurons?
- Over 150 million parameters
- Easy overfitting
- Slow training
Worse:
A pixel's position matters in images, but dense layers treat every pixel as unrelated. CNNs fix this.
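The parameter math above is easy to check in a few lines:

```python
# Parameter count for one dense layer on a flattened 224 x 224 x 3 image.
inputs = 224 * 224 * 3               # 150,528 input values
neurons = 1_000
params = inputs * neurons + neurons  # one weight per input per neuron, plus biases
print(params)                        # 150529000 -- over 150 million, for a single layer
```

And that is before adding any further hidden layers.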
CNNs are built on three simple ideas:
- Local connections
- Shared weights
- Spatial awareness
Instead of looking at the whole image at once, CNNs:
- Look at small patches
- Slide across the image
- Learn patterns like edges, corners, texture
Much like how humans scan images.
Convolution: The Star of the Show
A convolution uses a small filter (kernel), like:
3×3
5×5
This filter:
- Slides over the image
- Performs a weighted sum
- Produces a feature map
Each filter learns one kind of pattern.
One filter → edges
Another → curves
Another → textures
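Here is a minimal sketch of that sliding weighted sum in plain NumPy. The vertical-edge kernel is hand-made for illustration; in a real CNN the filter values are learned during training:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution: slide the kernel, take a weighted sum."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-made vertical-edge kernel: negative on the left, positive on the right.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

# Toy image: dark on the left, bright on the right -- a vertical edge.
image = np.zeros((5, 5))
image[:, 2:] = 1.0

fmap = conv2d(image, kernel)
print(fmap)  # strong responses where the edge is, zeros in the flat region
```

The feature map lights up exactly where the pattern the filter encodes appears in the image.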
Weight Sharing: Less Is More
The same filter is reused across the entire image.
Why this matters:
- Far fewer parameters
- Detects patterns anywhere in the image
- Faster and more efficient
A cat is still a cat, whether it’s on the left or right.
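Weight sharing in numbers, reusing the same hypothetical 224 × 224 × 3 image from earlier (weights only, to keep the comparison simple):

```python
# Dense: one weight per pixel per neuron, for 1,000 neurons.
dense_params = (224 * 224 * 3) * 1_000  # 150,528,000 weights

# Conv: one shared 3x3x3 filter slid over the whole image, plus one bias.
conv_params = 3 * 3 * 3 + 1             # 28 parameters

print(dense_params, conv_params)        # 150528000 28
```

One shared filter detects its pattern anywhere in the image, at a tiny fraction of the cost.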
Pooling: Shrinking Smartly
Pooling layers reduce spatial size.
Common types:
- Max Pooling
- Average Pooling
Benefits:
- Less computation
- More robustness
- Focus on what exists, not exactly where
Pooling is like zooming out without losing meaning.
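A minimal max-pooling sketch in NumPy (2×2 windows, stride 2), on a made-up feature map:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the strongest activation per window."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # trim so dimensions divide evenly
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 2, 3, 4]])

pooled = max_pool2d(fmap)
print(pooled)  # [[6 2]
               #  [2 7]]
```

Each 2×2 window collapses to its maximum: the feature is still detected, but its exact position within the window is discarded.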
CNNs Learn Hierarchies
CNNs don’t learn everything at once.
Early layers:
Edges
Corners
Middle layers:
Shapes
Textures
Deep layers:
Objects
Faces
Concepts
Simple → Complex
Pixels → Meaning
Where Do Dense Layers Fit Now?
CNNs usually end with:
Flattening
One or more dense layers
CNNs: Extract features
Dense layers: Make decisions
Best of both worlds
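Putting it together, here is a toy forward pass in plain NumPy: convolution extracts features, pooling shrinks them, flattening hands them to a dense layer that scores classes. Every shape and weight here is random and made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: one 28x28 grayscale image, and 8 learnable 3x3 filters.
image = rng.standard_normal((28, 28))
filters = rng.standard_normal((8, 3, 3))

# Conv (valid): each filter yields a 26x26 feature map -- feature extraction.
fmaps = np.stack([
    np.array([[np.sum(image[i:i + 3, j:j + 3] * f) for j in range(26)]
              for i in range(26)])
    for f in filters
])

# 2x2 max pool: 26x26 -> 13x13 per feature map.
pooled = fmaps.reshape(8, 13, 2, 13, 2).max(axis=(2, 4))

# Flatten, then one dense layer makes the decision (10 class scores).
flat = pooled.reshape(-1)                 # 8 * 13 * 13 = 1,352 features
W = rng.standard_normal((10, flat.size))  # dense weights, randomly initialized
scores = W @ flat
print(scores.shape)                       # (10,)
```

In practice you would use a framework like PyTorch or Keras rather than writing this by hand, but the division of labor is the same: convolution and pooling extract, the dense layer decides.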
Why CNNs Changed Everything
CNNs made it possible to:
- Recognize faces
- Detect objects
- Power self-driving cars
- Win ImageNet challenges
They’re the reason deep learning clicked for vision.
What I Learned This Week
Dense layers don’t scale well for images
CNNs exploit spatial structure
Convolutions learn local patterns
Weight sharing saves parameters
Pooling adds robustness
CNNs don't just see pixels; they learn how to look
What's next
Next week, we will learn about RNNs (Recurrent Neural Networks).