Hi Pythonistas!
So far, we’ve learned how to train a language model.
It can:
- Predict text well
- Continue patterns
- Mimic styles
But left alone, it’s unpredictable.
- Sometimes helpful
- Sometimes wrong
- Sometimes unsafe
So the next question is obvious:
How do we shape its behavior?
Pretraining vs Fine-Tuning
Pretraining:
- Huge dataset
- Generic objective
- Learn language
Fine-tuning:
- Smaller, curated data
- Specific goals
- Behave like this
Same model.
Different phase.
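Here's a minimal sketch of that idea, using a toy next-character counting "model" (a stand-in, not a real LM): the same training routine runs twice, first on generic text, then on a small curated set.

```python
from collections import defaultdict

def train(counts, corpus):
    """Update next-character counts in place; the same routine for both phases."""
    for text in corpus:
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
    return counts

# Phase 1: pretraining — large, generic text, generic objective (predict next char)
model = defaultdict(lambda: defaultdict(int))
train(model, ["the cat sat on the mat", "the dog ran in the park"])

# Phase 2: fine-tuning — smaller, curated data, same model, same objective
train(model, ["the cat is helpful"])

# One set of "weights" (here: counts), shaped by both phases.
print(dict(model["t"]))
```

Same model object, two passes of data. That's the whole trick.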
Supervised Fine-Tuning (SFT)
First alignment step.
Humans create:
- Prompts
- Ideal responses
The model learns: "When I see this kind of input, this is how I should respond."
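A rough sketch of how an SFT example is usually prepared (names and word-level "tokens" are simplifications for illustration): the prompt and ideal response are concatenated, and a loss mask ensures the model is only graded on the response part.

```python
# Hypothetical SFT example: format a (prompt, ideal response) pair into
# tokens plus a loss mask — the model learns only from the response.
def make_sft_example(prompt, response):
    tokens = prompt.split() + response.split()
    # 0 = ignore (prompt tokens), 1 = learn from (response tokens)
    mask = [0] * len(prompt.split()) + [1] * len(response.split())
    return tokens, mask

tokens, mask = make_sft_example(
    "User: what is Python ?",
    "Assistant: Python is a programming language .",
)

# Only masked-in tokens would feed the cross-entropy loss during fine-tuning.
trainable = [t for t, m in zip(tokens, mask) if m == 1]
print(trainable)
```

The curation effort goes into writing those ideal responses, not into the training code.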
Why Pretraining Isn’t Enough
A pretrained model:
- Can imitate anything
- Doesn’t know what’s good
- Has no concept of intent
Fine-tuning introduces:
- Helpfulness
- Clarity
- Politeness
Not intelligence. Direction.
Reinforcement Learning from Human Feedback (RLHF)
This is where things get interesting.
Instead of labeled answers:
- Humans rank responses: "This one is better than that one."
- A reward model learns these preferences.
The language model is then trained to maximize human preference.
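The reward model part can be sketched in a few lines. This is the standard pairwise preference loss (a Bradley–Terry style objective, commonly used in RLHF papers); the reward numbers here are made up for illustration.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the reward model
    already scores the human-preferred response higher."""
    return -math.log(1 / (1 + math.exp(-(reward_chosen - reward_rejected))))

# A human ranked response A above response B.
loss_agree = preference_loss(2.0, 0.5)     # reward model agrees with the human
loss_disagree = preference_loss(0.5, 2.0)  # reward model disagrees

print(loss_agree, loss_disagree)
```

Training pushes the reward model toward low loss, i.e. toward scoring responses the way humans rank them. The language model is then tuned to chase that reward.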
Alignment Is Optimization, Not Ethics
Important reality check.
The model does not understand:
- Values
- Morals
- Safety
It learns patterns that look aligned.
Trade-offs Everywhere
More alignment:
- Safer
- More predictable
But also:
- Less creative
- More cautious
ChatGPT is:
- Pretrained
- Fine-tuned
- Aligned for conversation
That’s why it:
- Explains
- Refuses
- Asks clarifying questions
What I Learned This Week
- Pretraining learns language
- Fine-tuning shapes behavior
- SFT teaches good examples
- RLHF optimizes for human preference
- Alignment is engineering, not understanding
At this point, we understand how ChatGPT is built.
What's Coming Next
We will start building mini-gpt.