From Basics to Bots: My Weekly AI Engineering Adventure-33

Autoregressive Generation - How Text Comes Out One Token at a Time

Posted by Afsal on 03-Apr-2026

Hi Pythonistas!

We now have:

  • Tokens
  • Embeddings
  • Attention
  • The Transformer

But one big question remains: how does ChatGPT actually generate text?
It doesn't dump a whole paragraph at once. It writes one token at a time.

The Core Rule

ChatGPT follows a single rule: given everything so far, predict the next token. That's it.
No planning.
No outline.
No future awareness.
Just next-token prediction.

Step-by-Step Generation

When you type a prompt: 

  • Your text is tokenized
  • Tokens become embeddings
  • They pass through the Transformer
  • The model outputs probabilities for the next token
  • One token is selected
  • That token is appended to the input
  • Then the whole process repeats, again and again
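The steps above form a simple loop. Here is a minimal sketch of that loop in Python, using a made-up `toy_model` function as a stand-in for a real Transformer (the names `toy_model` and `generate` are mine, for illustration only):

```python
import random

def toy_model(tokens):
    # Stand-in for a real Transformer: given the context so far,
    # return a probability distribution over a tiny vocabulary.
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    rng = random.Random(len(tokens))   # deterministic toy scores for the demo
    weights = [rng.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def generate(prompt_tokens, n_steps=5):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        probs = toy_model(tokens)              # model outputs probabilities
        next_tok = max(probs, key=probs.get)   # one token is selected (greedy here)
        tokens.append(next_tok)                # appended to the input
    return tokens                              # ...and the process repeats

print(generate(["the", "cat"]))
```

The real model is vastly more sophisticated, but the outer loop is genuinely this simple: predict, pick, append, repeat.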

Why It Feels Like Thinking

From the outside, this looks like intelligence. But the model never sees future tokens.
It only reacts to what already exists.

Each new token reshapes the context. Meaning emerges from iteration.

Masked Self-Attention - No Cheating Allowed

During generation, the model is not allowed to look ahead.
Attention is masked so that token 5 can't see token 6.
The future doesn't exist yet.
This keeps generation honest.
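This mask is just a triangular pattern of allowed positions. A minimal sketch (the helper name `causal_mask` is my own):

```python
def causal_mask(n):
    # mask[i][j] is True when position i is allowed to attend to position j:
    # itself and earlier positions only, never the future.
    return [[j <= i for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print("".join("x" if allowed else "." for allowed in row))
# Prints a lower-triangular pattern:
# x...
# xx..
# xxx.
# xxxx
```

Row 5 of a real mask would have an `x` in columns 1-5 and a `.` in column 6: exactly "token 5 can't see token 6".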

Probabilities, Not Decisions

The output is not a word.

It’s a probability distribution:

Token A → 40%
Token B → 25%
Token C → 5%
… and so on across the entire vocabulary

What happens next depends on the sampling strategy.
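Those percentages come from a softmax over the model's raw scores (logits). A minimal sketch, with made-up logit values:

```python
import math

def softmax(logits):
    # Turn raw model scores (logits) into probabilities that sum to 1.
    # Subtracting the max keeps exp() numerically stable.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = {"Token A": 2.0, "Token B": 1.5, "Token C": -0.1}  # made-up logits
probs = softmax(list(scores.values()))
for tok, p in zip(scores, probs):
    print(f"{tok} -> {p:.0%}")
```

The model hands this distribution off. It never picks a token itself; that is the decoder's job.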

Sampling Choices Matter

This is where personality comes from.

Greedy → always pick highest probability (boring)
Temperature → controls randomness
Top-k / Top-p → limit choices

Same model.
Different behavior. Style is a decoding choice.
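The three strategies can live in one small function. This is a simplified sketch (real systems apply temperature to logits before the softmax, which is equivalent to the `p ** (1/T)` re-weighting used here; the function name `sample` is my own):

```python
import random

def sample(probs, temperature=1.0, top_k=None, rng=random):
    # probs: dict of token -> probability from the model's softmax.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]          # top-k: limit choices to the k most likely
    if temperature == 0:
        return items[0][0]             # greedy: always the single most likely token
    # Temperature re-weights the distribution: low T sharpens it, high T flattens it.
    weights = [p ** (1.0 / temperature) for _, p in items]
    total = sum(weights)
    r = rng.random() * total
    for (tok, _), w in zip(items, weights):
        r -= w
        if r <= 0:
            return tok
    return items[-1][0]

dist = {"A": 0.40, "B": 0.25, "C": 0.05, "D": 0.30}
print(sample(dist, temperature=0))            # greedy: always "A"
print(sample(dist, temperature=1.2, top_k=3)) # randomly one of A, D, B
```

Same distribution in, different token out, depending only on the decoding knobs.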

Why Mistakes Snowball

Once a token is chosen:

  • It becomes part of the context
  • Even if it’s wrong
  • The model doesn’t correct itself. It builds on it.

This explains hallucinations and confident nonsense.

Long Text Is Just Repetition

A full paragraph is just thousands of next-token predictions chained together.

There’s no paragraph mode
Only: token → token → token

What I Learned This Week 

  • Text is generated one token at a time
  • Each step predicts probabilities
  • Sampling decides the final output
  • The model never sees the future
  • Errors propagate forward

At this point, we know how ChatGPT speaks.

What's Coming Next

Next week, we will learn about training a language model.