Hi Pythonistas!
We now have:
- Tokens
- Embeddings
- Attention
- The Transformer
But one big question remains: How does ChatGPT actually generate text?
It doesn’t dump a paragraph at once. It writes one token at a time.
The Core Rule
ChatGPT follows a single rule: given everything so far, predict the next token. That’s it.
No planning.
No outline.
No future awareness.
Just next-token prediction.
Step-by-Step Generation
When you type a prompt:
- Your text is tokenized
- Tokens become embeddings
- They pass through the Transformer
- The model outputs probabilities for the next token
- One token is selected.
- That token is appended to the input.
- Then the whole process repeats.
- Again. And again. And again.
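The steps above can be sketched as a loop. The "model" here is a toy lookup table standing in for a real Transformer (the table, its tokens, and the `<end>` marker are all made up for illustration), but the loop itself is the real shape of generation: predict, sample, append, repeat.

```python
import random

# Toy next-token "model": a lookup table standing in for a Transformer.
# Keys are the context so far; values are probabilities for the next token.
TABLE = {
    ("the",): {"cat": 0.6, "dog": 0.4},
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
    ("the", "dog"): {"ran": 1.0},
    ("the", "cat", "sat"): {"<end>": 1.0},
    ("the", "cat", "ran"): {"<end>": 1.0},
    ("the", "dog", "ran"): {"<end>": 1.0},
}

def generate(prompt_tokens, max_steps=10):
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        probs = TABLE[tuple(tokens)]                      # probabilities for the next token
        choices, weights = zip(*probs.items())
        next_token = random.choices(choices, weights)[0]  # one token is selected
        if next_token == "<end>":
            break
        tokens.append(next_token)                         # appended to the input, then repeat
    return tokens

print(generate(["the"]))  # e.g. ['the', 'cat', 'sat']
```

Notice there is no plan anywhere in this loop: each pass only sees the tokens that already exist.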
Why It Feels Like Thinking
From the outside, this looks intelligent. The model never sees future tokens.
It only reacts to what already exists.
Each new token reshapes the context. Meaning emerges from iteration.
Masked Self-Attention - No Cheating Allowed
During generation, the model is not allowed to look ahead.
Attention is masked so that token 5 can’t see token 6.
The future doesn’t exist yet.
This keeps generation honest.
Probabilities, Not Decisions
The output is not a word.
It’s a probability distribution:
- Token A → 40%
- Token B → 25%
- Token C → 5%
What happens next depends on sampling strategy.
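Under the hood, the model outputs raw scores (logits), and a softmax turns them into that distribution. A small sketch with made-up logits for three hypothetical candidate tokens:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.5, -0.1]
probs = softmax(logits)
# Higher logit → higher probability, but nothing is "decided" yet.
```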
Sampling Choices Matter
This is where personality comes from.
- Greedy → always pick the highest probability (boring)
- Temperature → controls randomness
- Top-k / Top-p → limit the choices
Same model.
Different behavior. Style is a decoding choice.
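A sketch of two of these strategies, greedy and top-k with temperature, over the same made-up logits (the token names and numbers are illustrative, not from any real model):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Softmax with temperature: lower T sharpens, higher T flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(tokens, logits):
    """Always pick the highest-scoring token."""
    return tokens[logits.index(max(logits))]

def sample_top_k(tokens, logits, k=2, temperature=1.0):
    """Keep only the k highest-scoring tokens, renormalize, then sample."""
    ranked = sorted(zip(tokens, logits), key=lambda p: p[1], reverse=True)[:k]
    toks, logs = zip(*ranked)
    probs = softmax(list(logs), temperature)
    return random.choices(toks, probs)[0]

tokens = ["cat", "dog", "banana"]
logits = [2.0, 1.8, -3.0]
print(greedy(tokens, logits))        # always "cat"
print(sample_top_k(tokens, logits))  # "cat" or "dog" — "banana" is cut by top-k
```

Same logits, different decoder, different text: that is the sense in which style is a decoding choice.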
Why Mistakes Snowball
Once a token is chosen:
- It becomes part of the context
- Even if it’s wrong
- The model doesn’t correct itself. It builds on it.
This explains hallucinations and confident nonsense.
Long Text Is Just Repetition
A full paragraph is just thousands of next-token predictions, chained together.
There’s no paragraph mode.
Only: token → token → token
What I Learned This Week
- Text is generated one token at a time
- Each step predicts probabilities
- Sampling decides the final output
- The model never sees the future
- Errors propagate forward
At this point, we know how ChatGPT speaks.
What's Coming Next
Next week, we’ll learn about training a language model.