From Basics to Bots: My Weekly AI Engineering Adventure-33

Autoregressive Generation - How Text Comes Out One Token at a Time

Posted by Afsal on 03-Apr-2026

Hi Pythonistas!

We now have:

  • Tokens
  • Embeddings
  • Attention
  • The Transformer

But one big question remains: how does ChatGPT actually generate text?
It doesn't dump a whole paragraph at once. It writes one token at a time.

The Core Rule

ChatGPT follows a single rule: given everything so far, predict the next token. That's it.
No planning.
No outline.
No future awareness.
Just next-token prediction.

Step-by-Step Generation

When you type a prompt: 

  • Your text is tokenized
  • Tokens become embeddings
  • They pass through the Transformer
  • The model outputs probabilities for the next token
  • One token is selected
  • That token is appended to the input
  • Then the whole process repeats, again and again
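The steps above form a simple loop. Here is a minimal sketch of that loop in Python, using a made-up `toy_model` function as a stand-in for a real Transformer (the names `toy_model` and `generate` are mine, for illustration only):

```python
import random

def toy_model(tokens):
    # Stand-in for a real Transformer: given the context so far,
    # return a probability distribution over a tiny vocabulary.
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    rng = random.Random(len(tokens))   # deterministic toy scores for the demo
    weights = [rng.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def generate(prompt_tokens, n_steps=5):
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        probs = toy_model(tokens)              # model outputs probabilities
        next_tok = max(probs, key=probs.get)   # one token is selected (greedy here)
        tokens.append(next_tok)                # appended to the input
    return tokens                              # ...and the process repeats

print(generate(["the", "cat"]))
```

The real model is vastly more sophisticated, but the outer loop is genuinely this simple: predict, pick, append, repeat.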

Why It Feels Like Thinking

From the outside, this looks like intelligence. But the model never sees future tokens.
It only reacts to what already exists.

Each new token reshapes the context. Meaning emerges from iteration.

Masked Self-Attention - No Cheating Allowed

During generation, the model is not allowed to look ahead.
Attention is masked so that token 5 can't see token 6.
The future doesn't exist yet.
This keeps generation honest.
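This mask is just a triangular pattern of allowed positions. A minimal sketch (the helper name `causal_mask` is my own):

```python
def causal_mask(n):
    # mask[i][j] is True when position i is allowed to attend to position j:
    # itself and earlier positions only, never the future.
    return [[j <= i for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print("".join("x" if allowed else "." for allowed in row))
# Prints a lower-triangular pattern:
# x...
# xx..
# xxx.
# xxxx
```

Row 5 of a real mask would have an `x` in columns 1-5 and a `.` in column 6: exactly "token 5 can't see token 6".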

Probabilities, Not Decisions

The output is not a word.

It’s a probability distribution:

Token A → 40%
Token B → 25%
Token C → 5%
… and so on across the entire vocabulary

What happens next depends on the sampling strategy.
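Those percentages come from a softmax over the model's raw scores (logits). A minimal sketch, with made-up logit values:

```python
import math

def softmax(logits):
    # Turn raw model scores (logits) into probabilities that sum to 1.
    # Subtracting the max keeps exp() numerically stable.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = {"Token A": 2.0, "Token B": 1.5, "Token C": -0.1}  # made-up logits
probs = softmax(list(scores.values()))
for tok, p in zip(scores, probs):
    print(f"{tok} -> {p:.0%}")
```

The model hands this distribution off. It never picks a token itself; that is the decoder's job.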

Sampling Choices Matter

This is where personality comes from.

Greedy → always pick highest probability (boring)
Temperature → controls randomness
Top-k / Top-p → limit choices

Same model.
Different behavior. Style is a decoding choice.
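The three strategies can live in one small function. This is a simplified sketch (real systems apply temperature to logits before the softmax, which is equivalent to the `p ** (1/T)` re-weighting used here; the function name `sample` is my own):

```python
import random

def sample(probs, temperature=1.0, top_k=None, rng=random):
    # probs: dict of token -> probability from the model's softmax.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]          # top-k: limit choices to the k most likely
    if temperature == 0:
        return items[0][0]             # greedy: always the single most likely token
    # Temperature re-weights the distribution: low T sharpens it, high T flattens it.
    weights = [p ** (1.0 / temperature) for _, p in items]
    total = sum(weights)
    r = rng.random() * total
    for (tok, _), w in zip(items, weights):
        r -= w
        if r <= 0:
            return tok
    return items[-1][0]

dist = {"A": 0.40, "B": 0.25, "C": 0.05, "D": 0.30}
print(sample(dist, temperature=0))            # greedy: always "A"
print(sample(dist, temperature=1.2, top_k=3)) # randomly one of A, D, B
```

Same distribution in, different token out, depending only on the decoding knobs.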

Why Mistakes Snowball

Once a token is chosen:

  • It becomes part of the context
  • Even if it’s wrong
  • The model doesn’t correct itself. It builds on it.

This explains hallucinations and confident nonsense.

Long Text Is Just Repetition

A full paragraph is just thousands of next-token predictions chained together.

There’s no paragraph mode
Only: token → token → token

What I Learned This Week 

  • Text is generated one token at a time
  • Each step predicts probabilities
  • Sampling decides the final output
  • The model never sees the future
  • Errors propagate forward

At this point, we know how ChatGPT speaks.

What's Coming Next

Next week, we will learn about training a language model.