Hi Pythonistas!
In the last post, I converted text into integers. At that point, the data looked like this:
[1, 0, 2, 2, 3]
Now the obvious question is: how does a model learn anything from just integers?
Short answer: it doesn’t.
Numbers like 1, 0, 2 don’t carry meaning by themselves. The model needs a richer representation.
That’s where embeddings come in.
Here's the code:
import torch
import torch.nn as nn

vocab_size = 100   # placeholder: use the vocabulary size your tokenizer produced in the last post
n_embd = 64        # embedding dimension
block_size = 64    # maximum sequence length

token_embedding = nn.Embedding(vocab_size, n_embd)      # one learnable vector per token id
position_embedding = nn.Embedding(block_size, n_embd)   # one learnable vector per position

def embed(x):
    B, T = x.shape                                               # batch size, sequence length
    tok = token_embedding(x)                                     # (B, T, n_embd)
    pos = position_embedding(torch.arange(T, device=x.device))   # (T, n_embd)
    return tok + pos                                             # broadcast add -> (B, T, n_embd)
We’re converting:
[1, 0, 2, 2, 3]
into:
[
[0.12, -0.45, ..., 0.88],
[0.91, 0.10, ..., -0.22],
...
]
Each token becomes a vector.
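A quick way to see this is to check the shapes. This is a minimal sketch using the embed function above; the batch of ids is made up for illustration:

x = torch.tensor([[1, 0, 2, 2, 3]])   # shape (B=1, T=5): one sequence of 5 token ids
out = embed(x)
print(out.shape)                      # torch.Size([1, 5, 64]): each id is now a 64-dim vector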
Step 1: Token Embedding
token_embedding = nn.Embedding(vocab_size, n_embd)
This is basically a lookup table.
Think of it like:
{
0: [vector],
1: [vector],
2: [vector],
...
}
So when we do:
tok = token_embedding(x)
Each integer gets replaced by its vector.
Example
Input:
[1, 0, 2]
Output:
[
[0.2, -0.1, ...],
[0.5, 0.3, ...],
[0.9, -0.7, ...]
]
These vectors are:
- randomly initialized
- learned during training
So the model slowly figures out:
- which tokens are similar
- which tokens are important
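If you want to convince yourself it really is just a lookup, this small check (a sketch, using the token_embedding defined earlier) shows that calling the layer is the same as indexing into its weight matrix:

ids = torch.tensor([1, 0, 2])

looked_up = token_embedding(ids)            # what the layer returns
indexed = token_embedding.weight[ids]       # manual row lookup in the weight matrix

print(torch.equal(looked_up, indexed))      # True: nn.Embedding is a learnable lookup table
print(token_embedding.weight.requires_grad) # True: the rows get updated during training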
Step 2: Position Embedding
position_embedding = nn.Embedding(block_size, n_embd)
This part is easy to miss, but very important.
Because:
the model has no idea about order
Without this:
"cat sat"
"sat cat"
would look identical.
How it works
pos = position_embedding(torch.arange(T))
This creates vectors like:
position 0 → vector
position 1 → vector
position 2 → vector
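Concretely, for a sequence of length T you get one learned vector per position (a small sketch using the position_embedding defined above):

T = 5
pos = position_embedding(torch.arange(T))   # positions 0, 1, 2, 3, 4
print(pos.shape)                            # torch.Size([5, 64]): one 64-dim vector per position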
Step 3: Combine Them
return tok + pos
We simply add:
token meaning + position info
So now each token knows:
- what it is
- where it is
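One way to see the effect (a minimal sketch with a made-up sequence): the token id 2 appears twice below, so its token vector is identical in both places, but after adding the position vectors the two occurrences end up different:

x = torch.tensor([[1, 0, 2, 2, 3]])        # token id 2 appears at positions 2 and 3
out = embed(x)

tok = token_embedding(x)
print(torch.equal(tok[0, 2], tok[0, 3]))   # True: same id, same token vector
print(torch.equal(out[0, 2], out[0, 3]))   # False: different positions, different final vectors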
What Changed for Me
Before this, I thought:
embeddings = just some preprocessing trick
After implementing it:
embeddings are where meaning starts to emerge
Because:
- similar tokens start getting similar vectors
- relationships get encoded numerically
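After training, you can probe this directly. Here's a sketch (the ids are placeholders; in practice you'd pick ids your tokenizer assigned to related words) using cosine similarity between two embedding rows:

import torch.nn.functional as F

id_a, id_b = 5, 9                          # hypothetical ids for two tokens you expect to be related
vec_a = token_embedding.weight[id_a]
vec_b = token_embedding.weight[id_b]

similarity = F.cosine_similarity(vec_a, vec_b, dim=0)
print(similarity.item())                   # closer to 1.0 -> the model treats these tokens as more similar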
The model never sees:
"hello"
It doesn’t even see:
[1, 0, 2, 2, 3]
It sees:
vectors in 64-dimensional space
That’s the actual input to the transformer.
Where This Fits
Text
↓
Tokenization
↓
Embedding ← (you are here)
↓
Attention
↓
Transformer
↓
Prediction
Why This Matters
Without embeddings:
- the model can’t learn relationships
- everything is just discrete integers
With embeddings:
- the model gets a continuous space to learn patterns
What's Coming Next
Now we have meaningful vectors. But how does the model decide which tokens to focus on?
That's where things get interesting: Self-Attention.