After tokenization, several additional processing steps prepare data for the neural network:
- Token IDs: lists of integer token IDs form the primary input to the model.
- Positions: position information is added to each token so the model knows sequence order.
- Context limit: sequences are capped at a maximum length, typically 8K-128K tokens.
- Embeddings: each token is represented as a dense vector of roughly 1,024-4,096 dimensions (see the sketch after this list).
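To make these steps concrete, here is a minimal Python sketch of input preparation; the constants, token IDs, and the helper name prepare_input are illustrative assumptions, not values from any particular model:

```python
import numpy as np

MAX_SEQ_LEN = 8192   # example context limit; real models range from ~8K to 128K tokens
EMBED_DIM = 1024     # example embedding width; real models use ~1024-4096 dimensions

def prepare_input(token_ids):
    """Truncate to the context limit and attach a position index to each token."""
    token_ids = token_ids[:MAX_SEQ_LEN]
    positions = np.arange(len(token_ids))   # position information for each token
    return np.array(token_ids), positions

ids, pos = prepare_input([42361, 318, 257, 1332])  # made-up token IDs
print(ids)   # [42361   318   257  1332]
print(pos)   # [0 1 2 3]
```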
Token IDs are converted to dense vector representations (embeddings) before processing:
Token ID (e.g., 42361) → Embedding Vector [0.1, -0.3, 0.8, 0.5, ..., -0.2]
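Concretely, an embedding layer is just a learned matrix with one row per vocabulary entry, and the conversion is a row lookup. A minimal sketch, assuming a randomly initialized table in place of trained weights and a made-up vocabulary size:

```python
import numpy as np

VOCAB_SIZE = 50_000   # hypothetical vocabulary size
EMBED_DIM = 1024      # hypothetical embedding width

# The embedding table is a learned matrix: one row of EMBED_DIM values per token ID.
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((VOCAB_SIZE, EMBED_DIM), dtype=np.float32)

token_id = 42361
embedding_vector = embedding_table[token_id]   # the lookup is a simple row selection
print(embedding_vector[:5])                    # first few of the 1024 dimensions
```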
Key properties: similar concepts cluster together in embedding space, and relationships between pairs of concepts are preserved as consistent vector offsets (for example, the offset from "man" to "woman" roughly matches the offset from "king" to "queen").
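A toy illustration of those two properties, using hand-picked 3-dimensional vectors rather than real learned embeddings (which have thousands of dimensions and are learned from data):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-picked toy vectors chosen only to illustrate clustering and offsets.
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "man":   np.array([0.1, 0.2, 0.1]),
    "woman": np.array([0.1, 0.2, 0.9]),
}

print(cosine(vec["king"], vec["queen"]))            # high similarity: related concepts
analogy = vec["king"] - vec["man"] + vec["woman"]   # offset arithmetic
print(cosine(analogy, vec["queen"]))                # close to "queen"
```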
Context window of tokens → model predicts → probability distribution across the vocabulary
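The model's final layer produces one score (a logit) per vocabulary entry, and a softmax turns those scores into a probability distribution. A sketch with random logits standing in for real model output and an assumed vocabulary size:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

VOCAB_SIZE = 50_000                      # hypothetical vocabulary size
rng = np.random.default_rng(0)
logits = rng.standard_normal(VOCAB_SIZE) # stand-in for the model's output scores

probs = softmax(logits)                  # one probability per vocabulary token
print(probs.sum())                       # 1.0
print(int(probs.argmax()))               # ID of the most likely next token
```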
During training, input windows are extracted from the dataset, and the model learns to predict what comes next.
This next-token prediction is the fundamental task of language modeling.
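A minimal sketch of how such training pairs can be formed from a token stream; the token IDs and the helper make_training_pairs are hypothetical, but the shift-by-one relationship between inputs and targets is the core idea:

```python
def make_training_pairs(token_stream, window_size):
    """Slide a window over the corpus: the input is a window of tokens,
    the target is the same window shifted one position to the right."""
    pairs = []
    for i in range(len(token_stream) - window_size):
        x = token_stream[i : i + window_size]          # context window
        y = token_stream[i + 1 : i + window_size + 1]  # next-token targets
        pairs.append((x, y))
    return pairs

corpus = [5, 17, 42, 8, 99, 3, 71]   # made-up token IDs
for x, y in make_training_pairs(corpus, window_size=4):
    print(x, "->", y)
```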