
Neural Network Inputs


From Tokens to Model Input

After tokenization, several additional processing steps prepare data for the neural network:

  • Token sequences

    Lists of token IDs form the primary input

  • Position encoding

    Position information added to each token

  • Context length limits

    Maximum sequence size (8K-128K tokens)

  • Token embeddings

    Each token represented as a vector (typically 768-4096 dimensions)

The context length determines how much text the model can "see" at once and is a key limiting factor in LLM capabilities.
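
As a concrete, purely illustrative example, the sketch below shows how a token-ID sequence might be truncated to a context limit and paired with position indices before embedding. The IDs, the tiny limit, and the variable names are all invented for the example.

    # Illustrative sketch only: preparing a token-ID sequence for the model.
    # The token IDs and the tiny context limit are invented for this example;
    # real models use limits in the thousands (e.g., 8K-128K tokens).
    MAX_CONTEXT = 8

    token_ids = [101, 42361, 2003, 1037, 2307, 2154, 1012, 102, 999]  # tokenizer output

    # 1. Enforce the context-length limit: tokens beyond it are simply dropped.
    token_ids = token_ids[:MAX_CONTEXT]

    # 2. Attach position information: each token gets its index in the sequence,
    #    which the model converts into a position encoding.
    positions = list(range(len(token_ids)))

    print(list(zip(token_ids, positions)))
    # [(101, 0), (42361, 1), ..., (102, 7)] -- the ninth token fell outside the window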

Embedding Vectors

Token IDs are converted to dense vector representations (embeddings) before processing:

Token ID (e.g., 42361)  →  Embedding Vector  [0.1, -0.3, 0.8, 0.5, ..., -0.2]

Key Properties:

  • High-dimensional (768 to 4096 values per token)
  • Learned during pre-training
  • Captures semantic relationships
  • Similar words have similar vectors
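
A minimal sketch of the lookup itself, assuming an invented vocabulary size and embedding width: the learned embedding table is just a large matrix, and converting a token ID to its vector is a row lookup.

    # Minimal sketch (assumed sizes, random values standing in for learned weights).
    import numpy as np

    vocab_size = 50_000   # assumed tokenizer vocabulary size
    embed_dim = 1024      # within the typical 768-4096 range

    rng = np.random.default_rng(0)
    embedding_table = rng.normal(size=(vocab_size, embed_dim)).astype(np.float32)

    token_id = 42361                    # the example ID from above
    vector = embedding_table[token_id]  # dense vector for this token

    print(vector.shape)  # (1024,)
    print(vector[:5])    # first few of its 1024 values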

Visualizing Word Embeddings

[2-D plot of word embeddings: king, queen, man, woman, dog, cat]

Notice how similar concepts cluster together, and relationships between pairs are preserved.
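
Cosine similarity is one common way to quantify "similar vectors". The tiny hand-made 3-D vectors below are purely illustrative (real embeddings are learned and far larger), but they show how related words end up pointing in nearly the same direction.

    # Illustrative only: toy 3-D "embeddings" to demonstrate cosine similarity.
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    toy_embeddings = {
        "king":  np.array([0.9, 0.8, 0.1]),
        "queen": np.array([0.8, 0.9, 0.1]),
        "dog":   np.array([0.1, 0.2, 0.9]),
    }

    print(cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"]))  # ~0.99
    print(cosine_similarity(toy_embeddings["king"], toy_embeddings["dog"]))    # ~0.30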

The Neural Network Task

Predicting the Next Token

Input Window (context window of tokens):

  Once upon a time there was a

Model Predicts (probability distribution across the vocabulary):

  young      65%
  beautiful  15%
  small       8%
  magical     5%
  ...          7%

During training, input windows are extracted from the dataset, and the model learns to predict what comes next.

This next-token prediction is the fundamental task of language modeling.
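
The sketch below illustrates that training task with a hypothetical token sequence: sliding windows are cut from the data, and each window is paired with the token that follows it.

    # Illustrative sketch: building (input window, next token) training pairs.
    # Word-level "tokens" are used here only for readability; real models use subword IDs.
    tokens = ["Once", "upon", "a", "time", "there", "was", "a", "young", "princess"]
    window_size = 7

    for i in range(len(tokens) - window_size):
        context = tokens[i : i + window_size]  # what the model sees
        target = tokens[i + window_size]       # what it must learn to predict
        print(context, "->", target)
    # ['Once', 'upon', 'a', 'time', 'there', 'was', 'a'] -> young
    # ['upon', 'a', 'time', 'there', 'was', 'a', 'young'] -> princess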