Part I: The Foundations of Modern LLMs

Transformer basics, tokenization, embeddings, and the attention mechanism.

The Transformer Architecture

Because self-attention processes all positions in parallel rather than one step at a time, Transformers scale to very large training runs and capture long-range dependencies directly.
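
As a rough sketch of what one layer of this architecture does, the NumPy code below runs a simplified, single-head Transformer encoder layer: self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. All positions are handled at once by matrix multiplications, which is what makes the computation parallel. The dimensions, random weights, single head, and post-norm ordering are illustrative assumptions rather than a faithful reproduction of any particular model; the Q/K/V attention step is unpacked later in this part.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_layer(X, params):
    """One simplified Transformer encoder layer (single head, no biases)."""
    W_q, W_k, W_v, W_o, W_1, W_2 = params
    # Self-attention: every position attends to every other position in parallel.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1) @ V
    X = layer_norm(X + A @ W_o)                # residual connection + layer norm
    # Position-wise feed-forward network (ReLU MLP applied to each position).
    H = np.maximum(0.0, X @ W_1) @ W_2
    return layer_norm(X + H)                   # residual connection + layer norm

# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 16, 32, 64
X = rng.normal(size=(seq_len, d_model))
params = (
    rng.normal(size=(d_model, d_model)),  # W_q
    rng.normal(size=(d_model, d_model)),  # W_k
    rng.normal(size=(d_model, d_model)),  # W_v
    rng.normal(size=(d_model, d_model)),  # W_o
    rng.normal(size=(d_model, d_ff)),     # W_1
    rng.normal(size=(d_ff, d_model)),     # W_2
)
print(encoder_layer(X, params).shape)     # (16, 32): one vector per position
```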

Tokenization

Subword units balance vocabulary size with the ability to handle rare words.
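
Here is a small sketch of how subword tokenization behaves, using a greedy longest-match split in the style of WordPiece. The toy vocabulary and the "##" continuation convention are illustrative assumptions; real tokenizers (BPE, WordPiece, SentencePiece) learn their vocabularies from a corpus.

```python
import string

# Toy vocabulary: a few whole words, a few subword pieces, and single
# characters as a fallback so every lowercase word can be split somehow.
VOCAB = {"the", "cat", "sat", "on", "mat", "token", "##ization", "un", "##seen"}
VOCAB |= set(string.ascii_lowercase)
VOCAB |= {"##" + c for c in string.ascii_lowercase}

def tokenize_word(word, vocab):
    """Greedy longest-match subword split (WordPiece-style)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece     # continuation pieces carry a "##" prefix
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]             # no piece matched at all
        start = end
    return pieces

print(tokenize_word("cat", VOCAB))           # ['cat']: common word stays whole
print(tokenize_word("tokenization", VOCAB))  # ['token', '##ization']: rarer word splits
print(tokenize_word("unseen", VOCAB))        # ['un', '##seen']
```

A common word maps to a single token, while a rarer word is covered by a few reusable pieces, which is how a fixed-size vocabulary can still represent unseen words.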

Embeddings & Positional Encoding

Tokens are mapped to dense vectors and enriched with positional information.
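
The sketch below shows both steps in NumPy: an embedding-table lookup that turns token ids into dense vectors, and sinusoidal positional encodings (as in the original Transformer paper) added on top. The vocabulary size, model width, random initialization, and example token ids are illustrative assumptions; many models instead learn their positional embeddings.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal encodings: PE[pos, 2i] = sin(pos / 10000^(2i/d_model)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    pos = np.arange(max_len)[:, None]              # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

vocab_size, d_model, max_len = 1000, 64, 128       # toy sizes
rng = np.random.default_rng(0)
embedding_table = rng.normal(scale=0.02, size=(vocab_size, d_model))

token_ids = np.array([5, 42, 7, 981])              # output of the tokenizer
x = embedding_table[token_ids]                     # lookup: (4, d_model) dense vectors
x = x + positional_encoding(max_len, d_model)[: len(token_ids)]
print(x.shape)                                     # (4, 64)
```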

The Attention Mechanism: Q, K, V Explained

Q, K, and V each play a distinct role in self-attention.

Query (Q)

The current token's question about context.

Key (K)

An index of what each token offers.

Attention Score

The similarity of Q to K determines relevance.

Value (V)

The content that is aggregated, weighted by the attention scores.

How it Works

Self-attention compares each token's Query to every Key, normalizes the resulting scores with a softmax into attention weights, and uses those weights to average the Values, yielding a context-aware representation for each token.
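
Below is a minimal NumPy sketch of that computation for a single attention head; the toy dimensions, random inputs, and random projection matrices are assumptions for illustration (real models add multiple heads, masking, and learned parameters).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over X with shape (seq_len, d_model)."""
    Q = X @ W_q                          # queries: what each token is asking about
    K = X @ W_k                          # keys: what each token offers
    V = X @ W_v                          # values: the content to aggregate
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row sums to 1: attention weights
    return weights @ V                   # context-aware mix of values per token

# Toy example: 4 tokens, model width 8.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))
print(self_attention(X, W_q, W_k, W_v).shape)   # (4, 8): one vector per token
```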