112: Positional Encoding

Chapter Overview

The [[111-Self-Attention-Mechanism|Self-Attention]] mechanism is powerful, but it has a fundamental weakness: it is permutation-invariant. Because it has no inherent sense of word order, it sees "the cat sat on the mat" and "the mat sat on the cat" as the same bag of words.
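
To see this concretely, here is a minimal NumPy sketch of a single attention head (with identity Q/K/V projections, an assumption made purely to keep the example short). Shuffling the input tokens merely shuffles the output vectors in the same way, so nothing about word order survives:

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention with identity Q/K/V projections (illustration only)."""
    scores = X @ X.T / np.sqrt(X.shape[-1])        # dot-product similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X                             # weighted sum of token vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))        # 4 "tokens", embedding dimension 8
perm = [2, 0, 3, 1]                # reorder the tokens

out = self_attention(X)
out_shuffled = self_attention(X[perm])
print(np.allclose(out[perm], out_shuffled))  # True: the order carries no information
```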

Positional Encoding is the clever solution to this problem. It injects information about the position of each token directly into its embedding.


The Core Concept

Before the input embeddings are fed into the first Transformer layer, a positional encoding vector is added to each token's embedding.

In the original Transformer, this encoding vector is not learned; it is a fixed vector generated by a sine/cosine formula. This gives the model a unique and consistent signal for each position in the sequence.
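
In the original paper ("Attention Is All You Need"), even embedding dimensions receive sin(pos / 10000^(2i/d_model)) and odd dimensions the matching cosine, so every position gets a distinct pattern of wavelengths. A minimal NumPy sketch, assuming an even d_model:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sine/cosine positional encodings from the original Transformer paper."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model / 2)
    angles = positions / np.power(10000, dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=512)
print(pe.shape)  # (50, 512)
```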

```mermaid
flowchart TD
    subgraph one ["Step 1: Start with the Word"]
        A["Token: 'cat'"] --> B["Word Embedding<br/>[0.2, -0.1, 0.5, ...]"]
    end

    subgraph two ["Step 2: Generate Positional Information"]
        C["Position in Sequence: 5"] --> D["Positional Encoding Vector<br/>(Generated via sine/cosine formula)<br/>[-0.96, 0.28, 0.76, ...]"]
    end

    subgraph three ["Step 3: Combine them"]
        B -->|"Content"| E["➕<br/>Element-wise<br/>Addition"]
        D -->|"Position"| E
    end

    subgraph four ["Step 4: Final Input for Transformer"]
       E --> F["Final Input Vector<br/>(Now contains both content and position info)<br/>[-0.76, 0.18, 1.26, ...]"]
    end

    style A fill:#e3f2fd,stroke:#1976d2
    style C fill:#e3f2fd,stroke:#1976d2
    style B fill:#fff3e0,stroke:#f57c00
    style D fill:#fff3e0,stroke:#f57c00
    style E fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    style F fill:#c8e6c9,stroke:#1B5E20,stroke-width:2px
```
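
Putting the pieces together mirrors the four steps in the diagram. This sketch reuses the sinusoidal_positional_encoding helper from above; the embedding table and token ids are made-up placeholders, not values from any real model:

```python
import numpy as np

d_model, vocab_size = 512, 10_000
rng = np.random.default_rng(0)
embedding_table = rng.normal(scale=0.02, size=(vocab_size, d_model))  # toy word embeddings

token_ids = np.array([11, 845, 3, 3021, 11, 97])   # hypothetical ids for "the cat sat on the mat"
word_embeddings = embedding_table[token_ids]                             # Step 1: content
pos_encodings = sinusoidal_positional_encoding(len(token_ids), d_model)  # Step 2: position
transformer_input = word_embeddings + pos_encodings                      # Steps 3-4: element-wise addition

print(transformer_input.shape)  # (6, 512), ready for the first Transformer layer
```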

Next Steps

With content and position now combined into a single vector, the input is ready for the main processing layers. Let's explore how the Transformer handles multiple attention calculations in parallel.

🧠 Multi-Head Attention →

← Self-Attention Mechanism