TransformerLens: Finding Induction Heads

Induction heads are the simplest example of a learned algorithm in transformers. Understanding them is the gateway to mechanistic interpretability.


What Are Induction Heads?

Induction heads implement in-context learning:

If the model has seen [A][B] once, and later sees [A], it predicts [B].

"The cat sat on the mat. The cat sat on the ___"
                                              ↑
                            Induction head predicts "mat"

This is pattern completion, learned entirely from training data.


The Induction Circuit

Two heads working together:

  1. Previous token head (Layer 0): Copies information from position i to position i+1
  2. Induction head (Layer 1): Searches for past occurrences of the current token

Position:    0     1     2     3     4
Tokens:     [A]   [B]   [C]   [A]   [?]
             │           ↑     │
             └───────────┘     │ "I see [A] at pos 3"
                               │ "Where was [A] before? Pos 0"
                               │ "What followed [A]? [B] at pos 1"
                               └───→ Predict [B]
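Stripped of the attention mechanics, the lookup above is just this (a pure-Python sketch of the algorithm, not part of any library):

```python
def induction_predict(tokens):
    """What the circuit computes: find the most recent earlier
    occurrence of the current token, predict what followed it."""
    current = tokens[-1]
    # Scan backwards over earlier positions for a match
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # the token that followed last time
    return None  # no earlier occurrence; induction can't help
```

`induction_predict(["A", "B", "C", "A"])` returns `"B"`, matching the diagram.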

Detecting Induction Heads

Use a repeating sequence:

import torch as t

def make_repeated_tokens(model, seq_len=50):
    """Create tokens like [BOS, A, B, C, D, A, B, C, D, ...]"""
    bos = model.to_tokens("")  # shape (1, 1): just the BOS token
    # Random token ids (avoiding the low ids where special tokens live)
    half = t.randint(1000, 10000, (1, seq_len // 2), device=bos.device)
    return t.cat([bos, half, half], dim=1)

tokens = make_repeated_tokens(model)
_, cache = model.run_with_cache(tokens)
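A quick behavioral check before looking at attention patterns: if induction is working, per-token loss should be much lower on the second (repeated) half. A minimal sketch of that comparison, assuming `per_token_loss` was obtained via `model(tokens, return_type="loss", loss_per_token=True)`:

```python
import torch as t

def split_half_losses(per_token_loss):
    """Mean loss on the first vs. second half of a repeated sequence.
    With working induction heads the second-half mean should be much
    lower: every second-half token has appeared once before."""
    n = per_token_loss.shape[-1]
    first = per_token_loss[..., : n // 2].mean().item()
    second = per_token_loss[..., n // 2 :].mean().item()
    return first, second
```

A large `first - second` gap on repeated random tokens is the classic signature of induction.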

The Induction Stripe

Induction heads show a distinctive attention pattern:

Key position:    0  1  2  3  4  5  6  7  8  9
Query position:
      5                    █              ← Attends to position 1
      6                       █           ← Attends to position 2
      7                          █        ← Attends to position 3
      8                             █     ← Attends to position 4

The head attends to the token AFTER the previous occurrence of the current token.

def detect_induction_heads(model, cache, seq_len, threshold=0.4):
    """Find heads with a strong induction stripe."""
    induction_heads = []

    for layer in range(model.cfg.n_layers):
        pattern = cache["pattern", layer][0]  # (n_heads, seq_q, seq_k)

        for head in range(model.cfg.n_heads):
            # Check for the off-diagonal stripe: at position i in the
            # second half, the head should attend to position i - seq_len // 2 + 1
            score = compute_induction_score(pattern[head], seq_len)

            if score > threshold:
                induction_heads.append((layer, head, score))

    return induction_heads

Computing the Induction Score

def compute_induction_score(pattern, seq_len):
    """
    Measure how much this pattern looks like an induction head.

    An induction head at position i attends to position i - seq_len/2 + 1
    (the token after the previous occurrence of the current token)
    """
    half = seq_len // 2

    # For positions in second half, expected key is offset by -half+1
    induction_attn = 0.0
    for q in range(half, seq_len):
        k = q - half + 1
        if 0 <= k < seq_len:
            induction_attn += pattern[q, k].item()

    return induction_attn / half
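The loop above can be collapsed into a single diagonal read: the stripe k = q - half + 1 is exactly the diagonal of the pattern at offset -(half - 1). A vectorized equivalent, assuming `pattern` is a square torch tensor:

```python
import torch as t

def induction_score_vectorized(pattern, seq_len):
    """Same stripe score as the loop, via one diagonal read.
    k = q - half + 1 puts the stripe on the diagonal at offset
    -(half - 1); its first entry (q = half - 1) is still in the
    first half, so we drop it."""
    half = seq_len // 2
    stripe = pattern.diagonal(-(half - 1))
    return stripe[1:].sum().item() / half
```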

Visualizing the Circuit

import circuitsvis as cv

# Find an induction head
layer, head = 5, 1  # Example

tokens = make_repeated_tokens(model, seq_len=20)
_, cache = model.run_with_cache(tokens)

# Visualize
pattern = cache["pattern", layer][0, head]  # (seq, seq)

cv.attention.attention_patterns(
    tokens=model.to_str_tokens(tokens[0]),
    attention=pattern.unsqueeze(0),  # Add head dim back
)

The K-Composition Circuit

How does the induction head know where to look?

The previous token head in layer 0 copies each token's identity into the next position's residual stream.

The induction head in layer 1 composes with this: its query encodes the current token, while its key reads the "previous token" information that the layer 0 head wrote. When query and key match, attention is high, and the head has found a position whose predecessor equals the current token.

# The QK circuit: what does this head look for?
# W_Q and W_K each have shape (d_model, d_head), so W_QK is (d_model, d_model)
W_QK = model.W_Q[layer, head] @ model.W_K[layer, head].T

# For induction, this should be close to the identity
# (the head looks for keys that match the current token)
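One rough way to quantify "close to the identity" is the fraction of absolute mass on the diagonal. This is a hypothetical helper, not a TransformerLens function, and a fuller check would first map W_QK into token space via the embedding (W_E @ W_QK @ W_E.T):

```python
import torch as t

def diagonal_dominance(M):
    """Fraction of a square matrix's absolute mass on its diagonal:
    1.0 for any (scaled) identity, about 1/n for a flat matrix."""
    return (M.diagonal().abs().sum() / M.abs().sum()).item()
```

A `diagonal_dominance(W_QK)` well above chance would support the matching-keys story.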

Capstone Connection

Induction heads and sycophancy:

Induction heads are pattern matchers. They complete patterns from context.

Sycophancy might involve similar circuits: heads that find a user's stated opinion earlier in the context and complete the pattern by echoing it.

Your capstone will look for heads that match these sycophantic patterns.


🎓 Tyla's Exercise

  1. Derive why induction heads need at least 2 layers. (Why can't a single layer do pattern matching?)

  2. The previous token head has attention pattern pattern[i, j] high when j = i - 1. What does its OV circuit do? (Hint: It copies information forward.)

  3. Calculate the number of parameters dedicated to the induction circuit in GPT-2 small.


💻 Aaliyah's Exercise

Find and verify induction heads:

def find_induction_heads(model):
    """
    1. Create a repeated sequence
    2. Run the model and cache activations
    3. Score each head for induction behavior
    4. Return heads with score > 0.4
    5. Visualize the attention pattern of the strongest one
    """
    pass

def verify_induction_behavior(model, layer, head):
    """
    Test that this head actually does induction:
    1. Create sequence [A, B, C, D, A, ...]
    2. Verify the head attends to position after first [A]
    3. Ablate this head and measure increase in loss on repeated sequences
    """
    pass

📚 Maneesha's Reflection

  1. Induction heads are "discovered" rather than programmed. What does this tell us about what networks learn?

  2. The induction algorithm is simple but emerges from gradient descent on next-token prediction. What other simple algorithms might be hiding in transformers?

  3. If you were teaching someone to find circuits in transformers, would you start with induction heads? Why or why not?