TransformerLens: Finding Induction Heads
Induction heads are the simplest example of a learned algorithm in transformers. Understanding them is the gateway to mechanistic interpretability.
What Are Induction Heads?
Induction heads implement in-context learning:
If the model has seen [A][B] once, and later sees [A], it predicts [B].
"The cat sat on the mat. The cat sat on the ___"
↑
Induction head predicts "mat"
This is pattern completion, learned entirely from training data.
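You can check this behaviour directly. A minimal sketch, assuming TransformerLens with GPT-2 small; it loads the `model` object the rest of this page uses (any HookedTransformer works, the model choice is an assumption):

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

prompt = "The cat sat on the mat. The cat sat on the"
logits = model(prompt, return_type="logits")      # (1, seq, d_vocab)
top_token = logits[0, -1].argmax().item()         # prediction for the blank
print(model.tokenizer.decode(top_token))          # GPT-2 small typically predicts " mat"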
The Induction Circuit
Two heads working together:
- Previous token head (Layer 0): Copies information from position i to position i+1
- Induction head (Layer 1): Attends to the position just after past occurrences of the current token and copies that token forward
Position:   0    1    2    3    4
Tokens:    [A]  [B]  [C]  [A]  [?]

At the second [A] (position 3), the circuit reasons:
  "I see [A] at pos 3"
  "Where was [A] before? Pos 0"
  "What followed [A]? [B] at pos 1"
  → Predict [B] for position 4
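A quick way to find candidates for the first half of this circuit is to score every head on how much attention it puts on the immediately preceding position. A minimal sketch (the helper name is ours), assuming `model` is a HookedTransformer and `tokens` is any (1, seq) batch of token ids:

import torch as t

def previous_token_scores(model, tokens):
    """Mean attention each head puts on key = query - 1 (previous-token behaviour)."""
    _, cache = model.run_with_cache(tokens, return_type=None)
    scores = t.zeros(model.cfg.n_layers, model.cfg.n_heads)
    for layer in range(model.cfg.n_layers):
        pattern = cache["pattern", layer][0]   # (n_heads, seq_q, seq_k)
        # Sub-diagonal of each head's pattern: attention from position i to position i - 1
        scores[layer] = pattern.diagonal(offset=-1, dim1=-2, dim2=-1).mean(-1).cpu()
    return scores   # heads with high scores in early layers are previous-token-head candidates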
Detecting Induction Heads
Use a repeating sequence:
import torch as t

def make_repeated_tokens(model, seq_len=50):
    """Create tokens like [BOS, A, B, C, D, A, B, C, D, ...]"""
    bos = model.to_tokens("")  # (1, 1) tensor containing just the BOS token
    # Random token ids well inside the vocab (avoiding special tokens)
    half = t.randint(1000, 10000, (1, seq_len // 2), device=bos.device)
    return t.cat([bos, half, half], dim=1)
tokens = make_repeated_tokens(model)
_, cache = model.run_with_cache(tokens)
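Before hunting for heads, it is worth sanity-checking that the model actually exploits the repetition. A sketch, assuming the `model` and `tokens` from above; per-token loss should drop sharply on the repeated second half:

loss_per_token = model(tokens, return_type="loss", loss_per_token=True)[0]
half = tokens.shape[1] // 2
print("first half loss: ", loss_per_token[:half].mean().item())
print("second half loss:", loss_per_token[half:].mean().item())   # expect this to be much lower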
The Induction Stripe
Induction heads show a distinctive attention pattern:
Key position:    0 1 2 3 4 5 6 7 8 9
Query position:
           5     . █ . . . . . . . .   ← attends to position 1
           6     . . █ . . . . . . .   ← attends to position 2
           7     . . . █ . . . . . .   ← attends to position 3
           8     . . . . █ . . . . .   ← attends to position 4
The head attends to the token AFTER the previous occurrence of the current token.
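The quickest check is to plot a head's pattern as a heatmap and look for the stripe by eye. A sketch using matplotlib; the layer/head indices here are just an example to try, not known induction heads:

import matplotlib.pyplot as plt

pattern = cache["pattern", 5][0, 1].detach().cpu()   # layer 5, head 1 as an example: (seq_q, seq_k)
plt.imshow(pattern, cmap="Blues")
plt.xlabel("Key position")
plt.ylabel("Query position")
plt.show()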
def detect_induction_heads(model, cache, threshold=0.4):
    """Find heads with a strong induction stripe."""
    induction_heads = []
    for layer in range(model.cfg.n_layers):
        pattern = cache["pattern", layer][0]  # (n_heads, seq_q, seq_k)
        seq_len = pattern.shape[-1]
        for head in range(model.cfg.n_heads):
            # Check for the off-diagonal stripe:
            # at position i in the second half, the head should attend to position (i - seq_len/2 + 1)
            score = compute_induction_score(pattern[head], seq_len)
            if score > threshold:
                induction_heads.append((layer, head, score))
    return induction_heads
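Example usage, assuming the repeated `tokens` and `cache` from above (exact scores and head indices will vary by model and run):

for layer, head, score in detect_induction_heads(model, cache):
    print(f"L{layer}H{head}: induction score {float(score):.2f}")

In GPT-2 small, heads in the middle layers (roughly 5-7) are commonly reported as induction heads; the layer 5, head 1 pair used as an example below is one of them, but confirm against your own scores.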
Computing the Induction Score
def compute_induction_score(pattern, seq_len):
"""
Measure how much this pattern looks like an induction head.
An induction head at position i attends to position i - seq_len/2 + 1
(the token after the previous occurrence of the current token)
"""
half = seq_len // 2
# For positions in second half, expected key is offset by -half+1
induction_attn = 0
for q in range(half, seq_len):
k = q - half + 1
if 0 <= k < seq_len:
induction_attn += pattern[q, k]
return induction_attn / half
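The loop above reads a single fixed diagonal of the pattern (key = query - half + 1), so it can be collapsed into one `diagonal` call. A sketch of an equivalent vectorized version:

def compute_induction_score_vec(pattern, seq_len):
    """Same score as above, read off the stripe's diagonal in one call."""
    half = seq_len // 2
    stripe = pattern.diagonal(offset=1 - half)   # entries pattern[q, q - half + 1]
    return stripe[1:].sum().item() / half        # drop the one entry whose query is before the second half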
Visualizing the Circuit
import circuitsvis as cv
# Find an induction head
layer, head = 5, 1 # Example
tokens = make_repeated_tokens(model, seq_len=20)
_, cache = model.run_with_cache(tokens)
# Visualize
pattern = cache["pattern", layer][0, head] # (seq, seq)
cv.attention.attention_patterns(
tokens=model.to_str_tokens(tokens[0]),
attention=pattern.unsqueeze(0), # Add head dim back
)
The K-Composition Circuit
How does the induction head know where to look?
The previous token head in layer 0 copies token identity into the next position's residual stream.
The induction head in layer 1 reads this:
- Its Query asks: "What token am I?"
- Its Key asks: "What token came before me?"
When Q and K match, attention is high.
# The QK circuit: what does this head look for?
W_QK = model.W_Q[layer, head] @ model.W_K[layer, head].T   # (d_model, d_model)
# For induction, the *effective* token-level circuit (embeddings -> previous-token
# head's OV on the key side -> this W_QK) should act like a "same token" matcher,
# i.e. be close to the identity on token space.
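One way to quantify this link is the composition score used in the transformer-circuits work: how large the product of the earlier head's OV matrix and the later head's QK matrix is, relative to the sizes of the factors. A sketch for one candidate pair; both head indices are placeholders to replace with heads you actually found:

prev_layer, prev_head = 0, 7   # hypothetical previous-token head
ind_layer, ind_head = 5, 1     # example induction head

W_OV_prev = model.W_V[prev_layer, prev_head] @ model.W_O[prev_layer, prev_head]   # (d_model, d_model)
W_QK_ind = model.W_Q[ind_layer, ind_head] @ model.W_K[ind_layer, ind_head].T      # (d_model, d_model)

# K-composition: the induction head's keys are built from what the
# previous-token head wrote into the residual stream.
composed = W_QK_ind @ W_OV_prev.T
k_comp = composed.norm() / (W_QK_ind.norm() * W_OV_prev.norm())
print(f"K-composition score: {k_comp.item():.3f}")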
Capstone Connection
Induction heads and sycophancy:
Induction heads are pattern matchers. They complete patterns from context.
Sycophancy might involve similar circuits:
- "When the user says X, respond Y"
- Pattern: [user praise] → [agreement]
- Pattern: [user states wrong fact] → [validation]
Your capstone will look for heads that match these sycophantic patterns.
🎓 Tyla's Exercise
- Derive why induction heads need at least 2 layers. (Why can't a single layer do pattern matching?)
- The previous token head has attention pattern: pattern[i, j] is high when j = i - 1. What does its OV circuit do? (Hint: it copies information forward.)
- Calculate the number of parameters dedicated to the induction circuit in GPT-2 small.
💻 Aaliyah's Exercise
Find and verify induction heads:
def find_induction_heads(model):
"""
1. Create a repeated sequence
2. Run the model and cache activations
3. Score each head for induction behavior
4. Return heads with score > 0.4
5. Visualize the attention pattern of the strongest one
"""
pass
def verify_induction_behavior(model, layer, head):
"""
Test that this head actually does induction:
1. Create sequence [A, B, C, D, A, ...]
2. Verify the head attends to position after first [A]
3. Ablate this head and measure increase in loss on repeated sequences
"""
pass
📚 Maneesha's Reflection
Induction heads are "discovered" rather than programmed. What does this tell us about what networks learn?
The induction algorithm is simple but emerges from gradient descent on next-token prediction. What other simple algorithms might be hiding in transformers?
If you were teaching someone to find circuits in transformers, would you start with induction heads? Why or why not?