Why This Workbook Exists

On January 21, 2026, federal agents detained a 5-year-old boy named Liam coming home from preschool in Minnesota. According to reports, they used him as "bait" to catch his father.

Behind that operation: Palantir's AI systems—ImmigrationOS, a $30 million platform that consolidates tools for approving raids, booking arrests, and routing people to deportation flights.

This is what misaligned AI looks like in the real world.

Not a superintelligence plotting to end humanity. Not a chatbot saying something offensive. But an AI system optimized perfectly for what its operators asked for—without ever asking whether it should do those things.


The Problem We're Solving

The AI safety field needs people who can:

  1. Understand how neural networks learn from data
  2. See inside what models are actually doing
  3. Shape model behavior with training signals
  4. Evaluate whether systems are doing what they should

This workbook teaches all four. By the end, you'll have built a complete evaluation suite for sycophancy—AI systems that tell people what they want to hear instead of what's true.

Sycophancy sounds harmless. It isn't.

A sycophantic coding agent ignores security vulnerabilities because you didn't ask about security. A sycophantic research assistant cherry-picks data to support your hypothesis. A sycophantic institutional AI finds "deportation targets" without questioning whether children should be separated from families.


What You'll Build

Your Capstone Project: A Sycophancy Evaluation Suite

Every chapter contributes to this final deliverable:

Chapter Question Contribution
0: Fundamentals How do models learn? Understanding why sycophancy emerges from training
1: Interpretability What's happening inside? Finding where sycophancy "lives" in the model
2: Reinforcement Learning Can we change it? Testing if different rewards reduce sycophancy
3: Evaluations How do we measure it? Building a rigorous sycophancy benchmark

By Week 9, you'll have:


Who This Is For

This workbook serves three types of learners:

Tyla — The CS undergrad who has math but needs research depth

Aaliyah — The bootcamp developer who needs code-first explanations without math notation

Maneesha — The instructional designer who wants to understand AI's implications for learning

Each chapter includes scaffolding for all three. Find your path and follow it.


What Makes This Different

Most ML curricula optimize for coverage. We optimize for transfer.

Every exercise connects to your capstone. Every concept builds toward your final evaluation. You're not learning "neural networks" in the abstract—you're learning what you need to detect when AI systems are optimizing for the wrong thing.

The cognitive load is real. ARENA's content is inherently complex. We can't make transformers simple. But we can:

  1. Eliminate friction — Colab environments that just work
  2. Sequence properly — Worked examples before exercises
  3. Connect everything — Every exercise ties to your capstone

Let's begin.