Why This Workbook Exists

On January 21, 2026, federal agents detained a 5-year-old boy named Liam coming home from preschool in Minnesota. According to reports, they used him as "bait" to catch his father.

Behind that operation: Palantir's AI systems—ImmigrationOS, a $30 million platform that consolidates tools for approving raids, booking arrests, and routing people to deportation flights.

This is what misaligned AI looks like in the real world.

Not a superintelligence plotting to end humanity. Not a chatbot saying something offensive. But an AI system optimized perfectly for what its operators asked for—without ever asking whether it should do those things.

The Problem We're Solving

The AI safety field needs people who can:

Understand how neural networks learn from data
See inside what models are actually doing
Shape model behavior with training signals
Evaluate whether systems are doing what they should

This workbook teaches all four. By the end, you'll have built a complete evaluation suite for sycophancy—AI systems that tell people what they want to hear instead of what's true.

Sycophancy sounds harmless. It isn't.

A sycophantic coding agent ignores security vulnerabilities because you didn't ask about security. A sycophantic research assistant cherry-picks data to support your hypothesis. A sycophantic institutional AI finds "deportation targets" without questioning whether children should be separated from families.

What You'll Build

Your Capstone Project: A Sycophancy Evaluation Suite

Every chapter contributes to this final deliverable:

Chapter	Question	Contribution
0: Fundamentals	How do models learn?	Understanding why sycophancy emerges from training
1: Interpretability	What's happening inside?	Finding where sycophancy "lives" in the model
2: Reinforcement Learning	Can we change it?	Testing if different rewards reduce sycophancy
3: Evaluations	How do we measure it?	Building a rigorous sycophancy benchmark

By Week 9, you'll have:

A mechanistic hypothesis about how sycophancy works
An experiment testing whether RLHF makes it worse
A benchmark that can catch sycophantic behavior
A findings report with real recommendations

Who This Is For

This workbook serves three types of learners:

Tyla — The CS undergrad who has math but needs research depth

Aaliyah — The bootcamp developer who needs code-first explanations without math notation

Maneesha — The instructional designer who wants to understand AI's implications for learning

Each chapter includes scaffolding for all three. Find your path and follow it.

What Makes This Different

Most ML curricula optimize for coverage. We optimize for transfer.

Every exercise connects to your capstone. Every concept builds toward your final evaluation. You're not learning "neural networks" in the abstract—you're learning what you need to detect when AI systems are optimizing for the wrong thing.

The cognitive load is real. ARENA's content is inherently complex. We can't make transformers simple. But we can:

Eliminate friction — Colab environments that just work
Sequence properly — Worked examples before exercises
Connect everything — Every exercise ties to your capstone

Let's begin.

Why This Workbook Exists #

The Problem We're Solving #

What You'll Build #

Who This Is For #

What Makes This Different #

Why This Workbook Exists

The Problem We're Solving

What You'll Build

Who This Is For

What Makes This Different