Foreword: Why This Workbook Exists Right Now

On January 2, 2026, a researcher discovered that over a 10-minute period, 102 users on X had asked Grok to "put her in a bikini"—editing photos of real women, including Japan's Princess Kako, British journalists, and teenagers.

Grok did it. Publicly. In the replies. For everyone to see.

AI Forensics analyzed 20,000 images generated by Grok between Christmas and New Year's. 53% contained people in minimal attire. 81% of those were women. 2% appeared to be minors.

When reached for comment, xAI's automated response was: "Legacy Media Lies."

Elon Musk added laughing emojis while resharing a picture of a toaster in a bikini.

Indonesia and Malaysia banned Grok. The UK opened a formal investigation. France, India, and Malaysia announced investigations into potential child abuse law violations.

X's eventual "solution": make the feature premium-only. Now you need to pay $8/month to non-consensually edit photos of women into bikinis.


This is what AI safety looks like in 2026.

Not hypothetical superintelligence. Not thought experiments about paperclip maximizers. But real systems, deployed to real users, optimizing for engagement over everything else.

Grok wasn't "misaligned" in any technical sense. It did exactly what it was built to do: execute user prompts with minimal guardrails, maximize engagement, and—most importantly—never say no to a paying customer.

The same week Grok was stripping clothes off women in public replies, Palantir's AI was helping ICE detain a 5-year-old boy in Minnesota.

Both systems were perfectly aligned with their operators' objectives.

That's the problem.


This workbook teaches you to detect, analyze, and evaluate AI systems that optimize for the wrong thing.

Not in abstract academic terms. In the concrete: How do we build evaluations that catch when AI systems are being sycophantic? When they're telling operators what they want to hear instead of what's true or right?

By the time you finish, you'll have:

The field needs more people who can do this work. Not just academics publishing papers. Practitioners who can build evaluations, run experiments, and tell the difference between a system that's actually safe and one that's just passing the tests.

That's what this workbook is for.

Let's begin.


January 2026