Anthropic reports a counterintuitive safety technique: deliberately activating “persona vectors” for undesirable traits such as sycophancy or “evil” during training can reduce those behaviors afterward, reportedly without degrading general performance. The result hints at scalable alignment methods that go beyond post-training steering.
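
To make "activating a vector" concrete, here is a minimal NumPy sketch of the activation arithmetic only; it does not train a model or reproduce Anthropic's results. It assumes a persona vector can be approximated as the normalized difference between mean activations on trait-exhibiting and neutral responses, and that "activating" it means adding a scaled copy to hidden states. The function names, the scale `alpha`, and the synthetic data are all hypothetical.

```python
import numpy as np

def persona_vector(trait_acts, neutral_acts):
    """Assumed recipe: unit-normalized difference of mean activations."""
    v = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden, vector, alpha):
    """Add alpha * vector to every hidden state (alpha is a hypothetical knob)."""
    return hidden + alpha * vector

# Synthetic stand-ins for real model activations (illustrative only).
rng = np.random.default_rng(0)
dim = 8
direction = np.zeros(dim)
direction[0] = 1.0
neutral = rng.normal(size=(200, dim))
trait = rng.normal(size=(200, dim)) + 3.0 * direction

v = persona_vector(trait, neutral)
hidden = rng.normal(size=(4, dim))

train_hidden = steer(hidden, v, alpha=2.0)  # vector switched on during training
infer_hidden = steer(hidden, v, alpha=0.0)  # vector switched off afterward

# How far each hidden state moved along the persona direction.
projection_shift = (train_hidden - hidden) @ v
```

In this toy setup the steering moves every hidden state by exactly `alpha` along the persona direction during "training" and leaves it untouched at inference. One reading of the reported effect is that supplying the direction externally means the weights have less reason to learn it, but the sketch does not test that claim.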