Training language models on texts explaining intended values before teaching specific behaviors increases adherence to those rules. Researchers from the Anthropic Fellows Program found this approach works even in novel situations. This suggests that conceptual understanding outperforms rote behavioral training. Practitioners can now prioritize value-based context to reduce model misalignment.