The paper "The Assistive Multi-Armed Bandit" examines preference learning when users do not know their own preferences in advance and must discover them through experience. A LessWrong author analyzes how the rationality assumption, under which a person's choices are read as reliable evidence of what they want, fails in these scenarios. This retrospective evaluates the technical gaps in early ML research on the problem. Practitioners can use these findings to better model human behavior in reinforcement learning systems.
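
To make the failure mode concrete, here is a minimal toy sketch in Python. It is not the paper's model or the post author's code: the greedy human policy, the Bernoulli rewards, and the names `simulate_learning_human`, `infer_assuming_rational`, and `naive_error_rate` are all assumptions introduced for illustration. A simulated human learns which arm they prefer only by trying arms, while an observer who assumes rationality takes the most-chosen arm to be the preferred one.

```python
import random


def simulate_learning_human(true_means, horizon, seed=0):
    """Hypothetical human who does not know the arm means.

    They try each arm once, then greedily pull the arm with the best
    empirical mean so far. Rewards are Bernoulli(true_means[arm]).
    Returns the list of chosen arm indices.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    choices = []
    for t in range(horizon):
        if t < n_arms:
            arm = t  # initial exploration: one pull per arm
        else:
            estimates = [totals[a] / pulls[a] for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        choices.append(arm)
    return choices


def infer_assuming_rational(choices, n_arms):
    """Naive observer: assumes choices reveal preferences, so the
    most frequently chosen arm is taken to be the preferred one."""
    counts = [choices.count(a) for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: counts[a])


def naive_error_rate(true_means, horizon, n_seeds):
    """Fraction of simulated humans for whom the naive inference
    picks an arm other than the truly best one."""
    best = max(range(len(true_means)), key=lambda a: true_means[a])
    wrong = 0
    for seed in range(n_seeds):
        choices = simulate_learning_human(true_means, horizon, seed)
        if infer_assuming_rational(choices, len(true_means)) != best:
            wrong += 1
    return wrong / n_seeds


if __name__ == "__main__":
    print(naive_error_rate([0.6, 0.4], horizon=50, n_seeds=200))
```

In this sketch the greedy human can lock onto a worse arm after an unlucky first pull of the better one, so an observer who reads their choices as revealed preference inherits that mistake. Swapping in a different human policy changes how often this happens, which is why the choice of human model matters.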