Researchers at BAIR developed a method for identifying, at scale, how components inside large language models interact. The approach integrates feature attribution with mechanistic interpretability to reveal how these internal components drive specific predictions, providing a more transparent view of model decision-making. It also helps practitioners isolate the training examples that influence particular model behaviors.
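To make the feature-attribution half of this idea concrete, here is a minimal, generic gradient-times-input sketch on a toy next-token model. This is not the BAIR method or its code; the model (`TinyLM`), vocabulary size, token ids, and target token below are all illustrative assumptions.

```python
# Minimal sketch of gradient-times-input attribution for a toy next-token
# predictor. Illustrative only; not the BAIR researchers' implementation.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy next-token predictor: mean-pooled embeddings -> vocabulary logits."""
    def __init__(self, vocab_size: int = 100, d_model: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (seq_len, d_model); returns logits for the next token.
        return self.head(embeddings.mean(dim=0))

model = TinyLM()
tokens = torch.tensor([3, 17, 42, 7])  # hypothetical input token ids
target = 9                             # hypothetical predicted token id

# Keep gradients on the token embeddings so the prediction can be
# attributed back to each input position.
embeddings = model.embed(tokens).detach().requires_grad_(True)
logits = model(embeddings)
logits[target].backward()

# Gradient x input: one contribution score per input token position.
attribution = (embeddings.grad * embeddings).sum(dim=-1)
print(attribution)
```

In a real setting the attribution scores would be inspected alongside mechanistic analyses of the components (heads, neurons, circuits) that carry those contributions; the sketch only shows the attribution step.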