A new exploratory project uses a black-box LLM autorater to extract 10-20 specific features from model transcripts. The method splits the data into user turns, thoughts, and responses in order to identify novel behaviors. The researchers, writing on the AI Alignment Forum, aim to uncover surprising correlations. The approach is intended to help practitioners better understand the target model's distribution during RL training.
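
To make the pipeline concrete, here is a minimal sketch of the split-then-rate idea, not the project's actual implementation. The message schema (`role`, `content`, and an optional `thinking` field), the two feature names, and the `keyword_rater` stand-in are all assumptions made for illustration; in practice the rater would be a call to a black-box LLM, and the feature list would hold the 10-20 features chosen by the researchers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical feature names; the source does not list the actual features.
FEATURES = ["mentions_reward", "expresses_uncertainty"]


@dataclass
class Segment:
    kind: str  # "user", "thought", or "response"
    text: str


def split_transcript(messages: List[dict]) -> List[Segment]:
    """Split a transcript into user turns, thoughts, and responses.

    Assumed schema: each message is a dict with "role" and "content";
    assistant messages may also carry a "thinking" field.
    """
    segments: List[Segment] = []
    for msg in messages:
        if msg["role"] == "user":
            segments.append(Segment("user", msg["content"]))
        elif msg["role"] == "assistant":
            if msg.get("thinking"):
                segments.append(Segment("thought", msg["thinking"]))
            segments.append(Segment("response", msg["content"]))
    return segments


# An autorater maps (segment, feature name) to a yes/no judgment.
Autorater = Callable[[Segment, str], bool]


def extract_features(segments: List[Segment], rate: Autorater) -> List[dict]:
    """Ask the autorater about every feature for every segment."""
    return [
        {"kind": seg.kind, **{name: rate(seg, name) for name in FEATURES}}
        for seg in segments
    ]


def keyword_rater(segment: Segment, feature: str) -> bool:
    """Placeholder for the black-box LLM autorater (keyword match only)."""
    keywords = {"mentions_reward": "reward", "expresses_uncertainty": "not sure"}
    return keywords[feature] in segment.text.lower()


def feature_rate_by_kind(rows: List[dict], feature: str) -> Dict[str, float]:
    """Fraction of segments of each kind where the feature was flagged."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for row in rows:
        totals[row["kind"]] = totals.get(row["kind"], 0) + 1
        hits[row["kind"]] = hits.get(row["kind"], 0) + int(row[feature])
    return {kind: hits[kind] / totals[kind] for kind in totals}
```

Comparing per-segment rates, for example a feature that appears often in thoughts but rarely in responses, is one simple way such a table could surface unexpected patterns; a real analysis would likely compute correlations across many transcripts and training checkpoints.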