Natural Language Autoencoders are an unsupervised method for translating residual stream activations into human-readable text. The method pairs two LLM modules: an activation verbalizer, which maps an internal activation to a text description, and an activation reconstructor, which maps that description back to an activation. Both modules are trained with reinforcement learning. Researchers used the tool to audit Claude Opus 4.6, which suggests a concrete path for diagnosing safety-relevant behaviors before deployment.
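
The verbalizer and reconstructor pairing can be illustrated with a toy round trip. The sketch below is not the researchers' implementation: the function names, the sign-based toy modules, and the choice of negative mean squared error as the reward are all assumptions made for illustration. In the actual method both modules are LLMs and the reward drives reinforcement learning.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def reconstruction_reward(original: Vector, reconstructed: Vector) -> float:
    """Negative mean squared error (assumed reward): higher is better, 0.0 is a perfect match."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    squared_errors = ((a - b) ** 2 for a, b in zip(original, reconstructed))
    return -sum(squared_errors) / len(original)


def autoencode(
    activation: Vector,
    verbalizer: Callable[[Vector], str],
    reconstructor: Callable[[str], List[float]],
) -> Tuple[str, float]:
    """Run activation -> text -> activation and score the round trip."""
    description = verbalizer(activation)
    reconstructed = reconstructor(description)
    return description, reconstruction_reward(activation, reconstructed)


# Hypothetical stand-ins: real modules would be LLMs, not sign lookups.
def toy_verbalizer(activation: Vector) -> str:
    # Describe each dimension by its sign only, which discards magnitude.
    return " ".join("pos" if x >= 0 else "neg" for x in activation)


def toy_reconstructor(description: str) -> List[float]:
    # Map each word back to a unit value of the matching sign.
    return [1.0 if token == "pos" else -1.0 for token in description.split()]


description, reward = autoencode([1.0, -1.0, 0.5], toy_verbalizer, toy_reconstructor)
print(description, reward)
```

The text bottleneck is the point of the design: a description that loses information (here, the magnitude of the third dimension) yields a lower reward, so training pressure favors descriptions that capture what the activation encodes.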