The Natural Language Autoencoder (NLA) couples two LLM modules: an activation verbalizer that encodes a model's internal activations into natural-language descriptions, and a reconstructor that decodes those descriptions back into activation space. Trained via reinforcement learning, the system produces human-readable interpretations of model internals. Researchers used this method to audit Claude Opus 4.6, providing a concrete tool for diagnosing safety-relevant behaviors before deployment.
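
The loop below is a minimal structural sketch of the encode-verbalize-reconstruct cycle described above, not the authors' implementation: the module interfaces, the toy "salient dimensions" description format, and the cosine-similarity reconstruction reward used as the RL signal are all illustrative assumptions.

```python
import numpy as np

D_MODEL = 64  # assumed activation dimensionality for this toy example


def verbalizer(activation: np.ndarray) -> str:
    """Stand-in for the verbalizer LLM: turn an activation vector into text.
    A real verbalizer would condition a language model on the activation."""
    top = np.argsort(-np.abs(activation))[:3]
    return "salient dimensions: " + ", ".join(
        f"d{i}={activation[i]:+.2f}" for i in top
    )


def reconstructor(description: str) -> np.ndarray:
    """Stand-in for the reconstructor LLM: map the text description back
    to an estimate of the original activation vector."""
    recon = np.zeros(D_MODEL)
    body = description.split(":", 1)[1]
    for token in body.split(","):
        idx, val = token.strip().split("=")
        recon[int(idx[1:])] = float(val)
    return recon


def reconstruction_reward(original: np.ndarray, recon: np.ndarray) -> float:
    """Assumed RL reward: cosine similarity between the original activation
    and its reconstruction, i.e. how faithfully the text preserves it."""
    return float(
        np.dot(original, recon)
        / (np.linalg.norm(original) * np.linalg.norm(recon) + 1e-8)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activation = rng.standard_normal(D_MODEL)
    description = verbalizer(activation)    # activations -> text
    estimate = reconstructor(description)   # text -> activations
    print(description)
    print(f"reconstruction reward: {reconstruction_reward(activation, estimate):.3f}")
```

In an actual training run, the reward would flow back to the verbalizer (and possibly the reconstructor) through the reinforcement-learning objective, encouraging descriptions that are both human-readable and information-preserving.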