The Natural Language Autoencoder pairs two LLM modules: an activation verbalizer that maps a model's internal activations to text, and a reconstructor that recovers the activations from that text. Both modules are trained with reinforcement learning, which pushes the verbalizer to produce human-readable interpretations that preserve the information in the activations. Researchers used the tool to audit Claude Opus 4.6, providing a concrete path for diagnosing safety-relevant behaviors before model deployment.
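The encode-to-text, decode-from-text loop can be sketched minimally. This is a hypothetical illustration, not the authors' implementation: `verbalize` and `reconstruct` stand in for the two LLM modules, and toy functions replace them so the reconstruction-based reward is concrete.

```python
# Hypothetical sketch of the Natural Language Autoencoder loop.
# In the real system both functions are LLMs trained with RL;
# here they are stand-ins so the reward computation is runnable.

def verbalize(activation: list[float]) -> str:
    # Encoder: internal activations -> natural-language description.
    # (Toy stand-in: just serializes the values into a sentence.)
    return "activation levels: " + ", ".join(f"{v:.2f}" for v in activation)

def reconstruct(description: str) -> list[float]:
    # Decoder: description -> estimated activations.
    # (Toy stand-in: parses the serialized values back out.)
    values = description.split(": ")[1].split(", ")
    return [float(v) for v in values]

def rl_reward(activation: list[float]) -> float:
    # RL reward for the verbalizer: negative reconstruction error.
    # Descriptions that preserve the activation information score near 0;
    # lossy or unfaithful descriptions are penalized.
    estimate = reconstruct(verbalize(activation))
    return -sum((a - b) ** 2 for a, b in zip(activation, estimate))

print(rl_reward([0.31, -1.20, 0.05]) >= -1e-9)
```

The key design point this sketch captures is the text bottleneck: because the reconstructor only sees the verbalizer's natural-language output, maximizing the reconstruction reward forces that text to carry the activation's content in human-readable form.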