Natural Language Autoencoders (NLAs) pair two LLM modules: one maps a model's internal activations to a text description, and the other maps that description back to activations. Trained via reinforcement learning, they produce human-readable interpretations of model internals. Researchers applied them to audit Claude Opus 4.6 for safety risks, offering a concrete path for auditing black-box models before deployment.
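The round-trip structure described above can be sketched as follows. This is a minimal toy illustration, not the actual method: the `encoder` and `decoder` functions stand in for the two LLM modules, and the binning/lookup logic, vocabulary, and reward function are all invented for this sketch. The key idea it shows is the autoencoder loop (activations → text → activations) with a reconstruction-based reward of the kind an RL trainer could optimize.

```python
import numpy as np

# Hypothetical stand-ins for the two LLM modules. A real NLA would use
# language models here; these toy maps only illustrate the round trip.
VOCAB = ["low", "mid", "high"]

def encoder(activation: np.ndarray) -> str:
    # Toy "describer": bin each activation dimension into a coarse
    # natural-language token.
    bins = np.digitize(activation, [-0.5, 0.5])
    return " ".join(VOCAB[b] for b in bins)

def decoder(description: str) -> np.ndarray:
    # Toy "reconstructor": map each token back to a representative value.
    centers = {"low": -1.0, "mid": 0.0, "high": 1.0}
    return np.array([centers[tok] for tok in description.split()])

def reconstruction_reward(activation: np.ndarray) -> float:
    # RL-style reward: how faithfully does the loop
    # activation -> text -> activation preserve the original?
    # (Negative mean squared error; higher is better, 0 is perfect.)
    recon = decoder(encoder(activation))
    return -float(np.mean((activation - recon) ** 2))

rng = np.random.default_rng(0)
act = rng.normal(size=8)
print(encoder(act))                      # human-readable description
print(round(reconstruction_reward(act), 3))  # reward the trainer would maximize
```

In a real system, the reward would be computed in activation space and fed back through an RL algorithm to fine-tune the describer module, so that descriptions become both human-readable and information-preserving.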