Natural Language Autoencoders use two LLM modules: one maps internal activations to text descriptions, and the other maps those descriptions back to activations. Researchers trained the system with reinforcement learning so that this round trip reconstructs residual stream activations. During a pre-deployment audit of Claude Opus 4.6, the tool diagnosed safety-relevant behaviors. This unsupervised method gives interpretability practitioners a concrete way to audit model internals.
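
The round trip described above (activation to text, text back to activation, scored by reconstruction quality) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: `toy_verbalize` and `toy_reconstruct` are simple rule-based functions in place of the two LLM modules, and the negative mean squared error reward is an assumption, since the text does not specify how reconstruction is scored.

```python
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical vocabulary shared by the two toy modules (not from the source).
WORD_TO_VALUE = {"high": 1.0, "low": -1.0, "neutral": 0.0}


def toy_verbalize(activation: Vector) -> str:
    """Stand-in for the activation-to-text module: one coarse word per coordinate."""
    words = []
    for x in activation:
        if x > 0.5:
            words.append("high")
        elif x < -0.5:
            words.append("low")
        else:
            words.append("neutral")
    return " ".join(words)


def toy_reconstruct(description: str) -> Vector:
    """Stand-in for the text-to-activation module: map each word to a value."""
    return [WORD_TO_VALUE[word] for word in description.split()]


def reconstruction_reward(original: Vector, reconstructed: Vector) -> float:
    """Assumed reward: negative mean squared error (zero for a perfect round trip)."""
    if not original:
        raise ValueError("activation must be non-empty")
    if len(original) != len(reconstructed):
        raise ValueError("original and reconstructed lengths differ")
    squared_error = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return -squared_error / len(original)


def round_trip(
    activation: Vector,
    verbalize: Callable[[Vector], str] = toy_verbalize,
    reconstruct: Callable[[str], Vector] = toy_reconstruct,
) -> Tuple[str, float]:
    """Verbalize an activation, reconstruct it from the text, and score the result."""
    description = verbalize(activation)
    reconstructed = reconstruct(description)
    return description, reconstruction_reward(activation, reconstructed)


if __name__ == "__main__":
    text, reward = round_trip([0.8, -0.2, -0.9])
    print(text, reward)
```

In this sketch the text description is the only channel between the two functions, so the reward is high only when the description carries enough information to rebuild the vector. That is the property a reconstruction objective would push toward, under the assumptions stated here.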