Two LLM modules, an activation verbalizer and a reconstructor, are trained jointly with reinforcement learning: the verbalizer maps internal activations to text descriptions, and the reconstructor maps those descriptions back to activations, rewarding descriptions that carry enough information to be inverted. Because the training signal comes from reconstruction rather than human labels, the method is unsupervised, yet it produces plausible interpretations of model internals. Researchers used these NLAs to audit Claude Opus 4.6 before deployment: by reading human-legible explanations of hidden states, auditors can diagnose safety-relevant behaviors.
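To make the training dynamic concrete, here is a minimal, self-contained sketch of one plausible version of the loop: a verbalizer samples discrete description tokens from an activation, a reconstructor inverts them, and reconstruction error serves both as the reconstructor's loss and, via REINFORCE, as the verbalizer's (negative) reward. All module names, sizes, the toy vocabulary, and the MSE reward are illustrative assumptions, not the published implementation, which uses full LLMs for both modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sizes (assumed); a real system would use an LLM's hidden width and tokenizer.
D_ACT, VOCAB, DESC_LEN, D_EMB = 64, 256, 8, 32

class Verbalizer(nn.Module):
    """Maps a hidden activation to a sequence of discrete description tokens."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(D_ACT, DESC_LEN * VOCAB)

    def forward(self, act):
        logits = self.proj(act).view(-1, DESC_LEN, VOCAB)
        dist = torch.distributions.Categorical(logits=logits)
        tokens = dist.sample()                    # sampled "description"
        log_prob = dist.log_prob(tokens).sum(-1)  # needed for the policy gradient
        return tokens, log_prob

class Reconstructor(nn.Module):
    """Maps description tokens back to an estimate of the original activation."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_EMB)
        self.out = nn.Linear(DESC_LEN * D_EMB, D_ACT)

    def forward(self, tokens):
        return self.out(self.embed(tokens).flatten(1))

verbalizer, reconstructor = Verbalizer(), Reconstructor()
opt = torch.optim.Adam(
    [*verbalizer.parameters(), *reconstructor.parameters()], lr=1e-3
)

for step in range(200):
    act = torch.randn(32, D_ACT)  # stand-in for captured model activations
    tokens, log_prob = verbalizer(act)
    recon = reconstructor(tokens)
    recon_err = F.mse_loss(recon, act, reduction="none").mean(-1)
    # Reward the verbalizer for descriptions the reconstructor can invert;
    # sampling is non-differentiable, so use REINFORCE with a mean baseline.
    reward = -recon_err.detach()
    advantage = reward - reward.mean()
    loss = (-advantage * log_prob).mean() + recon_err.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the actual system the description tokens are natural-language text produced by an LLM rather than an arbitrary code, which is what makes the bottleneck human-legible and usable for auditing.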