Two LLM modules, an activation verbalizer and a reconstructor, map internal activations to human-readable text. The method is unsupervised: it uses reinforcement learning, rewarding verbalizations from which the reconstructor can recover the original residual-stream activation. Researchers applied the tool to audit Claude Opus 4.6 before deployment, allowing Anthropic to diagnose safety-relevant behaviors by interpreting model internals as plausible natural language.
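The verbalize-then-reconstruct loop described above can be sketched as a toy in NumPy. This is a minimal illustration, not Anthropic's implementation: the real verbalizer and reconstructor are LLMs, whereas here a tiny fixed vocabulary with unit-norm embeddings stands in for both, and the cosine-similarity reward stands in for the reinforcement-learning signal. All names (`VOCAB`, `EMB`, `verbalize`, `reconstruct`, `reconstruction_reward`) are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: a tiny vocabulary whose unit-norm embeddings
# stand in for both the verbalizer's output space and the
# reconstructor's reading of each word.
rng = np.random.default_rng(0)
VOCAB = ["safety", "deception", "math", "poetry"]
EMB = rng.normal(size=(len(VOCAB), 8))
EMB = EMB / np.linalg.norm(EMB, axis=1, keepdims=True)  # unit-norm rows

def verbalize(activation: np.ndarray) -> str:
    """Toy verbalizer: emit the word whose embedding best matches
    the activation (stand-in for an LLM generating a description)."""
    scores = EMB @ activation
    return VOCAB[int(np.argmax(scores))]

def reconstruct(text: str) -> np.ndarray:
    """Toy reconstructor: map the description back to an activation
    estimate (stand-in for an LLM predicting the activation)."""
    return EMB[VOCAB.index(text)]

def reconstruction_reward(activation: np.ndarray) -> float:
    """RL-style reward: cosine similarity between the original
    activation and its reconstruction from the verbalized text.
    High reward means the text preserved the activation's content."""
    recon = reconstruct(verbalize(activation))
    return float(
        recon @ activation
        / (np.linalg.norm(recon) * np.linalg.norm(activation))
    )
```

The key design point the sketch preserves is that no labels are needed: the reward is computed entirely from how well the reconstruction matches the original activation, which is what makes the method unsupervised.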