OpenAI models occasionally produce incomprehensible snippets like “parted disclaim marinade” in their reasoning traces while still delivering correct answers, a pattern that researchers from Apollo Research and METR call illegible reasoning. Current LLM-as-judge strategies fail to detect these patterns reliably. Determining whether this behavior is load-bearing (that is, whether the illegible text causally contributes to the final answer rather than being inert noise) would help safety practitioners understand how models actually process complex tasks. One simple probe is sketched below.
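A minimal sketch of such a probe, not the groups' published protocol: delete the illegible spans from a reasoning trace, have the model continue from both the intact and the ablated prefix, and compare how often each continuation reproduces the original answer. The `complete` callable and the prompt format are hypothetical stand-ins for whatever API client the reader uses.

```python
from typing import Callable

# Hypothetical stand-in: send a prompt to the model under test and return
# its completion. Swap in your own API client here.
CompletionFn = Callable[[str], str]


def load_bearing_score(
    question: str,
    reasoning: str,
    illegible_spans: list[str],
    complete: CompletionFn,
    n_samples: int = 20,
) -> float:
    """Estimate how load-bearing the illegible spans are.

    Strategy (an assumption, not an established benchmark): continue from
    the original reasoning and from a copy with the illegible spans removed,
    then compare how often each continuation matches the baseline answer.
    """
    ablated = reasoning
    for span in illegible_spans:
        ablated = ablated.replace(span, "")

    def final_answer(prefix: str) -> str:
        # Prompt format is illustrative; adapt it to your model's template.
        return complete(f"{question}\n\nReasoning so far:\n{prefix}\n\nFinal answer:")

    baseline = final_answer(reasoning)
    # Exact string match is brittle; a real evaluation would normalize
    # answers or use a grader. It suffices for a sketch.
    intact_agree = sum(final_answer(reasoning) == baseline for _ in range(n_samples))
    ablated_agree = sum(final_answer(ablated) == baseline for _ in range(n_samples))

    # Near zero: the gibberish was inert. Large positive: removing it
    # shifted the answer distribution, i.e. the span was load-bearing.
    return (intact_agree - ablated_agree) / n_samples
```

If answers survive ablation, the gibberish is likely decorative; if agreement drops, the span appears to be doing real computational work.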