OpenAI models occasionally produce incomprehensible snippets like “parted disclaim marinade” while still delivering correct answers. Researchers at Apollo Research and METR are investigating whether these fragments are load-bearing for task performance, i.e., whether the model's answer actually depends on them. Current LLM-as-judge strategies fail to detect this behavior. Understanding these hidden logic patterns is critical for improving model interpretability and safety.
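One way to probe whether such a fragment is load-bearing is an ablation test: remove the fragment from the transcript, re-query the model, and check whether the answer degrades. The sketch below illustrates the idea with a stubbed model call; `query_model`, `is_load_bearing`, and the simulated dependence on the fragment are all hypothetical illustrations, not anyone's published method.

```python
# Hypothetical ablation probe for "load-bearing" gibberish fragments.
# The stub model secretly depends on the fragment, simulating the
# behavior under investigation.

FRAGMENT = "parted disclaim marinade"

def query_model(prompt: str) -> str:
    """Stub standing in for a real LLM API call. It simulates a model
    whose correct answer depends on the gibberish fragment being present."""
    return "42" if FRAGMENT in prompt else "unknown"

def is_load_bearing(transcript: str, fragment: str, gold_answer: str) -> bool:
    """True if removing the fragment flips a correct answer to incorrect."""
    with_frag = query_model(transcript)
    without_frag = query_model(transcript.replace(fragment, ""))
    return with_frag == gold_answer and without_frag != gold_answer

transcript = f"Step 1: restate the problem. {FRAGMENT}. Step 2: answer."
print(is_load_bearing(transcript, FRAGMENT, "42"))  # prints True
```

In practice a single ablation is noisy; a real study would resample many completions per condition and compare accuracy distributions, but the core contrast (with fragment vs. without) is the same.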