Experiments by Palaestra Research show that a weaker judge identifies correct answers more accurately when two stronger models debate the answer in front of it. This debate protocol outperforms one-sided consultancy on code and logic tasks. The finding suggests that debate can help bridge the capability gap between a weak judge and stronger models during reward labeling, though the effect so far appears limited to structured reasoning tasks.
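
To make the comparison concrete, the two setups can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol used in the experiments: the source does not specify round counts, turn order, or prompts, and every name here (`Debater`, `Judge`, `run_debate`, `run_consultancy`) is hypothetical. Consultancy is assumed to mean that a single stronger model argues for one assigned answer while the judge sees no opposing argument.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces; none of these names come from the source.
# A debater receives (question, answer it must defend, transcript so far)
# and returns its next argument as text.
Debater = Callable[[str, str, List[str]], str]
# A judge receives (question, the two candidate answers, full transcript)
# and returns the index (0 or 1) of the answer it believes is correct.
Judge = Callable[[str, Tuple[str, str], List[str]], int]


def run_debate(
    question: str,
    answers: Tuple[str, str],
    debater_a: Debater,
    debater_b: Debater,
    judge: Judge,
    rounds: int = 2,  # assumed value; the source gives no round count
) -> int:
    """Two stronger models argue opposing answers; the weaker judge decides."""
    transcript: List[str] = []
    for _ in range(rounds):
        # Each debater sees everything said so far, so it can rebut its opponent.
        transcript.append("A: " + debater_a(question, answers[0], transcript))
        transcript.append("B: " + debater_b(question, answers[1], transcript))
    return judge(question, answers, transcript)


def run_consultancy(
    question: str,
    answers: Tuple[str, str],
    consultant: Debater,
    assigned: int,
    judge: Judge,
    rounds: int = 2,  # assumed value, matched to run_debate for comparison
) -> int:
    """One stronger model argues for its assigned answer; no opponent replies."""
    transcript: List[str] = []
    for _ in range(rounds):
        transcript.append("C: " + consultant(question, answers[assigned], transcript))
    return judge(question, answers, transcript)
```

In this sketch the only structural difference is whether the judge's transcript contains an opposing argument. Judge accuracy under each setup would be estimated by running both functions over a labeled question set and comparing the returned index with the known correct answer.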