MNIST experiments using asymmetric debate recovered 95% gold accuracy, beating a 90% consultancy baseline. The researcher suggests combining quantilizers with interpretability monitoring to solve alignment under tight timelines. This approach uses debate for optimization pressure rather than as a training signal. It offers a technical path for evaluating super-human systems.