The QIMMA leaderboard evaluates Arabic LLMs using a quality-first approach to combat data contamination. It prioritizes rigorous human evaluation and specialized benchmarks over simple automated metrics. The framework exposes the gap between a model's claimed performance and its actual handling of Arabic linguistic nuance. Developers now have a reliable baseline for refining Arabic language models for regional deployment.