A new benchmark shows that frontier automatic speech recognition (ASR) models struggle with code-switching, where a speaker mixes two languages within a single sentence. Researchers at Hugging Face tested these systems on complex bilingual datasets and found that accuracy drops significantly compared with monolingual speech. The results suggest that developers building voice agents will need to prioritize linguistically diverse training data if those agents are to be reliable.
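To make the idea of a "performance drop" concrete, here is a minimal sketch of how such a gap could be measured. It assumes word error rate (WER) as the metric, which is a common choice for ASR but is not confirmed as the benchmark's own. The function name `word_error_rate` and all four transcripts are invented for illustration and are not taken from the benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, made up for this example only.
mono_ref = "please turn on the kitchen lights"
mono_hyp = "please turn on the kitchen light"
cs_ref = "please turn on las luces de la cocina"
cs_hyp = "please turn on last loses the la cocina"

mono_wer = word_error_rate(mono_ref, mono_hyp)
cs_wer = word_error_rate(cs_ref, cs_hyp)

print(f"monolingual WER:   {mono_wer:.3f}")
print(f"code-switched WER: {cs_wer:.3f}")
```

In this toy case the recognizer mishears the Spanish words as similar-sounding English ones, so the code-switched sentence scores worse than the monolingual one. A real evaluation would average over a full test set and would typically normalize casing and punctuation before scoring.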