A new benchmark tests how frontier automatic speech recognition (ASR) models handle code-switching, where speakers mix two languages within a single sentence. Hugging Face researchers found that most models perform worse on these transitions than on monolingual speech. This gap limits the reliability of voice agents for bilingual users. The finding suggests that developers should prioritize mixed-language training data.
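
To make the comparison concrete, here is a minimal sketch of how a gap between monolingual and code-switched speech could be measured. It assumes word error rate (WER) as the metric, which the summary above does not specify, and the transcripts are invented toy strings rather than benchmark data. The function names `edit_distance` and `wer` are hypothetical helpers, not part of any benchmark code.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(pairs: List[Tuple[str, str]]) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref_tokens = reference.lower().split()
        hyp_tokens = hypothesis.lower().split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return errors / ref_words


# Toy (reference, model output) pairs; purely illustrative, not real results.
monolingual = [("turn on the kitchen lights", "turn on the kitchen lights")]
code_switched = [("turn on las luces de la cocina",
                  "turn on last loses de la cocina")]

# A positive gap means the model does worse on code-switched speech.
gap = wer(code_switched) - wer(monolingual)
print(f"WER gap (code-switched minus monolingual): {gap:.3f}")
```

Pooling errors across utterances, rather than averaging per-utterance scores, keeps short utterances from dominating the result. A real evaluation would also need text normalization suited to both languages.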