A new benchmark tests how frontier ASR models handle code-switching, where speakers mix languages mid-sentence. Results show that while top models excel at monolingual speech, they struggle with rapid language transitions. This gap limits the reliability of voice agents for bilingual users. Developers must now prioritize mixed-language datasets to improve accuracy.