Debugging GLM-5 at scale revealed critical bottlenecks in agent serving infrastructure: engineers hit severe latency spikes and memory leaks when deploying complex coding workflows. These failures expose the gap between model performance and production stability. To prevent cascading crashes during high-concurrency agent tasks, practitioners must prioritize robust state management and efficient context-window handling.
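One concrete form of context-window handling is bounding each session's conversation history by a token budget, evicting the oldest turns first so memory stays constant per session. The sketch below is a minimal, hypothetical illustration (the class name, the whitespace-based token estimate, and the budget are all assumptions; a real deployment would count tokens with the model's own tokenizer):

```python
from collections import deque

class ContextWindow:
    """Rolling context with a fixed token budget; oldest turns evicted first.

    Hypothetical sketch: token cost is approximated by whitespace word count,
    which a production system would replace with the model's tokenizer.
    """

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.turns: deque[tuple[str, int]] = deque()
        self.total = 0

    def append(self, text: str) -> None:
        cost = len(text.split())  # crude token estimate (assumption)
        self.turns.append((text, cost))
        self.total += cost
        # Evict oldest turns until within budget, keeping at least the
        # newest turn; this bounds per-session memory under high concurrency.
        while self.total > self.max_tokens and len(self.turns) > 1:
            _, evicted_cost = self.turns.popleft()
            self.total -= evicted_cost

    def render(self) -> str:
        # Join surviving turns into the prompt context.
        return "\n".join(text for text, _ in self.turns)
```

Because eviction happens eagerly on every append, the structure never holds more than one turn beyond the budget, which is the property that keeps long-running agent sessions from leaking memory.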