Debugging GLM-5 at scale revealed critical bottlenecks in serving coding agents: engineers hit severe latency spikes and memory leaks while managing long-context state across distributed clusters. These findings highlight the gap between model capability and production stability. Practitioners must prioritize efficient state management, such as bounding per-session memory and evicting idle state, to prevent system crashes during complex, multi-step autonomous coding tasks.
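One common mitigation for this class of memory leak is to bound per-session state and evict it aggressively. The sketch below is illustrative only, not GLM-5's actual serving code: it assumes a hypothetical `SessionStateStore` that caps the number of live agent sessions with LRU eviction and expires sessions idle past a TTL, so long-running coding-agent conversations cannot grow memory without bound.

```python
import time
from collections import OrderedDict


class SessionStateStore:
    """Hypothetical bounded store for per-session agent state.

    Evicts least-recently-used sessions once max_sessions is reached,
    and expires sessions idle longer than ttl_seconds.
    """

    def __init__(self, max_sessions: int = 1024, ttl_seconds: float = 3600.0):
        self.max_sessions = max_sessions
        self.ttl_seconds = ttl_seconds
        # Maps session_id -> (last_access_timestamp, state);
        # insertion order doubles as recency order.
        self._store: "OrderedDict[str, tuple]" = OrderedDict()

    def put(self, session_id: str, state: object) -> None:
        now = time.monotonic()
        self._store.pop(session_id, None)
        self._store[session_id] = (now, state)  # most recently used at the end
        self._evict(now)

    def get(self, session_id: str):
        entry = self._store.get(session_id)
        if entry is None:
            return None
        ts, state = entry
        now = time.monotonic()
        if now - ts > self.ttl_seconds:
            # Expired: drop the stale state and report a miss.
            del self._store[session_id]
            return None
        # Refresh recency and last-access time on every hit.
        self._store.move_to_end(session_id)
        self._store[session_id] = (now, state)
        return state

    def _evict(self, now: float) -> None:
        # Drop expired sessions first, then enforce the hard size cap.
        expired = [sid for sid, (ts, _) in self._store.items()
                   if now - ts > self.ttl_seconds]
        for sid in expired:
            del self._store[sid]
        while len(self._store) > self.max_sessions:
            self._store.popitem(last=False)  # evict least-recently-used

    def __len__(self) -> int:
        return len(self._store)
```

A cap like this trades occasional recomputation of evicted context for a hard, predictable memory ceiling, which is usually the right trade in a multi-tenant serving cluster.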