Debugging GLM-5 at scale surfaced critical bottlenecks in serving coding agents: the team hit substantial latency and memory overhead when managing long-context state across distributed environments. These findings highlight the gap between raw model capability and production-grade agent infrastructure; for reliable agentic workflows, practitioners should prioritize state management over model size alone.
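To make the state-management point concrete, here is a minimal sketch, not the team's actual implementation, of one common mitigation: capping per-session context to a token budget and evicting the oldest turns first. All names (`SessionState`, `max_tokens`) are hypothetical, and token counts are approximated by whitespace splitting; a real serving stack would use the model's tokenizer and manage KV-cache blocks rather than raw strings.

```python
from collections import deque


class SessionState:
    """Hypothetical per-session store that caps context size by evicting
    the oldest turns once a token budget is exceeded."""

    def __init__(self, max_tokens: int = 4096):
        self.max_tokens = max_tokens
        self.turns: deque = deque()
        self.token_count = 0

    @staticmethod
    def _count(text: str) -> int:
        # Crude stand-in for a real tokenizer.
        return len(text.split())

    def append(self, turn: str) -> None:
        self.turns.append(turn)
        self.token_count += self._count(turn)
        # Drop oldest turns until the budget fits, always keeping
        # at least the most recent turn for the agent.
        while self.token_count > self.max_tokens and len(self.turns) > 1:
            dropped = self.turns.popleft()
            self.token_count -= self._count(dropped)

    def context(self) -> str:
        return "\n".join(self.turns)
```

The design choice worth noting is that eviction happens eagerly on write, so memory stays bounded per session regardless of how long an agent runs; production systems typically layer summarization or KV-cache offloading on top of this kind of budget.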