Debugging GLM-5 at scale revealed critical bottlenecks in serving coding agents: the team struggled with high latency and with state management during complex autonomous tasks. These findings highlight the gap between model capability and production infrastructure. To close it, engineers must optimize inference pipelines to support long-context agentic workflows without sacrificing speed or reliability.
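
To make the state-management problem concrete, here is a minimal, hypothetical sketch (not the team's actual implementation) of one common mitigation: keeping each agent's accumulated context server-side, capping its growth, and evicting idle sessions so per-turn latency and memory stay bounded. All names (`AgentSession`, `AgentSessionStore`, the TTL and turn-cap values) are illustrative assumptions, not GLM-specific APIs.

```python
# Hypothetical sketch: per-session state for long-running coding-agent tasks.
# The goal is to avoid resending or recomputing the full history every turn
# and to keep memory on a serving node predictable. Names are illustrative.

from dataclasses import dataclass, field
from time import monotonic


@dataclass
class AgentSession:
    """Server-side state for one autonomous agent task."""
    session_id: str
    turns: list[str] = field(default_factory=list)   # accumulated context

    def append_turn(self, text: str, max_turns: int = 256) -> None:
        # Append the new turn; evict the oldest turns beyond the cap so the
        # effective context (and per-request prefill cost) stays bounded.
        self.turns.append(text)
        if len(self.turns) > max_turns:
            del self.turns[: len(self.turns) - max_turns]

    def prompt(self) -> str:
        # Rebuild the full prompt from stored turns, so clients never need
        # to retransmit earlier history on each step of the agent loop.
        return "\n".join(self.turns)


class AgentSessionStore:
    """In-memory registry of active sessions with idle-based eviction."""

    def __init__(self, idle_ttl_s: float = 900.0) -> None:
        self._sessions: dict[str, AgentSession] = {}
        self._last_seen: dict[str, float] = {}
        self._idle_ttl_s = idle_ttl_s

    def get_or_create(self, session_id: str) -> AgentSession:
        self._last_seen[session_id] = monotonic()
        return self._sessions.setdefault(session_id, AgentSession(session_id))

    def evict_idle(self) -> int:
        # Drop sessions idle longer than the TTL to keep memory predictable.
        now = monotonic()
        stale = [sid for sid, t in self._last_seen.items()
                 if now - t > self._idle_ttl_s]
        for sid in stale:
            self._sessions.pop(sid, None)
            self._last_seen.pop(sid, None)
        return len(stale)


if __name__ == "__main__":
    store = AgentSessionStore()
    session = store.get_or_create("task-42")
    session.append_turn("USER: refactor the parser module")
    session.append_turn("AGENT: reading parser.py ...")
    print(session.prompt())
```

In a real serving stack this state would more likely live in a shared cache or be paired with prefix/KV-cache reuse in the inference engine; the sketch only illustrates the bookkeeping that long-context agent loops force onto the serving layer.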