Debugging GLM-5 at scale revealed critical bottlenecks in serving coding agents: the team encountered substantial latency and memory overhead when managing long-context state across distributed nodes. These findings underscore the gap between raw model capability and production stability, and practitioners must prioritize efficient state management to avoid performance degradation in agentic workflows.
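One common mitigation for the memory pressure described above is to bound per-node session state with an LRU eviction policy, so long-lived agent conversations cannot grow without limit. The sketch below is purely illustrative and not GLM-5's actual implementation; `SessionStateCache`, the byte budget, and the opaque `bytes` state are all assumptions for the example.

```python
from collections import OrderedDict
from typing import Optional

class SessionStateCache:
    """Hypothetical LRU cache for per-session long-context state.

    Evicts least-recently-used sessions once tracked size exceeds a
    byte budget, keeping a serving node's memory use bounded across
    many concurrent agent conversations (illustrative sketch only).
    """

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Maps session_id -> (serialized state, size in bytes),
        # ordered from least- to most-recently used.
        self._entries: "OrderedDict[str, tuple[bytes, int]]" = OrderedDict()

    def put(self, session_id: str, state: bytes) -> None:
        # Replace any existing entry and reclaim its accounted size.
        if session_id in self._entries:
            _, old_size = self._entries.pop(session_id)
            self.used_bytes -= old_size
        self._entries[session_id] = (state, len(state))
        self.used_bytes += len(state)
        # Evict least-recently-used sessions until back under budget.
        while self.used_bytes > self.max_bytes and self._entries:
            _, (_, size) = self._entries.popitem(last=False)
            self.used_bytes -= size

    def get(self, session_id: str) -> Optional[bytes]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None  # miss: caller must rebuild the context
        self._entries.move_to_end(session_id)  # mark as recently used
        return entry[0]
```

Keyed lookups refresh recency, so an active agent's context survives while idle sessions are evicted first; on a miss the caller pays the cost of reconstructing the context rather than the node paying it in resident memory.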