Debugging GLM-5 at scale revealed critical bottlenecks in agentic serving infrastructure: engineers struggled with memory overhead and latency spikes during complex code generation tasks. These findings highlight the gap between model capability and production reliability. To keep long-context agentic workflows from crashing the serving system, practitioners must prioritize efficient state management, so that per-session state stays bounded as context grows.
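One way to picture "efficient state management" for a long-running agent session is to cap the state each session may hold and evict the oldest turns once the cap is exceeded. The sketch below is illustrative only and is not taken from the GLM-5 serving stack: `BoundedSessionState`, `Turn`, `estimate_tokens`, and the `max_tokens` budget are hypothetical names, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
from collections import deque
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Hypothetical token estimate: whitespace word count, minimum 1.

    A real deployment would use the model's own tokenizer instead.
    """
    return max(1, len(text.split()))


@dataclass
class Turn:
    role: str
    text: str
    tokens: int


@dataclass
class BoundedSessionState:
    """Per-session history kept under a fixed token budget (assumed design)."""

    max_tokens: int
    turns: deque = field(default_factory=deque)
    total_tokens: int = 0

    def append(self, role: str, text: str) -> None:
        """Record a new turn, then evict old turns if over budget."""
        tokens = estimate_tokens(text)
        self.turns.append(Turn(role, text, tokens))
        self.total_tokens += tokens
        self._evict()

    def _evict(self) -> None:
        # Drop the oldest turns first. The newest turn is always kept,
        # even if it alone exceeds the budget.
        while self.total_tokens > self.max_tokens and len(self.turns) > 1:
            oldest = self.turns.popleft()
            self.total_tokens -= oldest.tokens


state = BoundedSessionState(max_tokens=5)
state.append("user", "write a parser")         # 3 estimated tokens
state.append("assistant", "here is a draft")   # 4 more, so the first turn is evicted
```

Oldest-first eviction is the simplest policy that keeps memory per session bounded. Summarizing evicted turns or offloading them to slower storage are alternatives that trade extra latency for retained context.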