Developers on Lobste.rs are benchmarking how well AI agents parse complex documentation. The tests reveal consistent failures in long-context retrieval and logical sequencing, which suggests that current LLM agents struggle with deep-reading tasks even as context windows grow. For agentic workflows, practitioners should prioritize optimizing retrieval-augmented generation (RAG) over relying on raw context alone.
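
To make the retrieval-over-raw-context idea concrete, here is a minimal sketch of the pattern: split the documentation into chunks, rank them against the question, and hand the agent only the top few. It is not taken from the benchmarks discussed. The function names (`chunk_document`, `retrieve`, `build_prompt`) are hypothetical, and the keyword-overlap scorer is a stand-in assumption for whatever embedding or search backend a real pipeline would use.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_document(text, max_words=40):
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score(query, chunk):
    """Count overlapping tokens between query and chunk (assumed toy scorer)."""
    query_counts = Counter(tokenize(query))
    chunk_counts = Counter(tokenize(chunk))
    return sum(min(count, chunk_counts[token]) for token, count in query_counts.items())


def retrieve(query, chunks, k=2):
    """Return the k chunks with the highest overlap score, best first."""
    # sorted() is stable, so tied chunks keep their original document order.
    return sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)[:k]


def build_prompt(query, chunks, k=2):
    """Assemble a prompt containing only the retrieved chunks, not the whole document."""
    context = "\n---\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Install the package with pip install foo.",
        "Configure the retry timeout in settings.toml.",
        "The changelog lists breaking changes.",
    ]
    print(build_prompt("How do I configure the retry timeout?", docs, k=1))
```

The design point is that the agent sees a short, relevant slice of the documentation rather than the full text, so answer quality depends on the ranking step rather than on the model's ability to locate a detail inside a very long context.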