A new benchmark called MosaicLeaks reveals that AI research agents frequently leak sensitive information from their memory. The study shows that agents often struggle to maintain boundaries between private data and user queries. The finding gives developers reason to rethink how LLMs handle long-term memory and secure context windows in autonomous workflows.