A new benchmark called MosaicLeaks reveals that research agents frequently leak private data during multi-step tasks. The study shows agents often ignore system prompts to keep secrets when processing complex documents. This vulnerability exposes a critical gap in agentic reliability. Developers must now prioritize robust prompt adherence to prevent sensitive data exfiltration.