A new framework from Google AI Research measures how closely generative user simulators mimic actual human behavior. The team identifies specific gaps where synthetic agents fail to replicate complex user intent. Closing these gaps would let developers test LLM applications against more accurate simulations before deploying to real users.