External evaluators are testing whether public chat logs can stand in for private production data when predicting model pathologies. Labs typically restrict access to real user conversations for privacy reasons, which also keeps critical evidence of failures out of outside researchers' view. If public logs prove to be a reliable proxy, independent researchers could simulate deployment risks without depending on the data silos maintained by frontier labs.