A new framework uses 15,000 samples from SQuAD v2 to train models to detect hallucinations from their own internal activations. To label the training data, the researchers combined three weak-supervision signals: substring matching, embedding similarity, and LLM judges (a sketch of such a combination follows below). Because the detector reads only the model's latent representations, no external verification is needed at inference time; practitioners can identify factual errors directly from the model's hidden states.
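
Here is a minimal sketch of how the three weak-supervision signals might be combined into a single label. The source does not describe the actual pipeline, so everything here is an assumption: the `sentence-transformers` embedding backend, the thresholds, the abstention logic, and the majority vote are all illustrative, and the LLM judge is left as a stub.

```python
from typing import Optional

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

# Hypothetical model choice; the framework's actual embedder is not specified.
_embedder = SentenceTransformer("all-MiniLM-L6-v2")


def substring_vote(answer: str, gold: str) -> Optional[bool]:
    """True = faithful, None = abstain (absence of a substring proves little)."""
    return True if gold.lower() in answer.lower() else None


def embedding_vote(answer: str, gold: str,
                   hi: float = 0.8, lo: float = 0.3) -> Optional[bool]:
    """Cosine similarity between answer and gold; abstain in the middle band.
    The thresholds are illustrative assumptions."""
    a, g = _embedder.encode([answer, gold], normalize_embeddings=True)
    sim = float(np.dot(a, g))
    if sim >= hi:
        return True
    if sim <= lo:
        return False
    return None


def llm_judge_vote(answer: str, gold: str) -> Optional[bool]:
    # Stub: in practice this would prompt a judge model to compare
    # `answer` against `gold` and parse its verdict.
    raise NotImplementedError


def weak_label(answer: str, gold: str) -> Optional[bool]:
    """Combine cheap heuristics first; escalate to the LLM judge only
    when they abstain or disagree. Returns None if no signal fires."""
    votes = [v for v in (substring_vote(answer, gold),
                         embedding_vote(answer, gold)) if v is not None]
    if len(set(votes)) != 1:  # empty or conflicting
        try:
            votes.append(llm_judge_vote(answer, gold))
        except NotImplementedError:
            pass
    if not votes:
        return None
    return sum(votes) > len(votes) / 2  # simple majority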
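```

On the detection side, the idea is to train a lightweight classifier (a probe) on hidden states rather than on text. The sketch below assumes a Hugging Face causal LM and a scikit-learn logistic-regression probe; the model (`gpt2` as a stand-in), the layer index, and mean pooling over tokens are placeholders, since the source specifies none of them.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the framework's actual model is not specified
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
lm.eval()


@torch.no_grad()
def activation_features(text: str, layer: int = 6) -> torch.Tensor:
    """Mean-pooled hidden state at `layer` for one (question, answer) string."""
    ids = tok(text, return_tensors="pt", truncation=True)
    hidden = lm(**ids).hidden_states[layer]  # shape: (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)


def train_probe(texts: list[str], labels: list[int]) -> LogisticRegression:
    """Fit a probe on activations, using the weak labels from the step above."""
    feats = torch.stack([activation_features(t) for t in texts]).numpy()
    return LogisticRegression(max_iter=1000).fit(feats, labels)
```

At inference the probe scores `activation_features(output)` directly, which is what makes the external verifier unnecessary: the only inputs are the model's own hidden states.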