A new framework trains models to detect their own hallucinations from internal activations, using a 15,000-sample dataset built from SQuAD v2. To label the training data, researchers combined substring matching against reference answers with LLM-judge verdicts, producing weak supervision labels without manual annotation. Because the detector reads only the model's own representations, no external retrieval system or judge model is needed at inference time, so practitioners can flag likely factual errors from activations alone.
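A minimal sketch of how such a pipeline might look, assuming the weak label is formed by requiring the two signals to agree and that a simple linear probe is trained on per-answer hidden-state vectors. The agreement rule, the `weak_label` helper, and the activation shape are illustrative assumptions, not the framework's actual implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weak_label(answer: str, gold_answers: list[str], judge_says_correct: bool) -> int:
    """Combine two weak signals into a single label.

    1 = hallucinated, 0 = supported. Marking an answer as supported only
    when both signals agree is an assumption here; the paper's exact
    combination rule may differ.
    """
    substring_hit = any(g.lower() in answer.lower() for g in gold_answers)
    return 0 if (substring_hit and judge_says_correct) else 1

# Stand-in activations: one hidden-state vector per generated answer,
# e.g. a mid-layer last-token representation (dimension is hypothetical).
rng = np.random.default_rng(0)
X = rng.standard_normal((15_000, 4096))   # placeholder for real activations
y = rng.integers(0, 2, size=15_000)       # placeholder for weak labels

# Train a linear probe that predicts hallucination from activations alone,
# so inference needs no retrieval system or judge model.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict_proba(X[:1]))         # [P(supported), P(hallucinated)]
```

At inference, only the forward pass that produces the activations and the trained probe are required; the substring matcher and LLM judge are used once, offline, to build the training labels.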