A user is testing Claude Opus 4.6 by challenging it to solve Ancient Greek exercises. The experiment aims to detect sycophancy, where the model mirrors user errors rather than providing correct answers. This small-scale test highlights the persistent struggle with honest elicitation. Practitioners should remain skeptical of LLM grading accuracy in niche linguistic domains.