A 7B-parameter model from ByteDance Seed can process image-heavy documents up to four times longer than those in its training data. Instead of transcribing every page into text, the system uses a question-answering approach to locate the passages relevant to a query. This approach proves more reliable than larger models. As a result, practitioners can achieve higher accuracy in long-context retrieval with far fewer parameters.
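The summary does not describe the model's internals. The following is a minimal sketch of the general idea only, under stated assumptions. A transcribe-everything pipeline converts all pages to text before searching. A question-driven pipeline instead poses the user's question to each page and keeps only the pages flagged as relevant. The `Page` type, `ask_page`, and `locate_relevant_pages` are hypothetical names. `ask_page` uses simple keyword overlap as a stand-in for a real vision-language model call. This is not ByteDance Seed's implementation.

```python
from dataclasses import dataclass


@dataclass
class Page:
    index: int
    # Stand-in for an image page; a real system would pass pixels to the model.
    text: str


def _terms(text: str, min_len: int = 0) -> set[str]:
    """Lowercase words with trailing punctuation stripped."""
    return {w.lower().strip("?.,") for w in text.split() if len(w) > min_len}


def ask_page(page: Page, question: str) -> bool:
    """Hypothetical stand-in for asking a model 'Does this page help answer
    the question?'. Here: True if any question word longer than 3 characters
    also appears on the page."""
    return bool(_terms(question, min_len=3) & _terms(page.text))


def locate_relevant_pages(pages: list[Page], question: str) -> list[int]:
    """Query each page with the question and keep only the indices of pages
    the (stand-in) model flags as relevant, rather than transcribing all pages."""
    return [p.index for p in pages if ask_page(p, question)]


pages = [
    Page(0, "Quarterly revenue grew eight percent."),
    Page(1, "The board approved a new dividend policy."),
    Page(2, "Revenue guidance for next year was raised."),
]
print(locate_relevant_pages(pages, "What happened to revenue?"))  # prints [0, 2]
```

In a real pipeline, the selected pages (here 0 and 2) would then be passed to the model to produce the final answer. The design choice being illustrated is that the expensive step runs once per page against the question, not once per page to produce a full transcript.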