A 7B model from ByteDance Seed answers questions on image-heavy documents more reliably than larger models. It avoids transcription, instead learning to locate specific passages through a question-answering framework. This approach maintains accuracy even when documents are four times longer than the training data. Practitioners can now achieve high-context performance with smaller parameter counts.