Hugging Face researchers trained a new optical character recognition (OCR) model on synthetic data, and the resulting model processes multiple languages at high speed. The team used generated images to overcome the scarcity of diverse, labeled OCR datasets, which reduces reliance on manual annotation. Practitioners can now deploy faster text extraction pipelines for non-English documents.
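
To make the idea of generated training images concrete, here is a minimal sketch of why synthetic OCR data needs no manual annotation: the image is rendered from a known string, so the label comes for free. This is not the researchers' pipeline. The tiny bitmap font, the noise model, and the names `GLYPHS`, `render`, `add_noise`, and `make_samples` are all hypothetical, chosen only for illustration; a real generator would presumably rasterize real fonts across many scripts and layouts.

```python
import random

# Hypothetical 3x5 bitmap glyphs for a few characters (illustration only).
# A realistic generator would rasterize real fonts in many scripts.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
}


def render(text):
    """Rasterize text into a binary image: a list of rows of 0/1 pixels."""
    rows = []
    for y in range(5):
        # Join the y-th row of each glyph, with a one-pixel gap between glyphs.
        line = "0".join(GLYPHS[ch][y] for ch in text)
        rows.append([int(px) for px in line])
    return rows


def add_noise(image, rng, flip_prob=0.02):
    """Flip each pixel with probability flip_prob to mimic scan noise."""
    return [[px ^ (rng.random() < flip_prob) for px in row] for row in image]


def make_samples(n, length=4, seed=0):
    """Return n (image, label) pairs; the label is the string that was rendered."""
    rng = random.Random(seed)
    alphabet = sorted(GLYPHS)
    samples = []
    for _ in range(n):
        text = "".join(rng.choice(alphabet) for _ in range(length))
        samples.append((add_noise(render(text), rng), text))
    return samples


if __name__ == "__main__":
    image, label = make_samples(1)[0]
    print("label:", label)
    for row in image:
        print("".join("#" if px else "." for px in row))
```

Because every sample is produced from its own ground-truth string, scaling the dataset or adding a new language becomes a matter of supplying more text and glyphs rather than hiring annotators.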