A 3.8 billion parameter model called Lens matches larger rivals by using 800 million detailed captions generated by GPT-4.1. This approach replaces vague web alt-text with precise descriptions to lower training costs. Microsoft released the code and weights under an open-source license. Practitioners can now prioritize caption quality over raw parameter scale.