A 3.8 billion parameter model called Lens matches larger image generators by using 800 million detailed captions generated by GPT-4. This approach replaces vague web alt-text to reduce training costs. The weights and code are now open-source. Practitioners can prioritize caption quality over raw parameter scale to achieve high-fidelity output.