The DiScoFormer architecture uses a single transformer to estimate both probability densities p(x) and score functions, where the score is the gradient of the log-density, ∇ₓ log p(x). Because the two quantities are mathematically linked, one model can cover both roles, removing the need for separate density and score estimators when handling diverse data distributions. Researchers at Hugging Face report that this unification improves sampling efficiency. Practitioners can therefore deploy one model for multiple generative tasks.
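To make the link between density and score concrete, here is a minimal classical sketch. It is not the DiScoFormer model: it uses a 1-D Gaussian kernel density estimate (KDE), where a single estimator built from the same samples yields both a log-density and a closed-form score. The function names `kde_log_density` and `kde_score` and the `bandwidth` parameter are hypothetical, chosen only for illustration.

```python
import math


def _log_kernels(x, samples, bandwidth):
    # Unnormalized log of each Gaussian kernel term N(x; x_i, h^2)
    return [-0.5 * ((x - xi) / bandwidth) ** 2 for xi in samples]


def kde_log_density(x, samples, bandwidth):
    """Log of a 1-D Gaussian KDE evaluated at x (hypothetical helper)."""
    logs = _log_kernels(x, samples, bandwidth)
    m = max(logs)
    # log-sum-exp keeps the sum numerically stable far from the samples
    lse = m + math.log(sum(math.exp(l - m) for l in logs))
    # Normalizer: n kernels, each with constant 1 / (h * sqrt(2*pi))
    return lse - math.log(len(samples) * bandwidth * math.sqrt(2 * math.pi))


def kde_score(x, samples, bandwidth):
    """Closed-form score d/dx log p_hat(x) for the same KDE (hypothetical helper)."""
    logs = _log_kernels(x, samples, bandwidth)
    m = max(logs)
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    # Score = kernel-weighted average of per-kernel scores (x_i - x) / h^2
    return sum(w * (xi - x) for w, xi in zip(weights, samples)) / (total * bandwidth ** 2)


if __name__ == "__main__":
    samples = [-1.0, 0.2, 0.5, 2.0]
    h, x, eps = 0.5, 0.3, 1e-5
    # Central finite difference of the log-density should match the score
    fd = (kde_log_density(x + eps, samples, h) - kde_log_density(x - eps, samples, h)) / (2 * eps)
    print(fd, kde_score(x, samples, h))
```

The finite-difference check in the `__main__` block illustrates the point that motivates a unified estimator: once the log-density is known, the score follows by differentiation, so a model that represents one can in principle supply the other.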