The DiScoFormer architecture uses a single transformer to estimate both the probability density and the score function (the gradient of the log-density with respect to the input) across diverse distributions. This removes the need to train a separate model for each task. Researchers at Hugging Face report that this unification simplifies training, so practitioners can deploy one model for multiple generative modeling objectives.
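
To make the two estimation targets concrete, the following is a minimal sketch of how a density and its score relate, using a one-dimensional Gaussian mixture with closed-form answers. It is not DiScoFormer's implementation and says nothing about the model's internals; the function names (`mixture_density`, `mixture_score`, `finite_difference_score`) and the mixture parameters are hypothetical, chosen only for illustration.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(x: float, weights, means, stds) -> float:
    """Density p(x) of a Gaussian mixture (weights are assumed to sum to 1)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def mixture_score(x: float, weights, means, stds) -> float:
    """Score d/dx log p(x) = p'(x) / p(x), computed in closed form."""
    p = mixture_density(x, weights, means, stds)
    # Derivative of each component: -(x - m) / s^2 * N(x; m, s^2).
    dp = sum(
        w * gaussian_pdf(x, m, s) * (-(x - m) / (s * s))
        for w, m, s in zip(weights, means, stds)
    )
    return dp / p


def finite_difference_score(x: float, weights, means, stds, eps: float = 1e-5) -> float:
    """Central-difference approximation of d/dx log p(x), used as a sanity check."""
    hi = math.log(mixture_density(x + eps, weights, means, stds))
    lo = math.log(mixture_density(x - eps, weights, means, stds))
    return (hi - lo) / (2.0 * eps)


# Hypothetical two-component mixture used only for this illustration.
WEIGHTS = [0.3, 0.7]
MEANS = [-1.0, 2.0]
STDS = [0.5, 1.5]

if __name__ == "__main__":
    x = 0.3
    print("density:", mixture_density(x, WEIGHTS, MEANS, STDS))
    print("score (closed form):", mixture_score(x, WEIGHTS, MEANS, STDS))
    print("score (finite diff):", finite_difference_score(x, WEIGHTS, MEANS, STDS))
```

The closed-form and finite-difference scores should agree closely, which shows that the score carries the same information as the local slope of the log-density. A model that outputs both quantities is estimating two views of one underlying distribution.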