Create complete songs with vocals and accompaniment in seconds using DiffRhythm's breakthrough latent diffusion technology. It is the first end-to-end solution for full-length song generation, supporting up to 4 minutes and 45 seconds of high-quality music.
DiffRhythm is the first latent diffusion-based song generation model capable of synthesizing complete songs, with both vocals and accompaniment, up to 4 minutes and 45 seconds long in just ten seconds. Unlike existing models, which either generate vocal and accompaniment tracks separately or rely on complex multi-stage architectures, DiffRhythm offers an end-to-end solution that delivers high musicality and intelligibility while maintaining fast inference speeds.