Stable Audio Open
Open-weight text-to-audio — 47-second sound effects and music.
Open Source · 6–8 GB VRAM
- Min VRAM: 6 GB
- GPU class: Entry GPU
- Quant: FP16
Actually Free · Open Source · Watermark-Free · Hobbyist-Friendly
Generate synchronized audio for any silent video.
Runs locally · Entry GPU (6–8 GB)
Small/medium variants on 8 GB; large on 12 GB.
MMAudio (Sony AI + research collab) generates synchronized sound effects and ambient audio for silent video clips. Drop in a Wan/Hunyuan/SVD output and get matching footsteps, ambient room tone, and splashes, automatically aligned to the video frames. Best-in-class for the V2A (video-to-audio) niche.
MIT-licensed; full weights public.
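As a rough illustration of the VRAM guidance above (small/medium variants on 8 GB, large on 12 GB), the sketch below picks a variant tier from the memory you have available. The function name, the table, and the tier labels are hypothetical and only mirror the figures on this page; they are not MMAudio checkpoint names or part of its API.

```python
from typing import Optional

# Thresholds copied from the note on this page. The labels are illustrative
# tier names (an assumption), not official MMAudio checkpoint identifiers.
VARIANT_MIN_VRAM_GB = [
    ("large", 12.0),
    ("small/medium", 8.0),
]


def pick_variant(vram_gb: float) -> Optional[str]:
    """Return the largest tier whose minimum VRAM fits, or None if none fit."""
    # The table is ordered from largest to smallest requirement,
    # so the first match is the biggest tier that fits.
    for name, min_gb in VARIANT_MIN_VRAM_GB:
        if vram_gb >= min_gb:
            return name
    return None


if __name__ == "__main__":
    for gb in (6, 8, 12, 24):
        print(f"{gb} GB -> {pick_variant(gb)}")
```

A card below the 8 GB threshold returns None here, meaning this page gives no guidance for it; actual headroom will depend on clip length, precision, and whatever else is occupying the GPU.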
Meta's text-to-music & sound-effect model family.