Tortoise TTS
Slow, but the quality is worth the wait.
Open Source · 4–8 GB VRAM
- Min VRAM: 4 GB
- GPU class: Entry GPU
- Quantization: FP16
Actually Free · No Signup · Open Source · Watermark-Free
Suno's expressive transformer-based TTS.
Runs locally · Entry GPU (6–8 GB)
CPU works but is slow (5–10× slower than realtime). 8 GB of VRAM is comfortable; 12 GB is better for batch generation.
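As a rough illustration of what "5–10× slower than realtime" means in practice, the sketch below converts a clip length into an estimated CPU generation time. The function name is hypothetical, and it assumes generation time scales linearly with output length; real speed will vary with hardware, prompt, and settings.

```python
def estimate_cpu_seconds(
    audio_seconds: float,
    slowdown_low: float = 5.0,
    slowdown_high: float = 10.0,
) -> tuple[float, float]:
    """Return (low, high) wall-clock estimates for generating `audio_seconds`
    of audio on CPU. Assumes time scales linearly with clip length
    (a simplifying assumption, not a measured property of the model)."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * slowdown_low, audio_seconds * slowdown_high


low, high = estimate_cpu_seconds(10.0)
print(f"10 s of audio: roughly {low:.0f}-{high:.0f} s on CPU")
```

Under these assumptions, a 10-second clip would take somewhere between about 50 and 100 seconds on CPU, which is why a GPU is recommended for anything beyond short tests.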
Bark is a fully generative text-to-audio model from Suno that produces not just speech but also laughter, sighs, music, and background noise. It works very differently from VITS or Tortoise: its outputs are creative and unpredictable, which is both its charm and its limitation.
MIT-licensed; full weights publicly released.
Multilingual voice cloning in 6 seconds.
Zero-shot voice cloning TTS — 15 s of audio is enough.