Bark
Suno's expressive transformer-based TTS.
Open Source · 8–12 GB VRAM
- Min VRAM: 8 GB
- GPU class: Entry GPU
- Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free
Slow, but the quality is worth the wait.
Runs locally · Entry GPU (6–8 GB)
Workable at 4 GB VRAM; 8 GB recommended. CPU is impractical.
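The VRAM figures depend heavily on precision. At FP16 each parameter occupies 2 bytes, so the weights alone need roughly 2 bytes × parameter count. The sketch below shows only that arithmetic. The parameter count it uses is a hypothetical round number, not Bark's actual size, and real usage runs higher because activations, caches, and framework overhead are not counted.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, precision: str = "fp16") -> float:
    """Weights-only footprint in GiB.

    Ignores activations, caches, and framework overhead, so actual VRAM
    use will be higher than this estimate.
    """
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    # Hypothetical 1-billion-parameter total, for illustration only
    # (not the real size of Bark or any other listed model).
    n = 1_000_000_000
    for p in ("fp32", "fp16", "int8"):
        print(p, round(weight_memory_gib(n, p), 2))
```

Halving the bytes per parameter halves the weight footprint, which is why a model listed at FP16 fits on a smaller card than the same model at FP32.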
Tortoise is the high-quality, low-speed TTS that set the bar before XTTS landed. Multi-step diffusion-style generation produces remarkably natural prosody from just a few seconds of reference audio. Now mostly displaced by faster models, but still notable for clone fidelity on a budget.
Apache 2.0; weights and code both free.
Multilingual voice cloning in 6 seconds.
Zero-shot voice cloning TTS — 15 s of audio is enough.