Bark
Suno's expressive transformer-based TTS.
Open Source · 8–12 GB VRAM
- Min VRAM: 8 GB
- GPU class: Entry GPU
- Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free
Multilingual voice cloning from 6 seconds of audio.
Runs locally · Entry GPU (6–8 GB)
Real-time on GPUs with 4 GB+ of VRAM; also runs on Apple Silicon via MPS.
Coqui's XTTS v2 is the production TTS workhorse: it clones a voice from 6 seconds of reference audio, generates speech in 17 languages, and runs on a 4 GB GPU. Coqui the company has shut down, but the model weights remain available under the Coqui Public Model Licence, and it is the backbone of many current open-source voice apps.
Coqui Public Model Licence: free for personal, non-commercial use; commercial use is not covered by its terms.
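The cloning workflow described above can be sketched in Python with the Coqui `TTS` package's `TTS.api.TTS` class and its `tts_to_file(text=..., speaker_wav=..., language=..., file_path=...)` call. This is a minimal sketch under stated assumptions, not an official recipe: treating the 6-second figure as a hard minimum, the CUDA → MPS → CPU device order, and the helper names `check_reference_clip` and `clone_voice` are all illustrative choices, and the pre-flight check assumes a PCM WAV reference clip.

```python
import wave

# Assumption: the card's "6 seconds of audio" is treated here as a minimum clip length.
MIN_REFERENCE_SECONDS = 6.0


def reference_clip_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds, using only the stdlib."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str, min_seconds: float = MIN_REFERENCE_SECONDS) -> float:
    """Raise ValueError if the clip is shorter than min_seconds; otherwise return its duration."""
    seconds = reference_clip_seconds(path)
    if seconds < min_seconds:
        raise ValueError(f"{path}: {seconds:.1f} s of audio, need at least {min_seconds:.0f} s")
    return seconds


def clone_voice(text: str, speaker_wav: str, language: str, out_path: str) -> None:
    """Hypothetical helper: speak `text` in the voice from `speaker_wav` with XTTS v2."""
    check_reference_clip(speaker_wav)
    # Imported lazily so the duration check above works without torch/TTS installed.
    import torch
    from TTS.api import TTS

    # Assumed preference order: CUDA GPU, then Apple Silicon (MPS), then CPU.
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    tts.tts_to_file(text=text, speaker_wav=speaker_wav, language=language, file_path=out_path)
```

For example, `clone_voice("Hello!", "me.wav", "en", "out.wav")` would validate `me.wav` first and then fetch the XTTS v2 weights on first use. Keeping the duration check separate from the model call lets a too-short clip fail fast, before any large download or GPU allocation.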
Bark: slow, but the quality is worth the wait.
Zero-shot voice-cloning TTS: 15 s of audio is enough.