XTTS v2 (Coqui)
Multilingual voice cloning in 6 seconds.
Open Source 4–6 GB VRAM
- Min VRAM: 4 GB
- GPU class: Entry GPU
- Quant: FP16
Actually Free, No Signup, Open Source, Watermark-Free
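For a sense of what the 6-second cloning claim looks like in practice, here is a minimal sketch using the Coqui `TTS` Python package. It assumes the package is installed and that the model is published under the name `tts_models/multilingual/multi-dataset/xtts_v2`. The helper names `clip_seconds` and `clone_voice` are hypothetical, and treating 6 seconds as a hard minimum for the reference clip is an assumption drawn from the tagline, not a documented limit.

```python
import wave

# Assumption: the "6 seconds" in the tagline is used here as a minimum
# reference-clip length. The library itself may accept shorter clips.
MIN_REFERENCE_SECONDS = 6.0


def clip_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clone_voice(text: str, speaker_wav: str, out_path: str, language: str = "en") -> str:
    """Synthesize `text` in the voice of `speaker_wav` and write it to `out_path`."""
    duration = clip_seconds(speaker_wav)
    if duration < MIN_REFERENCE_SECONDS:
        raise ValueError(
            f"reference clip is {duration:.1f} s; expected at least "
            f"{MIN_REFERENCE_SECONDS:.0f} s"
        )

    # Imported lazily so the duration check works without the package installed.
    from TTS.api import TTS

    # The first call downloads the model weights.
    tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language=language,
        file_path=out_path,
    )
    return out_path
```

A call such as `clone_voice("Hello there.", "reference.wav", "cloned.wav")` would write the cloned speech to `cloned.wav`. The file names are placeholders.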
The benchmark commercial TTS / voice clone API.
ElevenLabs is the TTS service most production teams pick by default. Its voice cloning is best-in-class, its latency is low enough for real-time agents, and Voice Design generates a voice from a text description. It is closed-source and paid, but it sets the quality bar everyone else is chasing.
10,000 chars/month free; paid plans from $5/mo (Starter) to $330/mo (Pro).
Zero-shot voice cloning TTS — 15 s of audio is enough.
Suno's expressive transformer-based TTS.