XTTS v2 (Coqui)
Multilingual voice cloning in 6 seconds.
Open Source · 4–6 GB VRAM
- Min VRAM: 4 GB
- GPU class: Entry GPU
- Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free
The voice-changer that took over Discord.
Runs locally · Entry GPU (6–8 GB)
Inference runs on 4 GB of VRAM; training is comfortable at 8 GB. CPU inference works but is slow.
RVC (Retrieval-based Voice Conversion) takes an existing audio clip and replaces the speaker's voice with a trained target voice. This is a different problem from TTS: you need a source recording, but the result preserves the prosody and emotion of the original take. It is a standard tool for voice covers, dubbing, and content creation.
MIT-licensed; web UI included.
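The VRAM guidance above can be turned into a quick pre-flight check. The sketch below is illustrative only: `plan_rvc_workload` and its return keys are hypothetical names, not part of RVC, and falling back to CPU below 4 GB is an assumption, since the guidance only says that inference works on 4 GB.

```python
from typing import Optional

# Thresholds taken from the card text above.
INFERENCE_MIN_GB = 4.0         # "Inference on 4 GB"
TRAINING_COMFORTABLE_GB = 8.0  # "training comfortable at 8 GB"


def plan_rvc_workload(vram_gb: Optional[float]) -> dict:
    """Hypothetical helper: map available VRAM (in GB) to a rough plan.

    Pass None when no GPU is present.
    """
    # Assumption: with no GPU, or with less than 4 GB, fall back to CPU,
    # which the card describes as working but slow.
    if vram_gb is None or vram_gb < INFERENCE_MIN_GB:
        return {"device": "cpu", "gpu_inference": False, "training_comfortable": False}

    return {
        "device": "cuda",
        "gpu_inference": True,
        # Training may still run below 8 GB; the card only calls 8 GB comfortable.
        "training_comfortable": vram_gb >= TRAINING_COMFORTABLE_GB,
    }


if __name__ == "__main__":
    for vram in (None, 4.0, 6.0, 8.0):
        print(vram, plan_rvc_workload(vram))
```

On a machine with PyTorch and a CUDA GPU, the `vram_gb` argument could be derived from `torch.cuda.get_device_properties(0).total_memory / 1024**3`; it is left as a plain parameter here so the sketch has no dependencies.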
Suno's expressive transformer-based TTS.