AI Tools Finder

Tortoise TTS

Slow, but the quality is worth the wait.

Open Source · 4–8 GB VRAM · Runs locally
Actually Free · No Signup · Open Source · Watermark-Free · Hobbyist-Friendly
Updated 2024-11-02

Hardware requirements

Runs locally · Entry GPU (6–8 GB)

4–8 GB VRAM
Min VRAM: 4 GB
Rec. VRAM: 8 GB
Min RAM: 8 GB
Rec. RAM: 16 GB
Disk: 5 GB
GPU class: Entry GPU
11.6+ · No Apple Silicon · CPU-Capable · Quant: FP16

Workable at 4 GB VRAM; 8 GB recommended. CPU is impractical.
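
As a rough illustration of how to read the figures above, here is a minimal Python sketch that sorts a machine into "below minimum", "workable", or "recommended". The thresholds are copied from this listing; the `Requirements` class, the `classify` function, and the three tier names are hypothetical helpers made up for this example, not part of Tortoise TTS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Thresholds taken from the listing above (all values in GB)."""
    min_vram_gb: float = 4.0
    rec_vram_gb: float = 8.0
    min_ram_gb: float = 8.0
    rec_ram_gb: float = 16.0
    disk_gb: float = 5.0


def classify(vram_gb: float, ram_gb: float, free_disk_gb: float,
             req: Requirements = Requirements()) -> str:
    """Return a hypothetical tier label for the given machine."""
    # Anything under a listed minimum (or without enough disk) is ruled out.
    if (vram_gb < req.min_vram_gb
            or ram_gb < req.min_ram_gb
            or free_disk_gb < req.disk_gb):
        return "below minimum"
    # Both recommended figures must be met to count as "recommended".
    if vram_gb >= req.rec_vram_gb and ram_gb >= req.rec_ram_gb:
        return "recommended"
    return "workable"


if __name__ == "__main__":
    # Example: a 6 GB GPU with 16 GB of RAM and 20 GB of free disk.
    print(classify(vram_gb=6, ram_gb=16, free_disk_gb=20))
```

The check only compares numbers; it says nothing about speed, which is the real constraint with this model.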

What is Tortoise TTS?

Tortoise is the high-quality, low-speed TTS model that set the bar before XTTS landed. Its multi-step, diffusion-style generation produces remarkably natural prosody from just a few seconds of reference audio. It has now been mostly displaced by faster models, but it is still notable for clone fidelity on a budget.

Pros & cons

Pros

  • Excellent prosody and naturalness vs. its 2022 contemporaries
  • Voice cloning from a few seconds of reference
  • Mature codebase with many community forks

Cons

  • Glacially slow — minutes per sentence on consumer GPUs
  • Newer models (XTTS, F5-TTS) match quality with 10–50× speedup
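
To put the 10–50× figure in concrete terms, the sketch below converts a Tortoise generation time into the range a faster model would need at that speedup. The function name `faster_model_seconds` and the 120-second input are assumptions chosen for illustration, not measurements.

```python
def faster_model_seconds(tortoise_seconds: float,
                         speedups: tuple = (10, 50)) -> tuple:
    """Return (best_case, worst_case) seconds for the quoted speedup range."""
    low, high = speedups
    # The larger speedup gives the shorter (best-case) time.
    return tortoise_seconds / high, tortoise_seconds / low


if __name__ == "__main__":
    # Hypothetical: one sentence taking two minutes on Tortoise.
    best, worst = faster_model_seconds(120)
    print(f"{best:.1f} s to {worst:.1f} s")
```

Under that assumed two-minute baseline, the same sentence would take between roughly 2 and 12 seconds on a model in the quoted range.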

What's actually free?

Released under the Apache 2.0 license; both the code and the model weights are free.

✓ Actually Free · No Signup · Open Source · Watermark-Free

Alternatives

Bark

Suno's expressive transformer-based TTS.

Open Source · 8–12 GB VRAM
Min VRAM: 8 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

XTTS v2 (Coqui)

Multilingual voice cloning in 6 seconds.

Open Source · 4–6 GB VRAM
Min VRAM: 4 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

F5-TTS

Zero-shot voice cloning TTS — 15 s of audio is enough.

Open Source · 8–12 GB VRAM
Min VRAM: 8 GB
GPU class: Mid GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free