DATASHEET // XTTS-V2

XTTS v2 (Coqui)

Multilingual voice cloning in 6 seconds.

OPEN SOURCE · 4–6 GB VRAM · Runs locally

Actually Free · No Signup · Open Source · Watermark-Free · Hobbyist-OK · API

Visit XTTS v2 (Coqui) · UPDATED 2025-09-30 · DIRECT LINK

github.com/coqui-ai/TTS

HARDWARE REQUIREMENTS //

Runs locally · Entry GPU (6–8 GB)

4–6 GB VRAM

Min VRAM: 4 GB
Rec. VRAM: 6 GB
Min RAM: 8 GB
Rec. RAM: 16 GB
Disk: 4 GB
GPU class: Entry GPU

11.7+ · Apple Silicon ✓ · CPU-Capable · Quant: FP16

Real-time on 4 GB+. Apple Silicon MPS works.
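A minimal sketch of how the figures above could be checked against a local machine. The function names, tier labels, and the 0.25 GB tolerance are illustrative assumptions, not part of XTTS or Coqui TTS; the only library calls used are PyTorch's `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`, and PyTorch itself is treated as optional.

```python
from typing import Optional

MIN_VRAM_GB = 4.0  # "Min VRAM" from this datasheet
REC_VRAM_GB = 6.0  # "Rec. VRAM" from this datasheet


def vram_tier(vram_gb: Optional[float], tolerance_gb: float = 0.25) -> str:
    """Classify a VRAM size against the datasheet figures.

    tolerance_gb is an assumption: cards sold as "4 GB" often report
    slightly less than 4 GiB of total memory.
    """
    if vram_gb is None:
        return "no-cuda-gpu"
    if vram_gb + tolerance_gb < MIN_VRAM_GB:
        return "below-minimum"
    if vram_gb + tolerance_gb < REC_VRAM_GB:
        return "minimum"
    return "recommended"


def detect_vram_gb() -> Optional[float]:
    """Return total memory of CUDA device 0 in GiB, or None if unavailable."""
    try:
        import torch  # optional; vram_tier() works without it
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    print(vram_tier(detect_vram_gb()))
```

A result of "no-cuda-gpu" does not rule the machine out: this page lists the model as CPU-capable and as working on Apple Silicon, which this CUDA-only check does not detect.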

[ EDITORIAL PICK ]

Why we recommend XTTS v2 (Coqui)

DERIVED FROM METADATA — NOT SPONSORED

Open source
Source is public: you can audit it and fork it, and you keep access to your workflows even though Coqui, the company, has shut down.
Runs on 4 GB
Fits on entry-level cards (GTX 1660, RTX 3050, RTX 4060). Rare for this category.
Apple Silicon
Native Metal / MPS support — runs on M-series Macs without CUDA gymnastics.
Beginner-friendly
You don't need to read a paper before getting your first result — sensible defaults and a quick install.
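As a rough illustration of the CUDA, Apple Silicon, and CPU options mentioned above, the sketch below picks a PyTorch device string in that order of preference. The helper names `pick_device` and `detect_device` are hypothetical, and the preference order is an assumption rather than anything XTTS prescribes; `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch calls.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's Metal backend (MPS), then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for the best available device; fall back to CPU."""
    try:
        import torch  # optional; without it we can only report "cpu"
    except ImportError:
        return "cpu"
    # Older PyTorch builds may lack the MPS backend module entirely.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a pure function makes the preference order easy to change, for example to force CPU when debugging.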

[ EVIDENCE NOTE ]

Documentation-led datasheet

This page summarizes upstream documentation, release information, and editorially reviewed catalogue fields. It is not presented as a hands-on benchmark. Verify changing requirements at the official project; report stale data through our corrections channel.

AT-A-GLANCE SIGNALS //

DERIVED FROM THIS PAGE'S DATA

Install difficulty
Easy
Can run CPU-only, so no CUDA or driver setup is required to get started.
Hardware comfort
Entry-level
Fits on 4 GB cards — GTX 1660 / RTX 3050 territory.
Ecosystem
Strong devkit
Open-source AND ships an API — easy to integrate, possible to host yourself.
Verification
Stale
289 days since the last refresh — treat hardware numbers as a floor, not a ceiling.
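The day count is plain date arithmetic. The sketch below assumes the "last refresh" is the UPDATED date shown on this page (2025-09-30); the function name is hypothetical, and the page does not state what threshold turns the signal to "Stale".

```python
from datetime import date, timedelta


def days_since_refresh(last_refresh: date, today: date) -> int:
    """Whole days elapsed between the last refresh and today."""
    return (today - last_refresh).days


if __name__ == "__main__":
    updated = date(2025, 9, 30)  # assumed to be the refresh date
    # The count reaches 289 on this date:
    print(updated + timedelta(days=289))                      # 2026-07-16
    print(days_since_refresh(updated, date(2026, 7, 16)))     # 289
```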

[ MORE IN THIS NICHE ]

Other audio & speech generation tools we rate

Three picks across different tradeoffs — so you don't end up with three near-clones of XTTS v2 (Coqui).

LIGHTEST HARDWARE //

RVC (Retrieval-based Voice Conversion)

The voice-changer that took over Discord.

OPEN SOURCE · 4–8 GB VRAM

BEST FREE OPTION //

MMAudio

Generate synchronized audio for any silent video.

OPEN SOURCE · 8–12 GB VRAM

TOP QUALITY //

ElevenLabs

The benchmark commercial TTS / voice clone API.

FREEMIUM · $5/MO · CLOUD · NO GPU

What is XTTS v2 (Coqui)?

Coqui's XTTS v2 is a widely used open TTS model: it clones a voice from about 6 seconds of reference audio, generates speech in 17 languages, and runs on a 4 GB GPU. Coqui, the company, has shut down, but the model remains available under the Coqui Public Model Licence and is now community-maintained. It serves as the engine inside many open-source voice apps.
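A minimal sketch of a cloning call, assuming the `TTS` Python package from github.com/coqui-ai/TTS is installed. `TTS(...).to(device)` and `tts_to_file(...)` follow the project's README; the wrapper names `check_language` and `clone_voice`, the default output path, and the file names in the usage note are illustrative assumptions. The language codes are the 17 listed on the model card.

```python
# Language codes listed on the XTTS v2 model card (17 in total).
XTTS_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
)
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def check_language(code: str) -> str:
    """Normalize a language code and reject ones XTTS v2 does not list."""
    normalized = code.strip().lower()
    if normalized not in XTTS_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


def clone_voice(text: str, reference_wav: str, language: str,
                out_path: str = "output.wav", device: str = "cpu") -> str:
    """Speak `text` in the voice from `reference_wav` (about 6 s of audio)."""
    language = check_language(language)
    # Imported lazily so the helper above works without the package.
    from TTS.api import TTS
    tts = TTS(MODEL_NAME).to(device)
    tts.tts_to_file(text=text, speaker_wav=reference_wav,
                    language=language, file_path=out_path)
    return out_path
```

A call might look like `clone_voice("Hello there.", "reference.wav", "en", device="cuda")`, where `reference.wav` is a placeholder for your own recording. The first run downloads the model weights and may ask you to accept the model licence.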

Pros & cons

✓ PROS

6-second voice cloning that actually works
17 languages including cross-lingual cloning
Real-time on a single mid-range GPU
Used as the engine inside many higher-level apps

– CONS

Coqui (the company) shut down — community-maintained from here
Model licence (Coqui Public Model Licence) is not OSI-approved and restricts use to non-commercial purposes; read the terms before any commercial use

What's actually free?

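Coqui Public Model Licence (CPML): free for non-commercial use. The licence restricts the model and its outputs to non-commercial purposes, so read the terms before any commercial deployment.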

✓ Actually Free · No Signup · Open Source · Watermark-Free

Alternatives

Bark

Suno's expressive transformer-based TTS.

OPEN SOURCE · 8–12 GB VRAM

VRAM fit: 8–12 GB

Tortoise TTS

Slow, but the quality is worth the wait.

OPEN SOURCE · 4–8 GB VRAM

VRAM fit: 4–8 GB

F5-TTS

Zero-shot voice cloning TTS — 15 s of audio is enough.

OPEN SOURCE · 8–12 GB VRAM

VRAM fit: 8–12 GB