DATASHEET // F5-TTS

F5-TTS

Zero-shot voice cloning TTS — 15 s of audio is enough.

OPEN SOURCE · 8–12 GB VRAM · Runs locally

Actually Free · No Signup · Open Source · Watermark-Free · Hobbyist-OK

UPDATED 2026-04-18

github.com/SWivid/F5-TTS

HARDWARE REQUIREMENTS //

Runs locally · Mid GPU (12 GB) · 8–12 GB VRAM

Min VRAM: 8 GB
Rec. VRAM: 12 GB
Min RAM: 16 GB
Rec. RAM: 32 GB
Disk: 15 GB
GPU class: Mid GPU

CUDA 11.8+ · Apple Silicon ✓ · GPU required · Precision: FP16

8 GB VRAM viable with shorter contexts; 12 GB for full sequence lengths.
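The thresholds above can be turned into a quick pre-install check. This is a minimal sketch, assuming PyTorch (the framework F5-TTS is built on) is already installed; `vram_tier` and `detect_accelerator` are hypothetical helpers, not part of F5-TTS, and the 8 GB / 12 GB cut-offs are simply this page's figures.

```python
"""Hypothetical pre-install check against this datasheet's VRAM figures."""
from __future__ import annotations

MIN_VRAM_GB = 8.0   # datasheet minimum: viable with shorter contexts
REC_VRAM_GB = 12.0  # datasheet recommendation: full sequence lengths


def vram_tier(total_gb: float) -> str:
    """Classify total GPU memory (decimal GB) against the datasheet thresholds."""
    if total_gb >= REC_VRAM_GB:
        return "full"           # 12 GB or more: full sequence lengths
    if total_gb >= MIN_VRAM_GB:
        return "short-context"  # 8 to 12 GB: shorter contexts
    return "insufficient"       # below the stated 8 GB minimum


def detect_accelerator() -> tuple[str, float | None]:
    """Return (device, total VRAM in decimal GB) via PyTorch, if installed.

    Apple Silicon (MPS) uses unified memory, so no dedicated VRAM figure is returned.
    """
    try:
        import torch
    except ImportError:
        return "cpu", None
    if torch.cuda.is_available():
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        # Decimal GB, so a nominal "12 GB" card is not nudged under 12 by GiB rounding.
        return "cuda", total_bytes / 1e9
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps", None
    return "cpu", None


if __name__ == "__main__":
    device, total_gb = detect_accelerator()
    if total_gb is None:
        print(f"{device}: no dedicated VRAM figure to compare")
    else:
        print(f"{device}: {total_gb:.1f} GB -> {vram_tier(total_gb)}")
```

On a Mac the check only reports the device, since unified memory has no separate VRAM figure to compare against.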

[ EDITORIAL PICK ]

Why we recommend F5-TTS

DERIVED FROM METADATA — NOT SPONSORED

Open source
Source is public, so you can audit it, fork it, and keep your workflows even if the F5-TTS maintainers change direction.
Runs on 8 GB
Comfortable on a mid-range consumer card — no need to remortgage for an A100.
Apple Silicon
Native Metal / MPS support — runs on M-series Macs without CUDA gymnastics.
Top-tier pick
Power-user score 86/100 — consistently rated highly by people who use this every day, not just benchmark chasers.

[ EVIDENCE NOTE ]

Documentation-led datasheet

This page summarizes upstream documentation, release information, and editorially reviewed catalogue fields. It is not presented as a hands-on benchmark. Verify changing requirements at the official project; report stale data through our corrections channel.


AT-A-GLANCE SIGNALS //

DERIVED FROM THIS PAGE'S DATA

Install difficulty
Standard
A standard local install — download, install dependencies, point at your GPU.
Hardware comfort
Mainstream
Needs 8 GB VRAM minimum; an RTX 3060 12 GB or RTX 4070 covers the 12 GB recommendation.
Ecosystem
Active community
Open source plus 3 community resources we've vetted — there are people to ask.
Verification
Recent
Catalogue entry last updated 89 days ago — re-verification due soon.


[ MORE IN THIS NICHE ]

Other local LLM runners we rate

Three picks across different tradeoffs — so you don't end up with three near-clones of F5-TTS.

LIGHTEST HARDWARE //

Transformers

The library every LLM ships against first.

OPEN SOURCE · 2–24 GB VRAM

BEST FREE OPTION //

llama.cpp

The C++ inference engine powering most local LLMs.

OPEN SOURCE · CPU-CAPABLE

TOP QUALITY //

Ollama

One-command local LLM runtime.

OPEN SOURCE · CPU-CAPABLE

What is F5-TTS?

F5-TTS is one of the strongest open-weight zero-shot text-to-speech models currently available. Give it a roughly 15-second voice sample plus target text and it produces natural-sounding speech in that voice. It was trained on about 100k hours of audio, mostly English and Chinese, and runs on a single 12 GB GPU (8 GB with shorter contexts).
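Since cloning quality hinges on the reference clip, it can help to check its length before a run. This is a minimal sketch using only Python's standard `wave` module, assuming an uncompressed PCM WAV reference; `check_reference_clip` and `make_silent_wav` are hypothetical helpers, and the 15 s target is this page's figure rather than a documented F5-TTS limit.

```python
import io
import wave

TARGET_REF_SECONDS = 15.0  # reference length quoted on this datasheet


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(source) -> str:
    """Hypothetical pre-flight check against the ~15 s figure on this page."""
    seconds = wav_duration_seconds(source)
    if seconds > TARGET_REF_SECONDS:
        return f"{seconds:.1f} s: longer than {TARGET_REF_SECONDS:.0f} s, consider trimming"
    return f"{seconds:.1f} s: within the {TARGET_REF_SECONDS:.0f} s target"


def make_silent_wav(seconds: float, rate: int = 24_000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_reference_clip(make_silent_wav(12.0)))  # 12.0 s: within the 15 s target
    print(check_reference_clip(make_silent_wav(20.0)))  # 20.0 s: longer than 15 s, consider trimming
```

Length is only half the story: the pros below stress clean audio, which this check does not measure.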

Pros & cons

✓ PROS

Zero-shot cloning genuinely works on 15 s of clean audio
Naturalness comparable to closed commercial TTS
Active research lab maintenance (SWivid)

– CONS

Non-commercial license — not for paid products
English / Chinese strongest; other languages weaker

What's actually free?

CC-BY-NC-4.0 on the pretrained weights (the code itself is MIT-licensed): free for non-commercial use.

✓ Actually Free · No Signup · Open Source · Watermark-Free

Alternatives

XTTS v2 (Coqui)

Multilingual voice cloning from a 6-second clip.

OPEN SOURCE · 4–6 GB VRAM

VRAM fit: 4–6 GB

Bark

Suno's expressive transformer-based TTS.

OPEN SOURCE · 8–12 GB VRAM

VRAM fit: 8–12 GB

Tortoise TTS

Slow, but the quality is worth the wait.

OPEN SOURCE · 4–8 GB VRAM

VRAM fit: 4–8 GB
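To shortlist these options by hardware, the VRAM ranges can be treated as data. This is a minimal sketch, assuming each range's lower bound is the minimum workable budget and its upper bound the comfortable figure; the numbers are copied from this datasheet and `fits` is a hypothetical helper.

```python
from __future__ import annotations

# VRAM ranges in GB as listed on this datasheet (documentation-led, not benchmarked).
VRAM_RANGES_GB: dict[str, tuple[float, float]] = {
    "F5-TTS": (8, 12),
    "XTTS v2 (Coqui)": (4, 6),
    "Bark": (8, 12),
    "Tortoise TTS": (4, 8),
}


def fits(budget_gb: float, ranges: dict[str, tuple[float, float]] = VRAM_RANGES_GB) -> list[str]:
    """Return tools whose listed minimum VRAM fits the budget.

    Tools whose whole range fits come first, then smaller footprints,
    with names as a stable tie-breaker.
    """
    candidates = [(name, lo, hi) for name, (lo, hi) in ranges.items() if lo <= budget_gb]
    candidates.sort(key=lambda c: (c[2] > budget_gb, c[1], c[0]))
    return [name for name, _, _ in candidates]


if __name__ == "__main__":
    for budget in (6, 8, 12):
        print(f"{budget} GB:", ", ".join(fits(budget)) or "none")
```

For example, `fits(6)` keeps XTTS v2 and Tortoise TTS and excludes F5-TTS and Bark, whose listed minimum is 8 GB.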