AI Tools Finder

Bark

Suno's expressive transformer-based TTS.

Open Source · 8–12 GB VRAM · Runs locally
Actually Free · No Signup · Open Source · Watermark-Free · Hobbyist-Friendly · API
Updated 2025-08-14

Hardware requirements

Runs locally · Entry GPU (6–8 GB)

8–12 GB VRAM
Min VRAM: 8 GB
Rec. VRAM: 12 GB
Min RAM: 16 GB
Rec. RAM: 16 GB
Disk: 8 GB
GPU class: Entry GPU
CUDA 11.7+ · Apple Silicon ✓ · CPU-Capable · Quant: FP16

CPU inference works but is slow (roughly 5–10× slower than realtime). 8 GB of VRAM is comfortable for single generations; 12 GB is recommended for batch generation.
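
As a rough illustration of the figures above, the following sketch maps available VRAM to a suggested run mode and estimates CPU generation time. The thresholds come from this page; the function names `suggest_mode` and `cpu_time_range` are hypothetical, and the estimate assumes that "5–10× realtime" means generation takes 5 to 10 times the clip's duration.

```python
def suggest_mode(vram_gb: float) -> str:
    """Map available VRAM (GB) to a run mode using this page's thresholds."""
    if vram_gb >= 12:
        return "gpu-batch"    # 12 GB: listed as the figure for batch work
    if vram_gb >= 8:
        return "gpu-single"   # 8 GB: listed as comfortable
    return "cpu"              # below the listed minimum: fall back to CPU


def cpu_time_range(audio_seconds: float) -> tuple[float, float]:
    """Estimated CPU generation time (seconds), assuming 5-10x slower than realtime."""
    return (5 * audio_seconds, 10 * audio_seconds)
```

Under this assumption, a 10-second clip would take an estimated 50 to 100 seconds to generate on CPU.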


What is Bark?

Bark is a fully generative text-to-audio model from Suno: it produces not just speech but also laughter, sighs, music, and background noise. It works quite differently from VITS or Tortoise: outputs are creative and unpredictable, which is both its charm and its limitation.

Pros & cons

Pros

  • Genuinely expressive — emotion, laughter, ambient sounds in one model
  • Supports 13 languages out of the box
  • 100+ speaker presets across those languages for more consistent voices
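
A minimal sketch of how a speaker preset and a non-speech cue might be combined, assuming the `bark` package from Suno's repository and SciPy are installed. `generate_audio`, `preload_models`, and `SAMPLE_RATE` are the entry points shown in the Bark README, and `v2/en_speaker_6` is one of its published presets; the helper names `with_cue` and `synthesize` and the output path are hypothetical.

```python
def with_cue(text: str, cue: str = "laughs") -> str:
    """Append a bracketed non-speech cue such as [laughs] to the prompt."""
    return f"{text.strip()} [{cue}]"


def synthesize(text: str, preset: str = "v2/en_speaker_6",
               out_path: str = "bark_out.wav") -> str:
    """Generate audio with a fixed speaker preset and write it to a WAV file."""
    # Imported lazily so with_cue() can be used without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model weights on first run
    audio = generate_audio(text, history_prompt=preset)
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path
```

Calling `synthesize(with_cue("Hello, my name is Suno."))` would write a short clip to `bark_out.wav`. Reusing the same preset keeps the voice more consistent, although takes can still differ between runs.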

Cons

  • Non-deterministic — same prompt can produce very different takes
  • Can hallucinate words or skip text on longer inputs
  • Slower than VITS-based TTS at the same quality tier
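
One common way to reduce skipped or hallucinated words on long inputs is to split the text into short, sentence-aligned chunks and generate each one separately. The sketch below shows only the splitting step in plain Python; the `chunk_text` name and the 25-word default are assumptions for illustration, not values from Bark's documentation.

```python
import re


def chunk_text(text: str, max_words: int = 25) -> list[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be synthesized separately with the same speaker preset and the resulting audio concatenated.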

What's actually free?

MIT-licensed; full weights publicly released.

✓ Actually Free · No Signup · Open Source · Watermark-Free

Alternatives

Tortoise TTS

Slow, but the quality is worth the wait.

Open Source · 4–8 GB VRAM
Min VRAM: 4 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

XTTS v2 (Coqui)

Multilingual voice cloning from a 6-second audio sample.

Open Source · 4–6 GB VRAM
Min VRAM: 4 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

F5-TTS

Zero-shot voice cloning TTS — 15 s of audio is enough.

Open Source · 8–12 GB VRAM
Min VRAM: 8 GB
GPU class: Mid GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free