AI Tools Finder

RVC (Retrieval-based Voice Conversion)

The voice-changer that took over Discord.

Open Source · 4–8 GB VRAM · Runs locally
Actually Free · No Signup · Open Source · Watermark-Free · Hobbyist-Friendly
Updated 2025-10-12

Hardware requirements

Runs locally · Entry GPU (6–8 GB)

4–8 GB VRAM
Min VRAM: 4 GB
Rec. VRAM: 8 GB
Min RAM: 8 GB
Rec. RAM: 16 GB
Disk: 10 GB
GPU class: Entry GPU
11.7+ · Apple Silicon ✓ · CPU-Capable · Quant: FP16

Inference runs on 4 GB of VRAM; training is comfortable at 8 GB. CPU-only works, but it is slow.
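As a rough illustration of those tiers, here is a minimal Python sketch that maps a machine's available VRAM to the workloads described above. The function name `rvc_capability` and the tier labels are hypothetical, not part of RVC. Treating anything under 4 GB as a CPU fallback is an assumption read off the listed minimum, not documented behavior.

```python
from typing import Optional

# Thresholds copied from the requirements listed above (in GB).
MIN_VRAM_GB = 4   # listed minimum: enough for inference
REC_VRAM_GB = 8   # listed recommendation: comfortable for training


def rvc_capability(vram_gb: Optional[float]) -> str:
    """Return a hypothetical tier label for a given amount of VRAM.

    `None` means no usable GPU. Falling back to CPU below the minimum
    is an assumption of this sketch, not documented RVC behavior.
    """
    if vram_gb is None or vram_gb < MIN_VRAM_GB:
        return "cpu-only (works, but slow)"
    if vram_gb < REC_VRAM_GB:
        return "gpu inference"
    return "gpu inference + comfortable training"


if __name__ == "__main__":
    for vram in (None, 4, 6, 8, 12):
        print(vram, "->", rvc_capability(vram))
```

On a machine with PyTorch installed, the VRAM figure could be read from `torch.cuda.get_device_properties(0).total_memory` (reported in bytes) and divided by `1024**3` before being passed in.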


What is RVC (Retrieval-based Voice Conversion)?

RVC takes an existing audio clip and replaces the speaker's voice with a trained target voice. This is a different problem from TTS: you need a source recording, but the result preserves the prosody and emotion of the original take. It is the standard tool for voice covers, dubbing, and content creation.

Pros & cons

Pros

  • Preserves singing, emotion, accent of the source take
  • Trains a new voice in minutes, given 10+ minutes of clean audio
  • Massive community model library on HuggingFace / Civitai
  • Gradio web UI ships in-box
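To check a training folder against the "10+ minutes" figure before starting, a small script along these lines may help. It is a sketch using only Python's standard-library `wave` module. The helper names (`wav_seconds`, `dataset_seconds`, `has_enough_audio`) are hypothetical, and it assumes the clips are uncompressed `.wav` files in a single folder. It measures duration only and says nothing about how clean the audio is.

```python
import wave
from pathlib import Path

# "10+ minutes" from the listing above, expressed in seconds.
MIN_TRAINING_SECONDS = 10 * 60


def wav_seconds(path) -> float:
    """Duration of one uncompressed WAV file, in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def dataset_seconds(folder) -> float:
    """Total duration of every *.wav file directly inside `folder`."""
    return sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))


def has_enough_audio(folder, minimum: float = MIN_TRAINING_SECONDS) -> bool:
    """True if the folder holds at least `minimum` seconds of audio."""
    return dataset_seconds(folder) >= minimum


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    total = dataset_seconds(folder)
    print(f"{total / 60:.1f} minutes of audio; enough: {has_enough_audio(folder)}")
```

Run it with the dataset folder as the only argument. Files in other formats (MP3, FLAC) are not counted and would need converting first.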

Cons

  • Requires a source recording — not a from-scratch TTS
  • Quality depends on clean training data; noisy inputs → robotic output

What's actually free?

MIT-licensed; web UI included.

✓ Actually Free · No Signup · Open Source · Watermark-Free

Alternatives

XTTS v2 (Coqui)

Multilingual voice cloning in 6 seconds.

Open Source · 4–6 GB VRAM
Min VRAM: 4 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

Bark

Suno's expressive transformer-based TTS.

Open Source · 8–12 GB VRAM
Min VRAM: 8 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free