faster-whisper
Whisper reimplemented on the CTranslate2 backend: up to 4× faster than the reference implementation at the same accuracy.
Open Source · 2–6 GB VRAM
- Min VRAM: 2 GB
- GPU class: Entry GPU
- Quant: INT8
Actually Free · No Signup · Open Source · Watermark-Free
The reference open-source speech-to-text model.
Runs locally · Entry GPU (6–8 GB)
`tiny` runs on CPU; `large-v3` needs ~10 GB VRAM.
Whisper is the open-weight transcription model that reset expectations for ASR. Five sizes (tiny → large-v3), 99 languages, robust to noise and accents, runs fully offline. The reference implementation is slow; in practice you'll want faster-whisper, whisper.cpp, or WhisperX. Listed here as the canonical entry.
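As a rough illustration of running Whisper through faster-whisper, the sketch below picks a model size from the available VRAM and then transcribes a file. The thresholds reuse the figures quoted in this listing (2 GB minimum with INT8, ~10 GB for `large-v3`); the `small` middle tier, the `WhisperConfig` and `pick_config` names, and the `float16` choice for large GPUs are assumptions made for the example, not recommendations from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperConfig:
    model_size: str    # e.g. "tiny", "small", "large-v3"
    device: str        # "cpu" or "cuda"
    compute_type: str  # CTranslate2 compute type, e.g. "int8" or "float16"


def pick_config(vram_gb: float) -> WhisperConfig:
    """Map available VRAM to a config (hypothetical helper, thresholds from the listing)."""
    if vram_gb >= 10:
        # "large-v3 needs ~10 GB VRAM"
        return WhisperConfig("large-v3", "cuda", "float16")
    if vram_gb >= 2:
        # Listed minimum: 2 GB with INT8. "small" is an assumed middle tier.
        return WhisperConfig("small", "cuda", "int8")
    # "tiny runs on CPU"
    return WhisperConfig("tiny", "cpu", "int8")


def transcribe(path: str, cfg: WhisperConfig) -> str:
    """Transcribe an audio file and return the joined segment text."""
    # Imported lazily so pick_config can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(cfg.model_size, device=cfg.device, compute_type=cfg.compute_type)
    # transcribe() returns a lazy generator of segments plus detected-language info.
    segments, info = model.transcribe(path, beam_size=5)
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `transcribe("audio.wav", pick_config(4))` (with `audio.wav` standing in for any local audio file) would load the `small` model in INT8 on the GPU. The model weights are downloaded on first use, and decoding only happens as the segment generator is consumed.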
MIT, weights freely downloadable from HuggingFace.
faster-whisper: Whisper, 4× faster, same accuracy. CTranslate2 backend.
WhisperX: Whisper + speaker diarisation + word-level timestamps.