AI Tools Finder

OpenAI Whisper

The reference open-source speech-to-text model.

Open Source · 2–10 GB VRAM · Runs locally
Actually Free · No Signup · Open Source · Watermark-Free · Hobbyist-Friendly · API
Updated 2025-11-15

Hardware requirements

Runs locally · Entry GPU (6–8 GB)

2–10 GB VRAM
Min VRAM: 2 GB
Rec. VRAM: 10 GB
Min RAM: 8 GB
Rec. RAM: 16 GB
Disk: 10 GB
GPU class: Entry GPU
11.3+ · Apple Silicon ✓ · CPU-Capable · Quant: FP16

`tiny` runs on CPU; `large-v3` needs ~10 GB VRAM.
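
One way to act on these numbers is to pick the largest model that fits the VRAM you have. The sketch below is a minimal illustration, not part of Whisper itself: `pick_model_size` and `transcribe` are hypothetical helper names, and the per-size VRAM figures are the approximate values from the upstream Whisper README, so treat them as rough guidance. It assumes the `openai-whisper` package and `ffmpeg` are installed.

```python
# Approximate VRAM needs in GB per model size (assumption: figures taken
# from the upstream Whisper README; real usage varies with audio and settings).
APPROX_VRAM_GB = [
    ("large-v3", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model_size(free_vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits."""
    for name, need_gb in APPROX_VRAM_GB:
        if free_vram_gb >= need_gb:
            return name
    # Nothing fits on the GPU: fall back to the smallest model, which runs on CPU.
    return "tiny"


def transcribe(path: str, free_vram_gb: float) -> str:
    """Transcribe an audio file with the reference implementation."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    device = "cuda" if free_vram_gb >= 1 else "cpu"
    model = whisper.load_model(pick_model_size(free_vram_gb), device=device)
    return model.transcribe(path)["text"]
```

For example, `pick_model_size(6)` selects `medium`, while `pick_model_size(0)` falls back to `tiny` on CPU.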


What is OpenAI Whisper?

Whisper is the open-weight transcription model that reset expectations for automatic speech recognition (ASR). It comes in five sizes (tiny → large-v3), covers 99 languages, is robust to noise and accents, and runs fully offline. The reference implementation is slow; in practice you'll want faster-whisper, whisper.cpp, or WhisperX. It is listed here as the canonical entry.

Pros & cons

Pros

  • Multilingual ASR that genuinely works across 99 languages
  • Five size points let you trade speed for accuracy
  • Active fork ecosystem (faster-whisper, WhisperX, whisper.cpp)

Cons

  • Reference implementation in PyTorch is 4–6× slower than the optimised forks
  • Hallucinates plausible text during long silences
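
The silence hallucination can often be reduced, though not eliminated, with a few decoding settings plus a post-filter. The sketch below is a heuristic under stated assumptions: `drop_probable_silence` and `transcribe_conservatively` are hypothetical helper names, the 0.6 threshold is only a starting point, and it relies on the reference implementation's `condition_on_previous_text` and `no_speech_threshold` options and the `no_speech_prob` field it reports per segment.

```python
def drop_probable_silence(segments, max_no_speech_prob=0.6):
    """Keep only segments the model did not flag as likely silence."""
    return [
        seg for seg in segments
        if seg.get("no_speech_prob", 0.0) <= max_no_speech_prob
    ]


def transcribe_conservatively(path, model_name="small"):
    """Transcribe with settings that tend to limit runaway text in silence."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(
        path,
        # Do not feed earlier output back in as a prompt, so one bad
        # segment is less likely to snowball into repeated text.
        condition_on_previous_text=False,
        no_speech_threshold=0.6,
    )
    kept = drop_probable_silence(result["segments"])
    return " ".join(seg["text"].strip() for seg in kept)
```

Running a voice-activity detector before transcription is another common approach; the filter above only discards segments after the fact.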

What's actually free?

Code and weights are MIT-licensed; the weights are freely downloadable from Hugging Face.

✓ Actually Free · No Signup · Open Source · Watermark-Free

Alternatives

faster-whisper

Whisper, 4× faster, same accuracy. CTranslate2 backend.

Open Source · 2–6 GB VRAM
Min VRAM: 2 GB
GPU class: Entry GPU
Quant: INT8
Actually Free · No Signup · Open Source · Watermark-Free
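
As a rough illustration of what switching to faster-whisper looks like, the sketch below loads a model with INT8 quantisation and prints timestamped segments. It assumes the `faster-whisper` package is installed; `transcribe_fast` and `format_timestamp` are hypothetical helper names, and the `small` model on CPU is an arbitrary default rather than a recommendation.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.cc, rounding to centiseconds."""
    total_cs = round(seconds * 100)
    minutes, cs = divmod(total_cs, 6000)
    return f"{minutes:02d}:{cs // 100:02d}.{cs % 100:02d}"


def transcribe_fast(path, model_size="small", device="cpu"):
    """Return (start, end, text) tuples using the CTranslate2 backend."""
    from faster_whisper import WhisperModel  # imported lazily

    model = WhisperModel(model_size, device=device, compute_type="int8")
    # vad_filter skips stretches without speech before decoding.
    segments, _info = model.transcribe(path, vad_filter=True)
    # `segments` is a generator; decoding happens as it is consumed.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]


def print_transcript(rows):
    """Print one line per segment with its time range."""
    for start, end, text in rows:
        print(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
```

On a GPU, passing `device="cuda"` with the same `compute_type` is the usual variation.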

WhisperX

Whisper + speaker diarisation + word-level timestamps.

Open Source · 4–8 GB VRAM
Min VRAM: 4 GB
GPU class: Entry GPU
Quant: INT8
Actually Free · Open Source · Watermark-Free · Hobbyist-Friendly