faster-whisper reimplements Whisper inference on top of CTranslate2 (C++/CUDA), delivering up to a 4× speedup over the reference PyTorch implementation at the same word error rate while using less memory. INT8 quantisation cuts VRAM further (the weights take roughly half the space of FP16) with no measurable accuracy loss. The default Whisper backend for anyone who's measured it.
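As a rough illustration of the INT8 path described above, here is a minimal sketch using the `WhisperModel` class from the faster-whisper Python package. The helper names `pick_compute_type` and `transcribe`, the `large-v3` model size, and the `audio.mp3` path in the usage note are illustrative assumptions, not part of the library.

```python
# Sketch only: assumes `pip install faster-whisper` and, for GPU use, a working CUDA/cuDNN setup.

def pick_compute_type(device: str) -> str:
    """Pick an INT8 compute type for the device (hypothetical helper)."""
    # "int8_float16" stores weights in INT8 and computes in FP16 on GPU;
    # plain "int8" is the usual choice on CPU.
    return "int8_float16" if device == "cuda" else "int8"


def transcribe(audio_path: str, model_size: str = "large-v3", device: str = "cuda"):
    """Return a list of (start, end, text) tuples for one audio file."""
    # Imported lazily so the helper above can be used without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=pick_compute_type(device))
    segments, info = model.transcribe(audio_path, beam_size=5)
    # `segments` is a generator: decoding happens as it is consumed.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `transcribe("audio.mp3")` would load the model and decode the file; passing `device="cpu"` switches to the CPU INT8 path.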
Pros & cons
Pros
✓Up to 4× faster than reference Whisper at equal accuracy
✓INT8 quantisation cuts VRAM further (weights take roughly half the space of FP16)
✓Drop-in CLI compatible with the reference, via the community whisper-ctranslate2 client (faster-whisper itself ships as a Python library)
Cons
–No diarisation built in — pair with WhisperX or pyannote
–Setup involves CUDA + cuDNN library paths that occasionally fight
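To make the diarisation pairing concrete, here is a simplified sketch of one way to attach speaker labels to transcript segments: give each segment the speaker whose turn overlaps it most in time. This is not how WhisperX or pyannote work internally. The tuple layouts and the `assign_speakers` name are assumptions for illustration, with the diariser assumed to emit (start, end, speaker) turns.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start, end, text) from the transcriber
Turn = Tuple[float, float, str]     # (start, end, speaker) from the diariser (assumed layout)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each segment with the speaker whose turn overlaps it most."""
    labelled = []
    for start, end, text in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(start, end, t_start, t_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        # Segments that overlap no turn keep a speaker of None.
        labelled.append((best_speaker, text))
    return labelled
```

Segment-level overlap is coarse: a segment that spans a speaker change gets a single label, which is one reason tools such as WhisperX align at the word level instead.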