DATASHEET // WHISPERX

WhisperX

Whisper + speaker diarisation + word-level timestamps.

OPEN SOURCE4–8 GB VRAMRuns locally

Actually FreeOpen SourceWatermark-FreeHobbyist-OK

Visit WhisperXUPDATED 2026-03-12 · DIRECT LINK

github.com/m-bain/whisperX

HARDWARE REQUIREMENTS //

Runs locally · Entry GPU (6–8 GB)

4–8 GB VRAM

Min VRAM

4 GB

Rec. VRAM

8 GB

Min RAM

16 GB

Rec. RAM

16 GB

Disk

12 GB

GPU class

Entry GPU

11.8+No Apple SiliconGPU RequiredQuant: INT8, FP16

Diarisation model adds ~2 GB VRAM on top of Whisper.

[ EDITORIAL PICK ]

Why we recommend WhisperX

DERIVED FROM METADATA — NOT SPONSORED

Open source
Source is public — you can audit it, fork it, and you'll never lose access to your workflows if WhisperX the company changes direction.
Runs on 4 GB
Fits on entry-level cards (GTX 1660, RTX 3050, RTX 4060). Rare for this category.
2 quant formats
Supports INT8, FP16 — you can dial VRAM use up or down to match your card.
Beginner-friendly
You don't need to read a paper before getting your first result — sensible defaults and a quick install.

[ EVIDENCE NOTE ]

Documentation-led datasheet

This page summarizes upstream documentation, release information, and editorially reviewed catalogue fields. It is not presented as a hands-on benchmark. Verify changing requirements at the official project; report stale data through our corrections channel.

Memory guide →

AT-A-GLANCE SIGNALS //

DERIVED FROM THIS PAGE'S DATA

Install difficulty
Standard
A standard local install — download, install dependencies, point at your GPU.
Hardware comfort
Entry-level
Fits on 4 GB cards — GTX 1660 / RTX 3050 territory.
Ecosystem
Open source
Source is public — auditable and forkable, no vendor lock.
Verification
Ageing
126 days since the last catalogue refresh — flagged for re-verification.

[ COMMUNITY GUIDES & WORKFLOWS ]

Tutorials & deep-dives for WhisperX

Hand-picked from YouTube, Reddit, GitHub, and the wider web. Each link goes straight to the source — we don't intercept or rewrite anything.

[ MORE IN THIS NICHE ]

Other local llm runners tools we rate

Three picks across different tradeoffs — so you don't end up with three near-clones of WhisperX.

LIGHTEST HARDWARE //

Transformers

The library every LLM ships against first.

OPEN SOURCE2–24 GB VRAM

BEST FREE OPTION //

llama.cpp

The C++ inference engine powering most local LLMs.

OPEN SOURCECPU-CAPABLE

TOP QUALITY //

Ollama

One-command local LLM runtime.

OPEN SOURCECPU-CAPABLE

What is WhisperX?

WhisperX takes faster-whisper and adds the things production transcription actually needs: forced-alignment for word-accurate timestamps (via wav2vec2), speaker diarisation (via pyannote), and VAD-based chunking. The de facto open-source pipeline for subtitling and podcast transcripts.

Pros & cons

✓ PROS

Word-level timestamps that are actually word-level
Speaker diarisation in the same pipeline
Built on faster-whisper — speed inherited

– CONS

Diarisation step needs accepting HF model terms
Heavier dependency stack (PyAnnote, wav2vec2)

What's actually free?

BSD-4. Note: pyannote diarisation requires HF model access (free).

✓ Actually FreeOpen SourceWatermark-Free

Alternatives

faster-whisper

Whisper, 4× faster, same accuracy. CTranslate2 backend.

OPEN SOURCE2–6 GB VRAM

VRAM fit2–6 GB

OpenAI Whisper

The reference open-source speech-to-text model.

OPEN SOURCE2–10 GB VRAM

VRAM fit2–10 GB