DATASHEET // STABLE-VIDEO-DIFFUSION

Stable Video Diffusion

Image-to-video diffusion: 14 frames (SVD) or 25 frames (SVD-XT).

OPEN SOURCE · 12–16 GB VRAM · Runs locally

Actually Free · Open Source · Watermark-Free · Hobbyist-OK · API

Visit Stable Video Diffusion · UPDATED 2025-08-15 · DIRECT LINK

huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt

HARDWARE REQUIREMENTS //

Runs locally · High-end GPU (16–24 GB)

12–16 GB VRAM

Min VRAM: 12 GB
Rec. VRAM: 16 GB
Min RAM: 16 GB
Rec. RAM: 32 GB
Disk: 20 GB
GPU class: High-end GPU

CUDA 11.8+ · No Apple Silicon · GPU Required · Quant: FP16, FP8

SVD-XT (25 frames) needs ~16 GB; 12 GB is viable with a smaller batch size.
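
The VRAM note above can be sketched as a small settings picker. This is only an illustration: the name `pick_svd_settings`, the `SvdSettings` fields, and the specific chunk sizes are assumptions made for the example, and "lower batch" is interpreted here as decoding fewer frames at a time. Only the 12 GB and 16 GB thresholds come from this datasheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvdSettings:
    """Hypothetical bundle of knobs for one SVD-XT run."""
    num_frames: int         # frames in the generated clip
    decode_chunk_size: int  # frames decoded per batch (assumed meaning of "batch")
    cpu_offload: bool       # move idle sub-models to system RAM


def pick_svd_settings(vram_gb: float) -> SvdSettings:
    """Map available VRAM to conservative settings (illustrative values)."""
    if vram_gb < 12:
        # 12 GB is the minimum listed on this page.
        raise ValueError("below the 12 GB minimum listed on this datasheet")
    if vram_gb >= 16:
        # ~16 GB is the figure quoted for the 25-frame SVD-XT variant.
        return SvdSettings(num_frames=25, decode_chunk_size=8, cpu_offload=False)
    # 12 to 16 GB: keep 25 frames but decode in smaller batches.
    return SvdSettings(num_frames=25, decode_chunk_size=2, cpu_offload=True)
```

Treat the returned values as a starting point and adjust them after watching actual memory use on your own card.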

[ EDITORIAL PICK ]

Why we recommend Stable Video Diffusion

DERIVED FROM METADATA — NOT SPONSORED

Open source
Source is public: you can audit it, fork it, and you'll never lose access to your workflows if Stability AI changes direction.
Runs on 12 GB
Fits on a 12 GB consumer card at reduced settings (16 GB recommended for SVD-XT), so there's no need to remortgage for an A100.
2 quant formats
Supports FP16, FP8 — you can dial VRAM use up or down to match your card.
Beginner-friendly
You don't need to read a paper before getting your first result — sensible defaults and a quick install.
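
As a rough illustration of the quant-format point in the list above, weight memory scales with bytes per parameter: FP16 stores 2 bytes, FP8 stores 1. The sketch below shows only that arithmetic. The function name `weight_gib` and the example parameter count are hypothetical placeholders, not the model's real size, and activations and video decoding add memory on top of the weights.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_gib(param_count: int, fmt: str) -> float:
    """Memory for model weights alone, in GiB (excludes activations)."""
    return param_count * BYTES_PER_PARAM[fmt] / 2**30


# Placeholder parameter count chosen for round numbers; NOT the real model size.
EXAMPLE_PARAMS = 2**30

for fmt in ("fp16", "fp8"):
    print(fmt, weight_gib(EXAMPLE_PARAMS, fmt), "GiB")
```

Halving the bytes per parameter halves the weight footprint, which is why FP8 can help a model fit on a smaller card.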

[ EVIDENCE NOTE ]

Documentation-led datasheet

This page summarizes upstream documentation, release information, and editorially reviewed catalogue fields. It is not presented as a hands-on benchmark. Verify changing requirements at the official project; report stale data through our corrections channel.


AT-A-GLANCE SIGNALS //

DERIVED FROM THIS PAGE'S DATA

Install difficulty
Standard
A standard local install — download, install dependencies, point at your GPU.
Hardware comfort
Mainstream
Needs 12 GB minimum — RTX 3060 12GB or 4070 territory.
Ecosystem
Strong devkit
Open-source AND ships an API — easy to integrate, possible to host yourself.
Verification
Stale
335 days since the last refresh — treat hardware numbers as a floor, not a ceiling.


What is Stable Video Diffusion?

Stability AI's open-weight image-to-video model. Feed it a still image and get back a short clip with plausible camera motion and scene dynamics. Two variants: SVD (14 frames) and SVD-XT (25 frames). Image conditioning only: there is no text-prompt control over motion.
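
The image-in, clip-out flow described above can be sketched with the Hugging Face `diffusers` library. This is a minimal sketch rather than a tested recipe: it assumes `torch`, `diffusers`, and `accelerate` are installed and a CUDA GPU is available. The function names, the output path, the seed, and `decode_chunk_size=8` are illustrative choices, and the 1024x576 input size follows the upstream model card.

```python
MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt"  # from the link on this page


def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Length of the exported clip at a given playback rate."""
    return num_frames / fps


def generate_clip(image_path: str, out_path: str = "clip.mp4",
                  num_frames: int = 25, fps: int = 7, seed: int = 42) -> str:
    """Animate one still image and write an MP4; returns out_path."""
    # Heavy imports are deferred so this file loads without a GPU stack.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower peak VRAM

    # The image is the only conditioning input; the call takes no text prompt.
    image = load_image(image_path).resize((1024, 576))
    generator = torch.manual_seed(seed)  # fixed seed for repeatable motion
    frames = pipe(image, num_frames=num_frames, decode_chunk_size=8,
                  generator=generator).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate_clip("still.png")` would write `clip.mp4`; at 7 fps a 25-frame clip lasts `clip_duration_seconds(25, 7)` seconds, a little over three and a half.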

Pros & cons

✓ PROS

Surprisingly coherent short clips from a single still
Lighter than Wan / Hunyuan — fits on 16 GB GPUs
One of the first widely available open-weight video diffusion models

– CONS

No text-prompt control over motion direction
Outclassed on quality by newer Wan 2.2 / Hunyuan Video

What's actually free?

Stability AI Community License: free for non-commercial use and for commercial use below the license's revenue threshold.

✓ Actually Free · Open Source · Watermark-Free

Alternatives

Wan 2.2

Open-weight video diffusion from Alibaba.

OPEN SOURCE · 12–48 GB VRAM

VRAM fit: 12–48 GB

HunyuanVideo

13B open-weight cinematic text-to-video.

OPEN SOURCE · 24–48 GB VRAM

VRAM fit: 24–48 GB

LTX-Video

Real-time-ish open video diffusion from Lightricks.

OPEN SOURCE · 12–16 GB VRAM

VRAM fit: 12–16 GB

CogVideoX 5B

Open-source text-to-video diffusion from THUDM.

OPEN SOURCE · 12–24 GB VRAM

VRAM fit: 12–24 GB