vLLM
High-throughput LLM serving for GPUs.
- Min VRAM: 24 GB
- GPU class: Datacenter GPU
- Quant: FP16
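Twenty-four gigabytes is the current "new floor" for state-of-the-art local AI. HunyuanVideo and Wan 2.2 14B FP8 fit. Flux LoRA training is realistic. LLMs of roughly 30B parameters at Q4 run fully on-GPU; a 70B model at Q4 needs about 35 GB for weights alone, so on a single 24 GB card it requires partial CPU offload or a more aggressive quant. This is the workstation tier for serious local-AI work in 2026.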
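As a rough way to sanity-check whether a model fits this tier, weight memory can be approximated as parameter count times bits per weight, divided by eight. The sketch below is a back-of-the-envelope estimate, not a measurement: the function names are hypothetical, and the `overhead_gb` default is an assumed placeholder for activations, KV cache, and runtime buffers, which vary widely by tool and workload.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an ASSUMED fixed overhead fit in vram_gb."""
    # overhead_gb is an illustrative guess, not a figure from any tool's docs
    return weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # A 14B model stored at 8 bits per weight (e.g. FP8)
    print(f"14B @ 8-bit: {weight_gb(14, 8):.1f} GB, fits in 24 GB: {fits(14, 8)}")
```

Real quantized files usually run somewhat larger than this estimate because formats such as Q4 store scales and keep some tensors at higher precision, so treat the result as a lower bound.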
Sorted by best fit for this tier — tools designed around your VRAM budget first, then by our power-user score.
High-throughput LLM serving for GPUs.
World-foundation models for physical AI.
13B open-weight cinematic text-to-video.
Autoregressive video diffusion at 24 GB.
Genmo's 10B open-weight T2V, the first 'genuinely fluid' OSS video model.
Serverless Python for GPU workloads.
Modern training framework — Flux, SDXL, SD3 LoRAs in YAML.
Microsoft Research's structured 3D representation model.
Memory-efficient T2V via pyramidal flow matching.
Open-weight video diffusion from Alibaba.
The standard SDXL/Flux LoRA training UI.
The INRIA original — train your own splats.
Tencent's open 3D generator — multi-view, PBR, ready-to-use meshes.
Real-time-ish open video diffusion from Lightricks.
Modern alternative trainer for SD/SDXL/Flux.
Stability's MMDiT flagship at 8B params.
Diffusion-based photorealistic upscaler.
Open-source text-to-video diffusion from THUDM.
Dead-simple Flux LoRA training in a Gradio UI.
Image-to-video diffusion — 25 frames, 14 or 25 steps.
Animation motion modules for ComfyUI.
The library every LLM ships against first.
The reference open-source speech-to-text model.
Whisper, 4× faster, same accuracy. CTranslate2 backend.
The most active open face-swap toolkit.
The default OSS upscaler, still.