ElevenLabs
The benchmark commercial TTS / voice clone API.
Local AI lives and dies by VRAM. Pick your tier — we'll show only the workflows, models, and runners that actually fit in that budget. Every entry tells you the required quantization, expected speed, and what stays out of reach.
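As a rough illustration of why quantization decides what fits, the sketch below estimates the memory taken by model weights alone at a given bit width. The function names and the 1.5 GiB default overhead (standing in for activations, KV cache, and runtime buffers) are assumptions for illustration, not measured values; real usage varies by runner and context length.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 1.5) -> bool:
    """True if weights plus an assumed fixed overhead fit in vram_gib.

    overhead_gib is a placeholder guess, not a measurement.
    """
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # A 7B model at 4 bits per weight on a 6 GB card.
    print(round(weight_gib(7, 4), 2), fits(7, 4, 6))
    # A 13B model at 16 bits per weight on a 12 GB card.
    print(round(weight_gib(13, 16), 2), fits(13, 16, 12))
```

Treat the result as a lower bound: long contexts, batch size, and framework overhead all add to the figure.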
A 6 GB card is the practical floor for serious local AI in 2026. With aggressive quantization (NF4, GGUF Q4), you can run SDXL, Flux dev, and small LLMs. Long video diffusion stays out of reach — but you can absolutely learn the entire stack on this hardware.
e.g. GTX 1660 Ti, RTX 2060, RTX 3050 / 3060 6GB
Eight gigabytes is the most common hobbyist tier. You get comfortable SDXL, Flux at FP8/NF4, and Q4 13B LLMs (a tight fit that works best with short context). ComfyUI is fully usable. Video diffusion is still a stretch, but everything in the image and LLM ecosystem opens up here.
e.g. RTX 3060 Ti, RTX 3070, RTX 4060 / 4060 Ti
Twelve gigabytes is the modern hobbyist sweet spot. Flux runs comfortably at FP8, 13B LLMs fit at Q5/Q6, and the smallest video diffusion models become viable. The RTX 3060 12 GB remains one of the best price-per-VRAM cards you can buy.
e.g. RTX 3060 12GB, RTX 4070, RTX 4070 SUPER
Sixteen gigabytes is where things stop feeling tight. Apple Silicon Macs with 16 GB unified memory hit similar territory. Local video diffusion becomes practical (LTX-Video, CogVideoX 5B), and EXL2 LLM quants open up new throughput options.
e.g. RTX 4070 Ti SUPER, RTX 4080 (16 GB), RTX 5070 Ti
Twenty-four gigabytes is the current "new floor" for state-of-the-art local AI. HunyuanVideo and Wan 2.2 14B FP8 fit. Flux LoRA training is realistic. 30B-class LLMs at Q4 run fully on-GPU; 70B models at Q4 need roughly 35 to 40 GB for weights alone, so they still require partial CPU offload or a lower-bit quant. This is the workstation tier for serious local-AI work in 2026.
e.g. RTX 3090 / 3090 Ti, RTX 4090
Forty-eight gigabytes is workstation territory. Now you can run the biggest open-weight video models at full precision, serve quantized 70B LLMs through vLLM, and start fine-tuning instead of just training LoRAs. The RTX A6000, RTX 6000 Ada, and L40S are the typical homes for this tier.
e.g. NVIDIA RTX A6000, NVIDIA RTX 6000 Ada, NVIDIA L40 / L40S
Eighty gigabytes and up is datacenter territory, typically rented by the hour rather than owned. This is where you serve 70B+ LLMs at production scale, fully fine-tune large models, and run multi-GPU video diffusion pipelines.
e.g. NVIDIA A100 80GB, NVIDIA H100 80GB, NVIDIA H200 141GB
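To make the "pick your tier" step concrete, here is a minimal sketch that maps a card's VRAM to the largest tier listed above that it can satisfy. The names `TIERS_GB` and `pick_tier` are hypothetical helpers, and the rule of rounding down to the nearest tier is an assumption about how the tiers are meant to be read.

```python
from typing import Optional

# VRAM tiers from the list above, in gigabytes.
TIERS_GB = [6, 8, 12, 16, 24, 48, 80]


def pick_tier(vram_gb: float) -> Optional[int]:
    """Return the largest tier that does not exceed vram_gb.

    Returns None when the card is below the 6 GB floor.
    """
    eligible = [tier for tier in TIERS_GB if tier <= vram_gb]
    return max(eligible) if eligible else None


if __name__ == "__main__":
    for vram in (4, 10, 16, 141):
        print(vram, "->", pick_tier(vram))
```

Rounding down is the conservative choice: a card that sits between two tiers is only guaranteed to fit the workflows sized for the lower one.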
When you don't have local hardware (or you need quality above what fits in your card), these cloud APIs cover the same workloads.
The benchmark commercial TTS / voice clone API.
Agentic coding in VS Code — reads, writes, runs, browses.
The benchmark for aesthetic image generation.
Run any open-source model with one API call.
Real-time inference platform — sub-second latency for diffusion.
Phone-scan to NeRF, Genie text-to-3D, and Dream Machine video.
The image model that actually renders text.
Cinematic-quality cloud video generation.
GPU marketplace — rent consumer cards at half the hyperscaler price.
Text-to-3D, image-to-3D, and texture generation for game pipelines.
Serverless GPU functions — deploy a Python file, get an HTTPS endpoint.
Cloud video gen with strong motion control.
Cloud video generation with Pikaffects and Scenes.