CUDA recommended; Metal · Apple Silicon ✓ · CPU-Capable · Quant: GGUF, EXL2, AWQ +2
EXL2 on 24 GB cards is the sweet spot for 70B models quantized to roughly 3 bits per weight.
What is Text Generation WebUI?
Oobabooga's Gradio-based web UI for running LLMs locally. Supports llama.cpp, ExLlamaV2, Transformers, and other backends. The go-to power-user chat front-end for hobbyists running quantized 70B models on consumer GPUs.
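To get a feel for how tightly a quantized 70B model fits on a 24 GB consumer GPU, here is a rough back-of-the-envelope sketch. It counts weights only, and the 2 GiB allowance for KV cache and runtime overhead is an assumption chosen for illustration, not a measured figure. The helper names `weight_gib` and `max_bpw` are hypothetical and not part of Text Generation WebUI or any backend.

```python
GIB = 2**30  # bytes per GiB


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def max_bpw(params_billion: float, vram_gib: float, overhead_gib: float) -> float:
    """Highest average bits per weight that fits after reserving overhead.

    overhead_gib is an assumed allowance for KV cache, activations, and
    the runtime; real usage depends on context length and backend.
    """
    usable_bytes = (vram_gib - overhead_gib) * GIB
    return usable_bytes * 8 / (params_billion * 1e9)


if __name__ == "__main__":
    for bpw in (2.5, 3.0, 4.0):
        print(f"70B at {bpw} bpw: {weight_gib(70, bpw):.1f} GiB of weights")
    print(f"Max bpw in 24 GiB with 2 GiB reserved: {max_bpw(70, 24, 2):.2f}")
```

Under these assumptions, the weights of a 70B model at a full 3.0 bits per weight come to a little over 24 GiB on their own, so quants somewhat below 3 bits per weight are the ones likely to leave headroom on a single 24 GB card.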