Mixed CPU+GPU offload is llama.cpp's superpower for running 70B+ models on a single card.
What is llama.cpp?
The reference CPU/GPU inference engine for GGUF-quantized LLMs. Ollama, LM Studio, Jan, and KoboldCpp all sit on top of it. Use it directly when you need raw control, multi-GPU splitting, or exotic quants.
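To make partial offload concrete, here is a rough sketch of the arithmetic behind choosing how many layers to put on the GPU. It assumes every layer is the same size and that a fixed slice of VRAM is held back for the KV cache and scratch buffers. Both are simplifications, and the helper names, the 40 GiB file size, and the 4 GiB reserve are illustrative assumptions, not measurements. The resulting number is what you would pass to llama.cpp's `-ngl` (`--n-gpu-layers`) option; the remaining layers stay on the CPU.

```python
import math


def estimate_gpu_layers(model_gib: float, n_layers: int,
                        vram_gib: float, reserve_gib: float = 4.0) -> int:
    """Rough guess at how many layers fit in VRAM.

    Assumption: layers are equally sized, and reserve_gib is set aside
    for the KV cache and buffers. Real usage varies with context length
    and quant type, so treat the result as a starting point to tune.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, math.floor(usable_gib / per_layer_gib))


def build_command(model_path: str, n_gpu_layers: int) -> list[str]:
    """Assemble a llama-cli invocation (not executed here).

    model_path is a hypothetical file name supplied by the caller.
    """
    return ["llama-cli", "-m", model_path, "-ngl", str(n_gpu_layers)]


if __name__ == "__main__":
    # Illustrative numbers: an 80-layer 70B model quantized to about
    # 40 GiB, on a single 24 GiB card.
    layers = estimate_gpu_layers(model_gib=40.0, n_layers=80, vram_gib=24.0)
    print(layers)
    print(" ".join(build_command("model-70b-q4.gguf", layers)))
```

With these example figures the estimate comes out to 40 of 80 layers on the GPU. In practice you would start near that value and raise or lower `-ngl` until the model loads without running out of memory.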