llama.cpp
The C++ inference engine powering most local LLMs.
Open SourceCPU-capable
Actually FreeNo SignupOpen SourceWatermark-Free
One-command local LLM runtime.
Runs locally · Mid GPU (12 GB)
7B Q4 runs on 8 GB. 13B Q4 on 12 GB. 70B Q4 needs 48 GB unified or split GPU.
Ollama wraps llama.cpp behind a clean CLI and HTTP API. Pull a model (`ollama run llama3.1`), get a chat or an OpenAI-compatible endpoint. Excellent default for hobbyists running quantized models.
Fully free / OSS. You provide the hardware.
The C++ inference engine powering most local LLMs.
Desktop GUI for running local LLMs.
The "A1111 for LLMs" — multi-loader local chat UI.
High-throughput LLM serving for GPUs.