llama.cpp
The C++ inference engine powering most local LLMs.
Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free
The library every LLM ships against first.
Runs locally · Entry GPU (6–8 GB)
Scales with the model you load. Tiny LMs run on CPU; 70B-class models need 24 GB+ even when quantized.
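The scaling above is mostly arithmetic on weight storage, and a rough sketch makes it concrete. The function name `weight_memory_gb` and the bit widths below are illustrative assumptions, not part of any library. The estimate covers weights only and ignores the KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes).

    Hypothetical helper for illustration: parameters * bits / 8 bytes.
    Ignores KV cache, activations, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumed example sizes and precisions, chosen only to show the trend.
    for name, n_params in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(n_params, bits)
            print(f"{name} at {bits}-bit: ~{gb:.1f} GB of weights")
```

Under this estimate a 70B model at 4 bits per weight needs about 35 GB for weights alone, so the 24 GB+ figure reads as a floor: fitting into 24 GB would likely mean a lower bit width or keeping part of the model in system RAM.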
Hugging Face Transformers is the universal Python library for loading and running LLMs, as well as vision, audio, and multimodal models. It is slower than vLLM or llama.cpp for serving, but it's where new models usually land first, and its API is the closest thing the field has to a standard.
Apache 2.0 from Hugging Face.
The C++ inference engine powering most local LLMs.
High-throughput LLM serving for GPUs.
One-command local LLM runtime.