Local LLM Runners

Best AI tools for running LLMs locally

Run large language models on your own hardware.

21 TOOLS INDEXED · HARDWARE-VERIFIED

[ BEFORE YOU CHOOSE ]

The decision that matters

Choose a runner for its model-format support, the context length you need, and the amount of CPU/GPU offload you can tolerate, not only for the largest model the launcher can download.

Check these constraints

Exact quantization and model size
Target context and concurrent sessions
GPU backend and system-RAM headroom
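
The three constraints above can be turned into a rough fit check before downloading anything. The sketch below is a back-of-the-envelope estimate, not a measurement: it counts only quantized weights plus the key/value cache, ignores activation buffers and runtime overhead, and every model dimension in the example (parameter count, bits per weight, layer and head counts) is a hypothetical value chosen for illustration. The function names are our own and do not belong to any tool listed here.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, sessions: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size: one K and one V tensor per layer,
    per token, per concurrent session (2 bytes per element = fp16)."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_tokens * sessions * bytes_per_elem)


def fits(total_bytes: float, memory_bytes: float, headroom: float = 0.1) -> bool:
    """True if the estimate fits while leaving a fraction of memory free."""
    return total_bytes <= memory_bytes * (1 - headroom)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical 7B-parameter model at ~4.5 bits per weight.
    weights = weight_bytes(7e9, 4.5)
    # Hypothetical architecture: 32 layers, 8 KV heads, head size 128,
    # 8192-token context, one session.
    cache = kv_cache_bytes(32, 8, 128, 8192)
    total = weights + cache
    print(f"weights ~ {weights / GIB:.2f} GiB, KV cache ~ {cache / GIB:.2f} GiB")
    print("fits in 8 GiB with 10% headroom:", fits(total, 8 * GIB))
```

Doubling the context or the number of concurrent sessions doubles the cache term, which is why the target context and session count are worth fixing before comparing VRAM ranges in the catalogue.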

Compare the catalogue

Ollama

One-command local LLM runtime.

OPEN SOURCE · CPU-CAPABLE

Cline

Agentic coding in VS Code — reads, writes, runs, browses.

OPEN SOURCE · CLOUD · NO GPU

vLLM

High-throughput LLM serving for GPUs.

OPEN SOURCE · 24–80 GB VRAM

Open WebUI

Self-hosted ChatGPT-style frontend for Ollama / OpenAI.

OPEN SOURCE · VIA OLLAMA

LM Studio

Desktop GUI for running local LLMs.

FREEMIUM · CPU-CAPABLE

F5-TTS

Zero-shot voice cloning TTS — 15 s of audio is enough.

OPEN SOURCE · 8–12 GB VRAM

llama.cpp

The C++ inference engine powering most local LLMs.

OPEN SOURCE · CPU-CAPABLE

Continue

Open-source Copilot — VS Code & JetBrains, any model.

OPEN SOURCE · CPU-CAPABLE

LobeChat

Beautifully designed chat UI with plugins and image generation.

OPEN SOURCE · CPU-CAPABLE

Jan

Open-source ChatGPT desktop — runs models locally or via API.

OPEN SOURCE · CPU-CAPABLE

faster-whisper

Whisper, 4× faster, same accuracy. CTranslate2 backend.

OPEN SOURCE · 2–6 GB VRAM

Msty

Polished local-LLM client with split chats and knowledge stacks.

FREEMIUM · $4.16/MO · CPU-CAPABLE

Transformers

The library every LLM ships against first.

OPEN SOURCE · 2–24 GB VRAM

Aider

Terminal-native AI pair programmer with git awareness.

OPEN SOURCE · CPU-CAPABLE

AnythingLLM

RAG-first local LLM app with workspaces and agents.

OPEN SOURCE · CPU-CAPABLE

OpenAI Whisper

The reference open-source speech-to-text model.

OPEN SOURCE · 2–10 GB VRAM

WhisperX

Whisper + speaker diarisation + word-level timestamps.

OPEN SOURCE · 4–8 GB VRAM

Text Generation WebUI

The "A1111 for LLMs" — multi-loader local chat UI.

OPEN SOURCE · 6–24 GB VRAM

Tabby

Self-hosted, GPU-accelerated coding autocompletion.

FREEMIUM · $19/MO · 4–8 GB VRAM

Open Interpreter

Natural-language code execution on your machine.

OPEN SOURCE · CPU-CAPABLE

Stable Audio Open

Open-weight text-to-audio — 47-second sound effects and music.

OPEN SOURCE · 6–8 GB VRAM