AI Tools Finder

Best AI tools for running LLMs locally

Run large language models on your own hardware.

21 tools

Ollama

One-command local LLM runtime.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

Cline

Agentic coding in VS Code: reads, writes, runs, browses.

Open Source · ☁ Cloud · no GPU
Actually Free · No Signup · Open Source · Watermark-Free

vLLM

High-throughput LLM serving for GPUs.

Open Source · 24–80 GB VRAM
Min VRAM: 24 GB
GPU class: Datacenter GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

Open WebUI

Self-hosted ChatGPT-style frontend for Ollama / OpenAI.

Open Source · via Ollama
Actually Free · No Signup · Open Source · Watermark-Free

LM Studio

Desktop GUI for running local LLMs.

Freemium · CPU-capable
Actually Free · No Signup · Watermark-Free · Hobbyist-Friendly

F5-TTS

Zero-shot voice cloning TTS: 15 s of audio is enough.

Open Source · 8–12 GB VRAM
Min VRAM: 8 GB
GPU class: Mid GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

llama.cpp

The C++ inference engine powering most local LLMs.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

Continue

Open-source Copilot: VS Code & JetBrains, any model.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

LobeChat

Beautifully designed chat UI with plugins and image generation.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

Jan

Open-source ChatGPT desktop: runs models locally or via API.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

faster-whisper

Whisper, up to 4× faster at the same accuracy. CTranslate2 backend.

Open Source · 2–6 GB VRAM
Min VRAM: 2 GB
GPU class: Entry GPU
Quant: INT8
Actually Free · No Signup · Open Source · Watermark-Free

Msty

Polished local-LLM client with split chats and knowledge stacks.

Freemium · from $4.16/mo · CPU-capable
Actually Free · No Signup · Watermark-Free · Hobbyist-Friendly

Transformers

The library every LLM ships against first.

Open Source · 2–24 GB VRAM
Min VRAM: 2 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

Aider

Terminal-native AI pair programmer with git awareness.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

AnythingLLM

RAG-first local LLM workspace with workspaces and agents.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

OpenAI Whisper

The reference open-source speech-to-text model.

Open Source · 2–10 GB VRAM
Min VRAM: 2 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · No Signup · Open Source · Watermark-Free

WhisperX

Whisper + speaker diarisation + word-level timestamps.

Open Source · 4–8 GB VRAM
Min VRAM: 4 GB
GPU class: Entry GPU
Quant: INT8
Actually Free · Open Source · Watermark-Free · Hobbyist-Friendly

Text Generation WebUI

The "A1111 for LLMs": multi-loader local chat UI.

Open Source · 6–24 GB VRAM
Min VRAM: 6 GB
GPU class: High-end GPU
Quant: GGUF
Actually Free · No Signup · Open Source · Watermark-Free

Tabby

Self-hosted, GPU-accelerated coding autocompletion.

Freemium · from $19/mo · 4–8 GB VRAM
Min VRAM: 4 GB
GPU class: Entry GPU
Quant: GGUF
Actually Free · No Signup · Open Source · Watermark-Free

Open Interpreter

Natural-language code execution on your machine.

Open Source · CPU-capable
Actually Free · No Signup · Open Source · Watermark-Free

Stable Audio Open

Open-weight text-to-audio: 47-second sound effects and music.

Open Source · 6–8 GB VRAM
Min VRAM: 6 GB
GPU class: Entry GPU
Quant: FP16
Actually Free · Open Source · Watermark-Free · Hobbyist-Friendly