Best hardware for local LLMs
Hand-curated rankings sliced by vendor, memory size, and target model size — pulled from real community reports rather than spec sheets.
- Best GPUs for running local LLMs
Best GPUs for running local LLMs, ranked by real community reports of tokens-per-second across llama.cpp, vLLM, MLX, and exllama.
- Best Mac for running local LLMs
Best Apple Silicon Mac for local LLMs. Community reports across M1–M5 chips, covering tokens-per-second on MLX and llama.cpp.
- Best NVIDIA GPUs for local LLMs
Best NVIDIA GPUs for running local LLMs, ranked by community reports across CUDA-backed engines (llama.cpp, vLLM, exllamav2).
- Best GPUs for 70B local LLMs
Which GPUs actually run 70B models like Llama 3.1 70B and Qwen 2.5 72B locally, with community reports of VRAM headroom, quants, and tokens-per-second; see the rough VRAM-fit sketch after this list.
- Best GPUs for 30B local LLMs
Best GPUs for running 30B-class local LLMs (Qwen 2.5 32B, Mistral Small) — community-reported tokens-per-second and VRAM fit.
- Best Mac for 70B local LLMs
Which Apple Silicon Macs actually run 70B-class models locally. Community reports of tokens-per-second on MLX and llama.cpp's Metal backend.
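
For the VRAM-fit numbers behind the 70B and 30B rankings, a useful rule of thumb is that weight memory is roughly parameter count times effective bits-per-weight divided by eight, plus headroom for the KV cache and runtime buffers. Below is a minimal back-of-envelope sketch in Python; the ~4.8 bits-per-weight figure for llama.cpp's Q4_K_M quant and the flat 2 GB overhead are assumptions, and real overhead grows with context length.

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Back-of-envelope VRAM estimate for a quantized model.

    params_b        -- parameter count in billions (70 for Llama 3.1 70B)
    bits_per_weight -- effective bits per weight of the quant
                       (~4.8 for llama.cpp Q4_K_M, 16 for fp16)
    overhead_gb     -- assumed flat allowance for KV cache and buffers;
                       in practice this grows with context length
    """
    weights_gb = params_b * bits_per_weight / 8  # billions of params x bits -> GB
    return weights_gb + overhead_gb

# 70B at Q4_K_M: ~44 GB, so it wants a 48 GB card or two 24 GB cards.
print(f"70B @ Q4_K_M: {estimate_vram_gb(70, 4.8):.0f} GB")
# 32B at Q4_K_M: ~21 GB, which is why 30B-class models fit a single 24 GB GPU.
print(f"32B @ Q4_K_M: {estimate_vram_gb(32, 4.8):.0f} GB")
```

Engines and context lengths vary, so treat this as a first-pass filter; the community reports these rankings draw on remain the ground truth.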