Best GPUs for 30B local LLMs
30B-class models in Q4 fit comfortably in 24GB of VRAM with room for a useful context window, the sweet spot for a single consumer GPU. Apple Silicon Macs with 32GB+ of unified memory also handle them well.
Ranked from 61 community reports on llamaperf.
| # | GPU | Vendor | VRAM | Reports | Fastest t/s |
|---|---|---|---|---|---|
| 1 | RX 7900 XTX | AMD | 24GB | 6 | 58.0 |
| 2 | RTX 5090 | NVIDIA | 32GB | 4 | 106.5 |
| 3 | RTX 4060 Ti 16GB | NVIDIA | 16GB | 4 | 45.0 |
| 4 | RTX A6000 48GB | NVIDIA | 48GB | 4 | 16.9 |
| 5 | Threadripper 256GB | AMD | 256GB | 4 | 8.8 |
| 6 | RTX 4090 | NVIDIA | 24GB | 3 | 149.6 |
| 7 | RTX 3090 | NVIDIA | 24GB | 3 | 66.0 |
| 8 | Instinct MI300X 192GB | AMD | 192GB | 2 | 60.0 |
| 9 | RX 7900 XT | AMD | 20GB | 2 | 38.0 |
| 10 | Instinct MI250X 128GB | AMD | 128GB | 2 | 35.0 |
| 11 | RX 7800 XT 16GB | AMD | 16GB | 2 | 27.0 |
| 12 | M5 Max 128GB | Apple | 128GB | 2 | 7.5 |
| 13 | H100 80GB | NVIDIA | 80GB | 1 | 45.0 |
| 14 | RTX 5060 Ti 16GB | NVIDIA | 16GB | 1 | 45.0 |
| 15 | M5 Max 64GB | Apple | 64GB | 1 | 32.0 |
| 16 | M4 Max 64GB | Apple | 64GB | 1 | 23.0 |
| 17 | M4 16GB | Apple | 16GB | 1 | 23.0 |
| 18 | M4 Max 36GB | Apple | 36GB | 1 | 21.0 |
| 19 | M3 16GB | Apple | 16GB | 1 | 21.0 |
| 20 | M1 Pro 16GB | Apple | 16GB | 1 | 20.0 |
| 21 | M4 Pro 24GB | Apple | 24GB | 1 | 19.0 |
| 22 | M3 Max 48GB | Apple | 48GB | 1 | 18.0 |
| 23 | M2 16GB | Apple | 16GB | 1 | 18.0 |
| 24 | M3 Max 36GB | Apple | 36GB | 1 | 16.0 |
| 25 | M2 Max 32GB | Apple | 32GB | 1 | 16.0 |
| 26 | M2 Ultra 64GB | Apple | 64GB | 1 | 14.0 |
| 27 | M3 Pro 18GB | Apple | 18GB | 1 | 14.0 |
| 28 | M1 16GB | Apple | 16GB | 1 | 14.0 |
| 29 | M1 Ultra 64GB | Apple | 64GB | 1 | 12.0 |
| 30 | M2 Pro 16GB | Apple | 16GB | 1 | 12.0 |
| 31 | M1 Max 32GB | Apple | 32GB | 1 | 10.0 |
| 32 | MI50 32GB | AMD | 32GB | 1 | 9.7 |
| 33 | M3 Max 128GB | Apple | 128GB | 1 | 5.5 |
| 34 | M2 Ultra 192GB | Apple | 192GB | 1 | — |
| 35 | DGX Spark | NVIDIA | 128GB | 1 | — |
What to look for
24GB cards are the sweet spot
RTX 3090s and 4090s (both 24GB) hold a 30B-class model in Q4 with plenty of headroom for an 8–16K context. This is arguably the best price/capability point in local LLM inference today — you get most of the quality of a 70B model at a fraction of the hardware cost.
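The arithmetic behind that headroom is easy to check. Here is a minimal sketch, assuming a ~32B-parameter model and approximate GGUF bits-per-weight averages (the exact figure varies with the quant mix, so treat these as ballpark numbers):

```python
# Back-of-the-envelope VRAM estimate for quantized model weights.
# Bits-per-weight values are approximate GGUF averages (assumed here),
# not exact: real files vary with the quant mix and architecture.

GiB = 1024**3

def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate in-VRAM size of the quantized weights alone."""
    return n_params * bits_per_weight / 8 / GiB

# A 30B-class model (~32e9 params) at common quant levels:
for name, bpw in [("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q5_K_M", 5.7)]:
    print(f"{name}: ~{weight_footprint_gib(32e9, bpw):.1f} GiB of weights")

# Q4_K_M comes out around 18 GiB, leaving roughly 6 GiB on a 24GB
# card for KV cache, activations, and runtime overhead.
```

That leftover ~6 GiB is what pays for the 8–16K context mentioned above.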
16GB cards work with tighter quants
An RTX 4060 Ti 16GB or RTX 4070 Ti Super 16GB can run 30B models at Q3/Q4 with shorter contexts, though you'll feel the squeeze with longer prompts. Q3 quants noticeably hurt quality on most models, so treat Q4 as the practical floor and drop to Q3 only when nothing else fits.
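To see where the squeeze comes from, here is a rough KV-cache estimate. The architecture numbers are assumptions for a typical 30B-class model with grouped-query attention (64 layers, 8 KV heads, head dim 128), not any specific checkpoint; check your model's config for the real values:

```python
# Rough KV-cache sizing: memory grows linearly with context length.
# Layer/head counts below are assumed values for a typical 30B-class
# GQA model, not any specific checkpoint.

def kv_cache_gib(ctx_len: int, n_layers: int = 64, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """FP16 K and V tensors across all layers at a given context length."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return ctx_len * per_token / 1024**3

for ctx in (4096, 8192, 16384, 32768):
    print(f"{ctx:>6} tokens -> ~{kv_cache_gib(ctx):.1f} GiB of KV cache")

# With Q3 weights already taking ~14.5 GiB, even a few GiB of KV cache
# pushes a 16GB card to its limit, hence the shorter contexts.
```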
Frequently asked
What's the best GPU for a 30B local LLM?
RTX 3090 (used) or RTX 4090 (new), both 24GB, are the standard recommendations. They hold a 30B model in Q4 with headroom for a useful context window and run at 25–50 tokens per second on most engines.
Can a 16GB GPU run 30B models?
Yes, with caveats. Q3/Q4 quants of 30B-class models fit in roughly 14–17GB depending on the architecture (weights take about params × bits ÷ 8, so ~32B parameters at ~4 bits per weight is around 16GB). You'll have less context room and may need to lower precision further than ideal. A 24GB card is meaningfully better.
How we rank
Hardware is sorted by the number of community submissions on llamaperf, a proxy for how widely each card is used in practice for local LLM inference. Within that, we surface the fastest tokens per second observed on each card as a quality signal. Submissions come primarily from r/LocalLLaMA discussions and direct user uploads. Nothing here is sponsored or affiliate-driven.