
Best GPUs for 30B local LLMs

30B-class models in Q4 fit comfortably in 24GB of VRAM with room for a useful context window, making this the sweet spot for a single consumer GPU. Apple Silicon Macs with 32GB+ of unified memory also handle them well.

Ranked from 61 community reports on llamaperf.


#  | GPU                    | Vendor | VRAM  | Reports | Fastest t/s
1  | RX 7900 XTX            | amd    | 24GB  | 6       | 58.0
2  | RTX 5090               | nvidia | 32GB  | 4       | 106.5
3  | RTX 4060 Ti 16GB       | nvidia | 16GB  | 4       | 45.0
4  | RTX A6000 48GB         | nvidia | 48GB  | 4       | 16.9
5  | AMD Threadripper 256GB | amd    | 256GB | 4       | 8.8
6  | RTX 4090               | nvidia | 24GB  | 3       | 149.6
7  | RTX 3090               | nvidia | 24GB  | 3       | 66.0
8  | Instinct MI300X 192GB  | amd    | 192GB | 2       | 60.0
9  | RX 7900 XT             | amd    | 20GB  | 2       | 38.0
10 | Instinct MI250X 128GB  | amd    | 128GB | 2       | 35.0
11 | RX 7800 XT 16GB        | amd    | 16GB  | 2       | 27.0
12 | M5 Max 128GB           | apple  | 128GB | 2       | 7.5
13 | H100 80GB              | nvidia | 80GB  | 1       | 45.0
14 | RTX 5060 Ti 16GB       | nvidia | 16GB  | 1       | 45.0
15 | M5 Max 64GB            | apple  | 64GB  | 1       | 32.0
16 | M4 Max 64GB            | apple  | 64GB  | 1       | 23.0
17 | M4 16GB                | apple  | 16GB  | 1       | 23.0
18 | M4 Max 36GB            | apple  | 36GB  | 1       | 21.0
19 | M3 16GB                | apple  | 16GB  | 1       | 21.0
20 | M1 Pro 16GB            | apple  | 16GB  | 1       | 20.0
21 | M4 Pro 24GB            | apple  | 24GB  | 1       | 19.0
22 | M3 Max 48GB            | apple  | 48GB  | 1       | 18.0
23 | M2 16GB                | apple  | 16GB  | 1       | 18.0
24 | M3 Max 36GB            | apple  | 36GB  | 1       | 16.0
25 | M2 Max 32GB            | apple  | 32GB  | 1       | 16.0
26 | M2 Ultra 64GB          | apple  | 64GB  | 1       | 14.0
27 | M3 Pro 18GB            | apple  | 18GB  | 1       | 14.0
28 | M1 16GB                | apple  | 16GB  | 1       | 14.0
29 | M1 Ultra 64GB          | apple  | 64GB  | 1       | 12.0
30 | M2 Pro 16GB            | apple  | 16GB  | 1       | 12.0
31 | M1 Max 32GB            | apple  | 32GB  | 1       | 10.0
32 | AMD MI50 32GB          | amd    | 32GB  | 1       | 9.7
33 | M3 Max 128GB           | apple  | 128GB | 1       | 5.5
34 | M2 Ultra 192GB         | apple  | 192GB | 1       | n/a
35 | DGX Spark              | nvidia | 128GB | 1       | n/a

Models that fit

No reports yet

These models match the profile, but nobody has submitted a report for them yet.

What to look for

24GB cards are the sweet spot

RTX 3090s and 4090s (both 24GB) hold a 30B-class model in Q4 with plenty of headroom for an 8–16K context. This is arguably the best price/capability point in local LLM inference today — you get most of the quality of a 70B model at a fraction of the hardware cost.
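As a rough sanity check on that budget, here is a minimal back-of-the-envelope sketch. The architecture numbers (a 32B dense model with 64 layers and grouped-query attention: 8 KV heads of dimension 128) and the ~4.85 effective bits per weight for Q4_K_M are illustrative assumptions; real models vary:

```python
GIB = 1024**3

def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory for a quantized model."""
    return params_billions * 1e9 * bits_per_weight / 8 / GIB

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """KV cache at fp16: one K and one V tensor per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_elem / GIB

# Illustrative 32B-class model; Q4_K_M is roughly 4.85 bits/weight effective.
w = weights_gib(32, 4.85)                 # ~18.1 GiB of weights
kv_8k = kv_cache_gib(64, 8, 128, 8192)    # ~2.0 GiB
kv_16k = kv_cache_gib(64, 8, 128, 16384)  # ~4.0 GiB
print(f"8K context:  ~{w + kv_8k:.1f} GiB")   # ~20.1 GiB
print(f"16K context: ~{w + kv_16k:.1f} GiB")  # ~22.1 GiB, still inside 24GB
```

Compute buffers and the desktop take another GiB or two on top of this, which is part of why 24GB feels roomy here while 16GB does not.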

16GB cards work with tighter quants

An RTX 4060 Ti 16GB or RTX 4070 Ti Super 16GB can run 30B models at Q3/Q4 with shorter contexts, though you'll feel the squeeze with longer prompts. Q3 quants noticeably hurt quality on most models — Q4 is the practical floor.
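To make the squeeze concrete, a quick sketch of the weight footprint alone at common GGUF quants; the effective bits-per-weight figures are ballpark assumptions, and actual file sizes vary by architecture:

```python
GIB = 1024**3

# Approximate effective bits per weight for common GGUF quants
# (illustrative figures, not exact file sizes).
for name, bpw in [("Q3_K_M", 3.9), ("Q4_K_M", 4.85)]:
    gib = 30e9 * bpw / 8 / GIB
    print(f"{name}: ~{gib:.1f} GiB of weights")
# Q3_K_M: ~13.6 GiB, Q4_K_M: ~16.9 GiB -- before any KV cache,
# so a 16GB card has little or no room left at Q4.
```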

Frequently asked

What's the best GPU for a 30B local LLM?

RTX 3090 (used) or RTX 4090 (new), both with 24GB, are the standard recommendations. They hold a 30B model in Q4 with headroom for a useful context window and run at 25–50 tokens per second on most engines.

Can a 16GB GPU run 30B models?

Yes, with caveats. Q3/Q4 quants of 30B-class models fit in ~14–17GB depending on the architecture. You'll have less context room and may need to lower precision further than ideal. A 24GB card is meaningfully better.

How we rank

Hardware is sorted by the number of community submissions on llamaperf, a proxy for how widely each card is used in practice for local LLM inference. Within that, we surface the fastest tokens-per-second figure observed on each card as a performance signal. Submissions come primarily from r/LocalLLaMA discussions and direct user uploads. Nothing here is sponsored or affiliate-driven.
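In code, that ordering is just a group-by with two aggregates. A minimal sketch, with the record shape and sample numbers made up for illustration rather than taken from llamaperf's actual schema:

```python
from collections import defaultdict

# Hypothetical sample reports; the real data comes from user submissions.
reports = [
    {"gpu": "RX 7900 XTX", "tps": 58.0},
    {"gpu": "RTX 4090", "tps": 149.6},
    {"gpu": "RX 7900 XTX", "tps": 41.0},
]

by_gpu: defaultdict[str, list[float]] = defaultdict(list)
for r in reports:
    by_gpu[r["gpu"]].append(r["tps"])

# Primary key: number of reports (descending). The fastest observed
# tokens-per-second per card is surfaced alongside, not sorted on.
rows = sorted(
    ((gpu, len(tps), max(tps)) for gpu, tps in by_gpu.items()),
    key=lambda row: row[1],
    reverse=True,
)
for rank, (gpu, n_reports, best_tps) in enumerate(rows, start=1):
    print(rank, gpu, n_reports, best_tps)
```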
