llamaperf

VRAM Calculator

Pick your GPU to see which open-weight models fit in VRAM, at which quantization, and roughly how fast they run. Tokens/sec figures come from community-reported benchmarks where available.

0 fit fully · 4 with offload · 0 won't run
Qwen3.5 · 27B params · Alibaba · Q8_0 · 33.5 GB · Offload
Gemma 4 · 30.7B params · Google DeepMind · Q6_K · 32.3 GB · Offload
Gemma 3 · 27B params · Google · Q6_K · 28.9 GB · Offload
Qwen3.6 · 3B / 35B · Alibaba · FP16 · 11.9 GB · Offload
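
For readers curious how a fit check of this kind might work, here is a minimal sketch. It is not llamaperf's actual method. It assumes weight memory is roughly parameters times bits per weight, uses approximate bits-per-weight values for the llama.cpp formats listed above, and sorts a model into the three buckets shown on this page. The function names, the `overhead_gb` parameter, and the rule that "with offload" means the model fits in VRAM plus system RAM are all hypothetical.

```python
# Approximate bits per weight for the formats listed on this page.
# Q8_0 and Q6_K include per-block scale data, so they are not exactly 8 and 6.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625}

BYTES_PER_GB = 1e9  # assumption: decimal gigabytes


def estimate_gb(params_billions: float, quant: str, overhead_gb: float = 0.0) -> float:
    """Estimate memory for the weights alone, plus an optional flat overhead.

    overhead_gb is a hypothetical stand-in for KV cache and runtime buffers.
    """
    weight_bytes = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return weight_bytes / BYTES_PER_GB + overhead_gb


def classify(required_gb: float, vram_gb: float, ram_gb: float) -> str:
    """Sort a model into one of the page's three buckets (assumed thresholds)."""
    if required_gb <= vram_gb:
        return "fits fully"
    if required_gb <= vram_gb + ram_gb:
        return "with offload"
    return "won't run"


if __name__ == "__main__":
    # Hypothetical machine: 24 GB of VRAM and 32 GB of system RAM.
    need = estimate_gb(27, "Q8_0")
    print(f"{need:.1f} GB -> {classify(need, vram_gb=24, ram_gb=32)}")
```

Under these assumptions a 27B model at Q8_0 needs about 28.7 GB for weights alone. The 33.5 GB listed above is higher, so the page's figure presumably covers more than raw weights.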