Fit calculator

Will it fit? Approximate VRAM estimate based on parameter count, quantization, context length, and KV-cache quant. Calibrated against Llama/Mistral/Qwen architectures — ±20% accurate for those.

Estimated total
20.4 GBfits with 3.6 GB headroom
Model weights
19.78 GB
KV cache
0.00 GB
Overhead (framework, activations)
0.60 GB

KV cache scales linearly with context. Try halving context, or stepping KV quant down to Q8 / Q4.