Fit calculator
Will it fit? Approximate VRAM estimate based on parameter count, quantization, context length, and KV-cache quant. Calibrated against Llama/Mistral/Qwen architectures — ±20% accurate for those.
Estimated total
20.4 GBfits with 3.6 GB headroom
- Model weights
- 19.78 GB
- KV cache
- 0.00 GB
- Overhead (framework, activations)
- 0.60 GB
KV cache scales linearly with context. Try halving context, or stepping KV quant down to Q8 / Q4.