- throughput: 30.0 t/s gen
- quant: Q4_K_M (gguf)
text-generation
~30 tok/s on RTX 4070 12GB. 26B MoE fits at Q4 with short context. Source: compute-market.com
NVIDIA · 12GB · 2 reports
~30 tok/s on RTX 4070 12GB. 26B MoE fits at Q4 with short context. Source: compute-market.com
RTX 4070 · Ollama
~55 tok/s on RTX 4070 12GB. Ada Lovelace efficiency. Source: estimated from compute-market tiers