- throughput: 60.0 t/s gen
- quant: Q4_K_M (gguf)
text-generation
~60 tok/s on RTX 3060 12GB. E2B runs effortlessly. Source: estimated from compute-market tiers
NVIDIA · 12GB · 3 reports
~25 tok/s on RTX 3060 12GB. 26B MoE Q4 fits with ~8K context. Great value option. Source: compute-market.com
RTX 3060 12GB · Ollama
~45 tok/s on RTX 3060 12GB. E4B fits easily. Source: compute-market.com