- throughput: 8.0 tok/s (generation)
- quant: Q4_K_M (GGUF)
text-generation
NVIDIA · 16GB · 4 reports
~8 tok/s on RTX 4060 Ti 16GB. 31B at Q4 barely fits — very limited context. Source: compute-market.com
~15 tok/s. 26B MoE at Q8 on 16GB — tight but runs. Source: compute-market.com
~25 tok/s on RTX 4060 Ti 16GB. 26B MoE Q4 fits with 8K context — the sweet spot. Source: compute-market.com
~45 tok/s on RTX 4060 Ti 16GB. E4B at Q4. Source: compute-market.com
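The "barely fits" and "fits with 8K context" notes above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an illustration, not something taken from the reports: `weight_footprint_gb` is a hypothetical helper, and it assumes a flat nominal bit width per weight (4 for Q4, 8 for Q8). Real GGUF quantizations such as Q4_K_M average somewhat more bits per weight, and the KV cache and runtime buffers need VRAM on top, so treat the results as lower bounds. E4B is left out because the reports do not state its total parameter count.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


VRAM_GB = 16.0  # RTX 4060 Ti 16GB, as in the reports

# (label, parameters in billions, ASSUMED nominal bits per weight)
configs = [
    ("31B at Q4", 31, 4),
    ("26B MoE at Q8", 26, 8),
    ("26B MoE at Q4", 26, 4),
]

for label, params_b, bits in configs:
    size = weight_footprint_gb(params_b, bits)
    headroom = VRAM_GB - size  # negative means the weights alone exceed VRAM
    print(f"{label}: ~{size:.1f} GB weights, {headroom:+.1f} GB headroom")
```

Under these assumptions the 31B Q4 weights come to about 15.5 GB, which is consistent with "barely fits", and the 26B Q4 weights to about 13 GB, which leaves room for a modest context. The 26B Q8 estimate exceeds 16 GB, so that run presumably keeps part of the model in system RAM; the reports do not say how it was configured.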