llamaperf

RTX 4070

NVIDIA · 12GB · 2 reports

This page is thin (2 of 3 reports needed for indexing). Help fill it in.
throughput: 55.0 t/s gen
quant: Q4_K_M (gguf)
text-generation

~55 tok/s generation on the RTX 4070 12GB (Ada Lovelace architecture) with a Q4_K_M GGUF quant. Source: estimated from compute-market tiers.
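
To put the ~55 tok/s figure in practical terms, the sketch below converts a generation rate into an approximate wait time for a given output length. It assumes a steady rate and ignores prompt processing, so it is only a rough guide; the function name `generation_seconds` is hypothetical and not part of any llamaperf tooling.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 55.0) -> float:
    """Rough seconds needed to generate num_tokens at a steady rate.

    The 55.0 default is the estimated figure from this page, not a
    measured value. Prompt processing time is not included.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Print approximate wait times for a few common output lengths.
    for n in (128, 512, 2048):
        print(f"{n} tokens: ~{generation_seconds(n):.1f} s")
```

Passing a different `tokens_per_second` lets the same calculation be reused for other cards or quants.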