llamaperf

Local LLM inference engines

The runtime you pick matters as much as the GPU. Each engine has its own strengths: single-user latency, batched throughput, hardware coverage, quantization support. Below is every engine with community performance reports on llamaperf.