llamaperf

RX 7900 XTX

AMD · 24GB · 6 reports

Tone: positive
throughput: 40.0 t/s gen
quant: Q4_K_M (gguf)
text-generation

Gemma 4 E4B Q4_K_M on RX 7900 XTX via Ollama: roughly 35-45 gen tok/s. The source rates this card as the best consumer AMD option. Source: gemma4-ai.com AMD GPU guide

Tone: negative
throughput: 0.6 t/s gen · 268.0 t/s pp
quant: FP16 (safetensors)
text-generation

Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Long (TurboQuant) 30K: 267.96 prompt tok/s, 0.57 gen tok/s. Source: flexinfer.ai

Tone: positive
throughput: 2.5 t/s gen · 388.9 t/s pp
quant: FP16 (safetensors)
text-generation

Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Long (TurboQuant) 10K: 388.92 prompt tok/s, 2.48 gen tok/s. Source: flexinfer.ai

Tone: positive
throughput: 12.0 t/s gen · 1887.3 t/s pp
quant: FP16 (safetensors)
text-generation

Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Fast lane (TRITON_ATTN): 1,887 prompt tok/s, 12.04 gen tok/s. Source: flexinfer.ai

throughput: 58.0 t/s gen · 83.0 t/s pp
quant: FP16 (safetensors)
text-generation

Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Default path: 57.96 gen tok/s, 82.96 prompt tok/s. Source: flexinfer.ai
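One rough way to compare the vLLM reports above is to turn the prompt-processing (pp) and generation rates into an estimated wall-clock time for a single request. The sketch below does that. The names `estimate_seconds`, `fastest`, and the `reports` labels are hypothetical, chosen for illustration, and the calculation assumes each rate stays constant regardless of request size, which the 10K and 30K reports suggest does not hold in practice.

```python
# Reported (prompt tok/s, gen tok/s) pairs from the flexinfer.ai entries above.
# The dictionary keys are illustrative labels, not names used by the source.
reports = {
    "long-30k": (267.96, 0.57),
    "long-10k": (388.92, 2.48),
    "fast-lane": (1887.3, 12.04),
    "default": (82.96, 57.96),
}


def estimate_seconds(prompt_tokens, output_tokens, pp_tps, gen_tps):
    """Estimate request time as prefill time plus decode time.

    Assumes both rates are constant, which is a simplification.
    """
    if pp_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    return prompt_tokens / pp_tps + output_tokens / gen_tps


def fastest(prompt_tokens, output_tokens):
    """Return the label of the report with the lowest estimated time."""
    return min(
        reports,
        key=lambda name: estimate_seconds(prompt_tokens, output_tokens, *reports[name]),
    )


if __name__ == "__main__":
    # Example request size is an assumption, not a figure from the reports.
    for name, (pp, gen) in reports.items():
        seconds = estimate_seconds(2000, 200, pp, gen)
        print(f"{name:10s} {seconds:8.1f} s")
```

Under the constant-rate assumption, a prompt-heavy request favors the fast lane and an output-heavy request favors the default path, so neither configuration is uniformly better. Treat the estimates as order-of-magnitude only.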