- throughput: 25.0 t/s gen
- quant: Q4_K_M (gguf)
~20-30 tok/s on RX 7900 XTX. Gemma 4 26B MoE Q4_K_M via Ollama. The card's 24GB of VRAM runs it comfortably. Source: gemma4-ai.com AMD GPU guide
AMD · 24GB · 6 reports
RX 7900 XTX · Ollama
~35-45 tok/s on RX 7900 XTX. Gemma 4 E4B Q4_K_M via Ollama. The guide rates this card the best consumer AMD option. Source: gemma4-ai.com AMD GPU guide
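To reproduce tok/s figures like the Ollama ones above on your own card, a minimal sketch is to read the timing fields Ollama returns from a non-streaming `/api/generate` call: `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration`, with durations in nanoseconds. The default `localhost:11434` endpoint is assumed, and the helper names `generate` and `throughput` are hypothetical, not part of any library.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(model: str, prompt: str) -> dict:
    """Hypothetical helper: send one non-streaming request and return the parsed JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def throughput(stats: dict) -> tuple[float, float]:
    """Return (prompt tok/s, gen tok/s); Ollama reports durations in nanoseconds."""
    prompt_tps = stats["prompt_eval_count"] / (stats["prompt_eval_duration"] / 1e9)
    gen_tps = stats["eval_count"] / (stats["eval_duration"] / 1e9)
    return prompt_tps, gen_tps


# Offline example with made-up timing fields (not a measured result).
sample = {
    "prompt_eval_count": 50, "prompt_eval_duration": 250_000_000,
    "eval_count": 250, "eval_duration": 10_000_000_000,
}
print(throughput(sample))  # (200.0, 25.0)
```

Against a live server you would call `throughput(generate(tag, "..."))`, where `tag` is whatever model name `ollama list` shows for your Gemma 4 build. The exact tag is not given in these reports.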
Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Long lane (TurboQuant), 30K: 267.96 prompt tok/s, 0.57 gen tok/s. Source: flexinfer.ai
Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Long lane (TurboQuant), 10K: 388.92 prompt tok/s, 2.48 gen tok/s. Source: flexinfer.ai
Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Fast lane (TRITON_ATTN): 1,887 prompt tok/s, 12.04 gen tok/s. Source: flexinfer.ai
Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Default path: 82.96 prompt tok/s, 57.96 gen tok/s. Source: flexinfer.ai
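The prompt and gen rates only mean something together. A rough wall-clock estimate is prefill time plus decode time. The sketch below assumes the "10K" and "30K" labels are prompt lengths in tokens and that both rates stay constant for the whole request. The 256-token reply length and the function name `estimated_latency_s` are hypothetical.

```python
def estimated_latency_s(prompt_tokens: int, gen_tokens: int,
                        prompt_tps: float, gen_tps: float) -> float:
    """Rough estimate: prefill time (prompt / prompt rate) plus decode time (reply / gen rate).

    Assumes constant rates, which real runs only approximate.
    """
    return prompt_tokens / prompt_tps + gen_tokens / gen_tps


# Rates from the vLLM + ROCm long-lane reports; the 256-token reply is hypothetical,
# and reading "10K"/"30K" as prompt lengths is an assumption.
long_10k = estimated_latency_s(10_000, 256, 388.92, 2.48)
long_30k = estimated_latency_s(30_000, 256, 267.96, 0.57)
print(f"10K: ~{long_10k:.0f} s, 30K: ~{long_30k:.0f} s")  # 10K: ~129 s, 30K: ~561 s
```

Under these assumptions, decode time dominates at long context. At 0.57 gen tok/s, even a short reply takes minutes, while prefill of the 30K prompt takes under two.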