RX 7900 XTX

AMD · 24GB · 4 reports

Gemma 4

Benchmark of Gemma 4 QAT vs. regular quants on the AMD RX 7900 XTX. No tokens/s figures are reported, but wall-clock times show significant speedups (e.g., the 12B QAT model finishes in about 45% less time, which corresponds to roughly an 83% throughput increase). Output quality is reported as identical. Models tested: 12B, 26B, 31B, E4B.
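The two percentages in this report describe the same speedup from different angles, which is easy to misread. A minimal sketch of the conversion, assuming "45% faster" means a 45% reduction in wall-clock time for a fixed workload (the function names are illustrative, not from the report):

```python
def throughput_gain(time_reduction: float) -> float:
    """Fractional throughput increase implied by a fractional wall-clock reduction.

    For a fixed amount of work, throughput is proportional to 1 / time,
    so cutting time by a fraction r multiplies throughput by 1 / (1 - r).
    """
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1.0 / (1.0 - time_reduction) - 1.0


def time_reduction_for(gain: float) -> float:
    """Inverse: fractional wall-clock reduction implied by a throughput gain."""
    if gain < 0.0:
        raise ValueError("gain must be non-negative")
    return 1.0 - 1.0 / (1.0 + gain)


if __name__ == "__main__":
    # Assumed reading of the report: 45% less wall-clock time for the 12B QAT model.
    print(f"throughput gain: {throughput_gain(0.45):.1%}")
    print(f"time reduction for an 83% gain: {time_reduction_for(0.83):.1%}")
```

Under that reading, a 45% time reduction works out to a throughput gain of about 82%, and an 83% gain corresponds to about 45.4% less time, so the two reported figures agree to within rounding.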

Qwen3.6 27B

RX 7900 XTX · 32,768 ctx

throughput:: 32.0 t/s pp
quant:: Q6 (gguf)

coding

The user is considering adding a second 7900 XTX, for 48GB of VRAM in total, to run larger models. They currently run the dense Qwen 27B at Q6 with a 32K context and see 32 t/s prompt processing. The main use case is coding via opencode.
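A rough back-of-the-envelope estimate suggests why a second card looks attractive here. The sketch below assumes "Q6" means llama.cpp's Q6_K at about 6.5625 bits per weight and a flat 27 billion parameters; real GGUF files mix quant types per tensor, and the KV cache for a 32K context is not included, so treat the result as an approximation only.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumptions (not from the report): Q6_K at 6.5625 bits/weight, 27e9 parameters.
Q6_K_BITS_PER_WEIGHT = 6.5625
VRAM_GB = 24.0  # nominal capacity listed for the RX 7900 XTX

weights_gb = weight_memory_gb(27, Q6_K_BITS_PER_WEIGHT)
headroom_gb = VRAM_GB - weights_gb

if __name__ == "__main__":
    print(f"weights: ~{weights_gb:.1f} GB, headroom on one card: ~{headroom_gb:.1f} GB")
```

With these assumptions the weights alone take roughly 22 GB, leaving under 2 GB on a single 24GB card for the KV cache and runtime buffers.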

Gemma 4 8B E4B Instruct

RX 7900 XTX · Ollama

throughput:: 40.0 t/s gen
quant:: Q4_K_M (gguf)

text-generation

Roughly 35-45 tok/s generation for Gemma 4 E4B Q4_K_M via Ollama on the RX 7900 XTX. Described as the best consumer AMD option. Source: gemma4-ai.com AMD GPU guide

Gemma 4 8B E4B Instruct

RX 7900 XTX · vLLM

throughput:: 58.0 t/s gen · 83.0 t/s pp
quant:: FP16 (safetensors)

text-generation

Gemma 4 E4B on RX 7900 XTX via vLLM + ROCm. Default path: 57.96 gen tok/s, 82.96 prompt tok/s. Source: flexinfer.ai