- throughput: 1.3 t/s gen
- quant: Q4 (gguf)
text-generation
AMD · 256GB unified memory · 4 reports
~0.5-2 tok/s on CPU. 26B MoE painfully slow on CPU. Source: gemma4-ai.com hardware guide
AMD Threadripper 256GB · llama.cpp
~2-5 tok/s on CPU. E4B usable but slow. Source: gemma4-ai.com hardware guide
AMD Threadripper 256GB · llama.cpp
~5-10 tok/s on CPU. E2B is usable CPU-only. Source: gemma4-ai.com hardware guide
AMD Threadripper 256GB · llama.cpp
CPU-only, no GPU. Outperformed a 4090 on 31B generation speed (8.8 vs 7.8 t/s), attributed to the absence of a VRAM bottleneck. Source: n1n.ai
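The "no VRAM bottleneck" reasoning can be illustrated with a rough footprint estimate. This is a minimal sketch, not a measurement: the report does not state which quantization the 31B run used, so the bit widths below are assumptions, as are the `overhead_gb` allowance and the helper names `weight_footprint_gb` and `fits_in_memory`. The 24 GB figure is the RTX 4090's VRAM size, assuming that is the card meant by "4090"; 256 GB is the system memory listed above.

```python
# Rough estimate of whether a 31B model's weights fit in GPU VRAM vs system RAM.
# All bit widths and the overhead allowance are illustrative assumptions.

RTX_4090_VRAM_GB = 24   # VRAM on an RTX 4090
SYSTEM_RAM_GB = 256     # memory listed for the Threadripper system


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring per-block metadata."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus a hypothetical fixed overhead (KV cache, buffers) fit."""
    return weight_footprint_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


for label, bits in [("4-bit", 4.0), ("8-bit", 8.0), ("16-bit", 16.0)]:
    size = weight_footprint_gb(31, bits)
    print(f"31B @ {label}: ~{size:.1f} GB | "
          f"fits 24 GB VRAM: {fits_in_memory(31, bits, RTX_4090_VRAM_GB)} | "
          f"fits 256 GB RAM: {fits_in_memory(31, bits, SYSTEM_RAM_GB)}")

# Relative speedup of the reported CPU run over the reported 4090 run.
speedup_pct = (8.8 / 7.8 - 1) * 100
print(f"CPU vs 4090 on 31B: {speedup_pct:.0f}% faster")
```

Under these assumptions, only the 4-bit case fits entirely in 24 GB, so a higher-precision run on the GPU would have to spill layers to system memory, which is one plausible reading of the bottleneck the report describes.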