Gemma 4 30.7B Instruct
RTX A6000 48GB · llama.cpp
- throughput: 0.5 t/s gen
- quant: bf16 (safetensors)
text-generation
bf16 no quantization. 43.82GB VRAM. Very slow: 1 token every 2s. Source: dev.to Gaurav Vij
NVIDIA · 48GB · 4 reports
RTX A6000 48GB · llama.cpp
bf16 no quantization. 43.82GB VRAM. Very slow: 1 token every 2s. Source: dev.to Gaurav Vij
bf16 no quantization. 42.30GB VRAM. 18x faster than dense 31B. Source: dev.to Gaurav Vij
RTX A6000 48GB · llama.cpp
bf16 no quantization. ~16GB VRAM. Source: dev.to Gaurav Vij
RTX A6000 48GB · llama.cpp
bf16 no quantization. 10.25GB VRAM, 61ms TTFT. Source: dev.to Gaurav Vij
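
The "1 token every 2s" note on the 43.82GB report is the reciprocal of the listed 0.5 t/s generation throughput. A minimal sketch of that conversion follows, with a rough estimate of how long a full reply would take at that rate. The 256-token reply length is a hypothetical value chosen for illustration, and the estimate assumes steady throughput and ignores prompt processing and TTFT.

```python
def seconds_per_token(tokens_per_second: float) -> float:
    """Convert generation throughput (t/s) into average seconds per token."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / tokens_per_second


def generation_time(tokens_per_second: float, num_tokens: int) -> float:
    """Estimate seconds needed to generate num_tokens at a steady throughput."""
    return num_tokens * seconds_per_token(tokens_per_second)


REPORTED_TPS = 0.5   # generation throughput from the 43.82GB report
REPLY_TOKENS = 256   # hypothetical reply length, not taken from the reports

print(seconds_per_token(REPORTED_TPS))              # seconds per generated token
print(generation_time(REPORTED_TPS, REPLY_TOKENS))  # seconds for the whole reply
```

At 0.5 t/s, even a short reply takes minutes, which is why the report is flagged as very slow.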