Qwen2.5 32B Coder
RTX 3090 · llama.cpp · 32,768 ctx
- throughput: 28.0 t/s gen · 450.0 t/s pp
- quant: Q4_K_M (gguf)
- kv: Q8
coding
Solid for autocomplete, but occasionally hallucinates imports in multi-file refactors. Build b4400.
Alibaba · 1 report