Qwen3.6
RTX 5090 · vLLM · 262,144 ctx
- throughput: 106.5 t/s gen
- quant: INT4 (safetensors)
- kv: FP8 (fp8_e4m3)
Qwen3.6-27B-INT4 served with vLLM 0.19 on a single RTX 5090. Generation runs at 105-108 t/s with a 256k (262,144-token) context window. The setup uses an fp8_e4m3 KV cache, FlashInfer attention, and MTP speculative decoding with 3 speculative tokens. The weights are the Lorbus AutoRound quant.
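A rough sketch of how these settings might map onto a `vllm serve` command line, for anyone trying to reproduce the run. It only builds and prints the command; it does not start a server. Several pieces are assumptions rather than details from this listing: the model ID is a hypothetical placeholder (the exact Lorbus repo name is not given here), the `"mtp"` method name in the speculative config may differ between vLLM versions, and selecting FlashInfer through the `VLLM_ATTENTION_BACKEND` environment variable may not be how vLLM 0.19 expects it.

```python
import json
import shlex

# Hypothetical placeholder: the listing names the Lorbus AutoRound quant
# but does not give its repository ID.
MODEL_ID = "<lorbus-qwen3.6-27b-int4-autoround>"

# Assumption: FlashInfer is selected via this environment variable.
# Check the docs for your vLLM version; newer releases may use a CLI option.
SERVE_ENV = {"VLLM_ATTENTION_BACKEND": "FLASHINFER"}


def build_serve_command(model_id: str = MODEL_ID) -> list[str]:
    """Return an argv list for `vllm serve` matching the settings above."""
    # Assumption: "mtp" is the speculative method name; 3 tokens is from the listing.
    speculative_config = {"method": "mtp", "num_speculative_tokens": 3}
    return [
        "vllm", "serve", model_id,
        "--max-model-len", str(262_144),      # 256k context
        "--kv-cache-dtype", "fp8_e4m3",       # FP8 KV cache
        "--speculative-config", json.dumps(speculative_config),
    ]


if __name__ == "__main__":
    env_prefix = " ".join(f"{k}={v}" for k, v in SERVE_ENV.items())
    print(env_prefix, shlex.join(build_serve_command()))
```

Printing the command first makes it easy to compare against the flags your installed vLLM actually accepts (`vllm serve --help`) before launching anything.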