llamaperf

RTX 5090

NVIDIA · 32GB · 3 reports

Qwen3.6

RTX 5090 · vLLM · 262,144 ctx

Tone: positive
throughput:
106.5 t/s gen
quant:
INT4 (safetensors)
kv:
FP8 (e4m3)

Qwen3.6-27B-INT4 via vLLM 0.19 on 1x RTX 5090. Achieves 105-108 t/s generation with 256k context. Uses fp8_e4m3 KV cache, FlashInfer attention, and MTP speculative decoding (3 tokens). Model from Lorbus quant (AutoRound).
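A setup like the one reported could be launched roughly as below. This is a sketch, not the reporter's actual command: the model path, port, and the exact speculative-decoding syntax are assumptions, and flag names for MTP vary between vLLM versions.

```shell
# Sketch of a vLLM launch matching the reported config (assumptions noted above).
# FlashInfer attention backend is selected via environment variable.
VLLM_ATTENTION_BACKEND=FLASHINFER \
vllm serve Lorbus/Qwen3.6-27B-AutoRound-INT4 \
  --max-model-len 262144 \
  --kv-cache-dtype fp8_e4m3 \
  --gpu-memory-utilization 0.95 \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}'
```

The fp8_e4m3 KV cache roughly halves KV memory versus fp16, which is what makes a 256k context fit alongside an INT4 27B model in 32GB.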

Qwen3.6

RTX 5090 · vLLM · 218,000 ctx

Tone: positive
throughput:
80.0 t/s gen
quant:
NVFP4 (safetensors)

Qwen3.6-27B at ~80 t/s with a 218k context window on 1x RTX 5090, served by vLLM 0.19.1rc1. Uses NVFP4 quantization.

Qwen3.6 27B

RTX 5090 · llama.cpp · 200,000 ctx

Tone: positive
quant:
IQ4_XS (gguf)
kv:
Q8
rating:
5/5
coding · tool-use

User reports Qwen 3.6 27B is excellent for PySpark/Python work and data-transformation debugging. Running on an ASUS ROG Strix SCAR 18 with a laptop RTX 5090 (24GB VRAM) and 64GB DDR5 RAM. Using llama.cpp with the IQ4_XS quant at 200k context and a Q8_0 KV cache; initially tried Q4_K_M with a Q4_0 KV cache. Cancelling cloud subscriptions due to local performance. No tokens/sec reported.
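The reported llama.cpp setup could be reproduced with a server launch along these lines. The model filename is an assumption, and `--flash-attn` syntax differs slightly across llama.cpp builds:

```shell
# Sketch of a llama-server launch matching the report (filename assumed).
# -c sets the context window; --cache-type-k/v quantize the KV cache to Q8_0.
llama-server \
  -m Qwen3.6-27B-IQ4_XS.gguf \
  -c 200000 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  -ngl 99
```

`-ngl 99` offloads all layers to the GPU; with 24GB VRAM the Q8_0 KV cache is what keeps a 200k context feasible next to the IQ4_XS weights.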