Qwen3.6 27B
2× H100 80GB · vLLM · 128,000 ctx
- throughput: 45.0 t/s gen
coding · agentic
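The setup above maps to a fairly standard vLLM launch. A minimal sketch, assuming the model is available as a local or Hugging Face path (the exact repo id for Qwen3.6-27B is an assumption here, not stated in the notes):

```shell
# Serve Qwen3.6-27B across both H100s with tensor parallelism.
# Model path is a placeholder -- substitute the actual checkpoint.
vllm serve Qwen/Qwen3.6-27B \
  --tensor-parallel-size 2 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.90
```

`--tensor-parallel-size 2` splits the weights across the two GPUs, and `--max-model-len 131072` matches the 128K context window; vLLM then exposes an OpenAI-compatible API that multiple agents can hit concurrently.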
User rents a GPU instance with 2× H100 (160 GB VRAM) to run Qwen3.6-27B at 45 t/s, served via vLLM with a 128K context length. Multiple agents (Claude Code, QwenCode, social media bots) hit the API simultaneously. Cost is ~$0.90/hr, with ~$120 spent last month. In the user's tests, the 27B model outperformed a 120B model.
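The cost figures imply roughly how many hours the instance ran and what a generated token costs. A back-of-envelope check, assuming sustained 45 t/s generation (real agent workloads will be burstier, so the per-token cost is a floor):

```python
# Back-of-envelope economics from the figures above.
HOURLY_COST = 0.90      # $/hr for the 2x H100 instance
MONTHLY_SPEND = 120.0   # $ spent last month
GEN_TPS = 45.0          # generation throughput, tokens/sec

hours_used = MONTHLY_SPEND / HOURLY_COST             # hours billed last month
tokens_per_hour = GEN_TPS * 3600                     # tokens/hr at full tilt
cost_per_m_tokens = HOURLY_COST / (tokens_per_hour / 1e6)

print(f"~{hours_used:.1f} h used, ${cost_per_m_tokens:.2f} per 1M gen tokens")
```

That works out to about 133 hours of uptime and roughly $5.56 per million generated tokens at full utilization; idle time while agents think or wait pushes the effective cost higher.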