llamaperf

H100 80GB

NVIDIA · 80GB · 1 report


Qwen3.6 27B

H100 80GB · vLLM · 128,000 ctx

Tone: positive
Throughput: 45.0 t/s gen
Tags: coding, agentic

The user rents a GPU instance with 2× H100s (160 GB VRAM) to run Qwen3.6-27B at 45 t/s generation, using vLLM for inference. Multiple agents (Claude Code, QwenCode, social media bots) hit the API simultaneously. Context length is 128K. Cost is ~$0.90/hr, with about $120 spent last month. In the user's tests, the model outperformed a 120B model.
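A setup like this is typically launched with vLLM's OpenAI-compatible server. A minimal sketch, assuming a Hugging Face repo name of `Qwen/Qwen3.6-27B` (hypothetical; the report does not give the exact model path) and the 2-GPU, 128K-context configuration described above:

```shell
# Shard the model across both H100s with tensor parallelism
# and use the 128K context window from the report.
vllm serve Qwen/Qwen3.6-27B \
    --tensor-parallel-size 2 \
    --max-model-len 131072 \
    --port 8000
```

This exposes an OpenAI-compatible API at http://localhost:8000/v1, so several clients (coding agents, bots) can share one endpoint concurrently while vLLM batches their requests.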