Qwen3.6 27B
2× H100 80GB · vLLM · 128,000 ctx
- throughput: 45.0 t/s gen
coding · agentic
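The setup above maps to a fairly standard vLLM launch. A minimal sketch, assuming the model is available as a local or Hugging Face path (the exact repo id for Qwen3.6-27B is an assumption here, not stated in the notes):

```shell
# Serve Qwen3.6-27B across both H100s with tensor parallelism.
# Model path is a placeholder -- substitute the actual checkpoint.
vllm serve Qwen/Qwen3.6-27B \
  --tensor-parallel-size 2 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.90
```

`--tensor-parallel-size 2` splits the weights across the two GPUs, and `--max-model-len 131072` matches the 128K context window; vLLM then exposes an OpenAI-compatible API that multiple agents can hit concurrently.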
User rents a GPU instance with 2× H100 (160 GB VRAM) to run Qwen3.6-27B at 45 t/s, served via vLLM with a 128K context length. Multiple agents (Claude Code, QwenCode, social media bots) hit the API simultaneously. Cost is ~$0.90/hr, with ~$120 spent last month. In the user's tests, the 27B model outperformed a 120B model.
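The cost figures imply roughly how many hours the instance ran and what a generated token costs. A back-of-envelope check, assuming sustained 45 t/s generation (real agent workloads will be burstier, so the per-token cost is a floor):

```python
# Back-of-envelope economics from the figures above.
HOURLY_COST = 0.90      # $/hr for the 2x H100 instance
MONTHLY_SPEND = 120.0   # $ spent last month
GEN_TPS = 45.0          # generation throughput, tokens/sec

hours_used = MONTHLY_SPEND / HOURLY_COST             # hours billed last month
tokens_per_hour = GEN_TPS * 3600                     # tokens/hr at full tilt
cost_per_m_tokens = HOURLY_COST / (tokens_per_hour / 1e6)

print(f"~{hours_used:.1f} h used, ${cost_per_m_tokens:.2f} per 1M gen tokens")
```

That works out to about 133 hours of uptime and roughly $5.56 per million generated tokens at full utilization; idle time while agents think or wait pushes the effective cost higher.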