M3 Max 128GB

APPLE · 128GB unified memory · 2 reports

Qwen3.6 35B (3B active)

M3 Max 128GB

quant:: 4bit

agentic coding

User currently runs Qwen3.6-35B-A3B-4bit on an M3 Max 128GB for production sub-agent delegation. Also mentions GLM-5.1 for orchestration, and is considering building a 5090 rig.
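As a rough sanity check on why a 4-bit 35B model sits comfortably in 128GB of unified memory, the sketch below estimates the size of the weights alone. The helper name `weight_footprint_gb` is hypothetical, and the estimate deliberately ignores quantization metadata, KV cache, and runtime overhead, so treat it as a lower bound rather than a measured figure.

```python
# Rough lower-bound estimate of weight memory for a quantized model.
# Assumption: size ~= parameter count * bits per weight / 8, with no
# allowance for quantization scales, KV cache, or runtime overhead.

UNIFIED_MEMORY_GB = 128  # from the page header


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


qwen_35b_4bit_gb = weight_footprint_gb(35, 4)
share_of_memory = qwen_35b_4bit_gb / UNIFIED_MEMORY_GB

print(f"Weights: ~{qwen_35b_4bit_gb:.1f} GB")
print(f"Share of unified memory: ~{share_of_memory:.0%}")
```

Only about 3B parameters are active per token in this model, but all 35B still have to be resident, which is why the full parameter count is used here.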

Qwen3.6 27B

M3 Max 128GB · MLX · 290,000 ctx

throughput:: 5.5 t/s gen · 160.0 t/s pp
quant:: Q8 (mlx)

long-context

User reports 160 tok/s prefill and 5-6 tok/s generation on M5 Max 128GB with Qwen 3.6 27B Q8 MLX at 290k context. GPU utilization is only 36-50%, and generation feels slow compared with the expected 8-14 tok/s. Asks how other setups compare.
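To put the reported numbers in perspective, the sketch below works out how long a full 290k-token prefill would take at the reported rate and how far the reported generation speed falls short of the range the user expected. The constant and function names are illustrative, and the calculation assumes the prefill rate stays constant across the whole context, which the report does not confirm.

```python
# Back-of-the-envelope arithmetic using only the figures in the report.
# Assumption: prefill throughput is constant over the full context.

CONTEXT_TOKENS = 290_000          # reported context length
PREFILL_TPS = 160.0               # reported prompt-processing rate (t/s)
GEN_TPS = 5.5                     # reported generation rate (t/s)
EXPECTED_GEN_TPS = (8.0, 14.0)    # range the user expected (t/s)


def prefill_seconds(context_tokens: int, prefill_tps: float) -> float:
    """Time to process the whole prompt at a constant prefill rate."""
    return context_tokens / prefill_tps


def fraction_of_expected(actual_tps: float, expected_tps: float) -> float:
    """Reported generation speed as a fraction of an expected speed."""
    return actual_tps / expected_tps


prefill_min = prefill_seconds(CONTEXT_TOKENS, PREFILL_TPS) / 60
vs_low = fraction_of_expected(GEN_TPS, EXPECTED_GEN_TPS[0])
vs_high = fraction_of_expected(GEN_TPS, EXPECTED_GEN_TPS[1])

print(f"Full prefill: ~{prefill_min:.1f} minutes")
print(f"Generation vs expected: {vs_high:.0%} to {vs_low:.0%}")
```

Under these assumptions a full-context prefill takes roughly half an hour, and generation runs at well under the low end of the expected range, which is consistent with the user's impression that something is off.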