MLX
Apple's machine-learning framework — the fastest way to run LLMs on Apple Silicon.
2 community reports
MLX is Apple's open-source ML framework, designed from the ground up for Apple Silicon's unified memory architecture. For LLM inference on a Mac, MLX typically delivers the highest tokens-per-second of any engine.
The community packages popular models (Llama, Qwen, Mistral, etc.) in MLX-quantized formats — the 4-bit and 8-bit MLX quants are the standard for serious local inference on M-series hardware.
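For a concrete starting point, here is a minimal sketch using the mlx-lm Python package (`pip install mlx-lm`). The repo name is an illustrative mlx-community 4-bit quant, and the exact `generate()` signature can shift between mlx-lm releases:

```python
from mlx_lm import load, generate

# Fetches the 4-bit MLX quant from Hugging Face on first run;
# the repo name is illustrative, any mlx-community quant works here.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prompt = "Explain unified memory in one sentence."
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```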
Tradeoff: MLX is Apple-only. There's no portability to other hardware, and its model coverage is narrower than llama.cpp's. Most users start with llama.cpp's Metal backend and switch to MLX when they want to push the largest models their Mac can hold.
Top GPUs running MLX
| GPU | VRAM | Reports | Fastest t/s |
|---|---|---|---|
| M3 Max 128GB | 128GB | 1 | 5.5 |
| M5 Max 128GB | 128GB | 1 | 5.5 |
Frequently asked
Is MLX faster than llama.cpp on a Mac?
Usually yes, especially on larger models and the M-Max / M-Ultra tier. The gap is most visible on 30B+ models; for 7B–13B the two are close.
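If you want to check the gap on your own machine, mlx-lm can print its own throughput numbers. A hedged sketch, assuming a recent mlx-lm release (the `verbose` flag and the model repo name are assumptions), whose output you can line up against llama.cpp's `llama-bench` for the same model size:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")

# With verbose=True, mlx-lm prints prompt and generation tokens-per-second,
# which you can compare against llama-bench results for the same quant size.
generate(model, tokenizer, prompt="Write a haiku about unified memory.",
         max_tokens=256, verbose=True)
```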
Can I run MLX on Windows or Linux?
No. MLX is Apple Silicon only by design — it's built around Metal and the unified memory architecture.
What quants does MLX use?
MLX has its own quantization scheme. The community-published 4-bit MLX quants are the most common; 8-bit is also widely available for higher quality.
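As a sketch of how those quants get made, mlx-lm ships a `convert` utility; the parameter names below match recent mlx-lm versions, and the source repo and output directory are illustrative:

```python
from mlx_lm import convert

# Download a Hugging Face model, quantize it to 4-bit MLX format,
# and write the result to a local directory (path is illustrative).
convert(
    "mistralai/Mistral-7B-Instruct-v0.3",
    mlx_path="mistral-7b-instruct-mlx-4bit",
    quantize=True,
    q_bits=4,          # 4-bit, the common community default
    q_group_size=64,   # MLX's default quantization group size
)
```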