High-performance LLM inference on Apple Silicon using MLX and vLLM
vLLM Metal is a plugin that enables vLLM to run on Apple Silicon Macs using MLX as the primary compute backend. It unifies MLX and PyTorch under a single lowering path.
Documentation: https://docs.vllm.ai/projects/vllm-metal/en/latest/
Latest News 🔥
- [2026/04] We released the new version v0.2.0! Unified paged varlen Metal kernel is now the default attention backend. 83x TTFT, 3.6x throughput compared to v0.1.0.
- macOS on Apple Silicon
Using the install script, the following will be installed under the ~/.venv-vllm-metal directory (the default).
- vllm-metal plugin
- vllm core
- Related libraries
If you run source ~/.venv-vllm-metal/bin/activate, the vllm CLI becomes available and you can access the vLLM right away.
For how to use the vllm CLI, please refer to the official vLLM guide.
https://docs.vllm.ai/en/latest/cli/
curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh | bash