
vllm-project/vllm-metal

vLLM Metal Plugin

High-performance LLM inference on Apple Silicon using MLX and vLLM

vLLM Metal is a plugin that enables vLLM to run on Apple Silicon Macs using MLX as the primary compute backend. It unifies MLX and PyTorch under a single lowering path.

Documentation: https://docs.vllm.ai/projects/vllm-metal/en/latest/


Latest News 🔥

  • [2026/04] We released v0.2.0! The unified paged varlen Metal kernel is now the default attention backend, with 83x faster time to first token (TTFT) and 3.6x higher throughput compared to v0.1.0.

Requirements

  • macOS on Apple Silicon
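The requirement above can be checked from a terminal before installing. The following is a minimal sketch, not part of the plugin: `is_apple_silicon` is a hypothetical helper name, and it assumes that `uname -s` reports `Darwin` and `uname -m` reports `arm64` on a native Apple Silicon shell (a shell running under Rosetta 2 reports `x86_64` instead).

```shell
# Hypothetical helper (not shipped with vllm-metal): succeeds when the given
# OS/architecture pair is macOS on Apple Silicon. With no arguments it
# inspects the current machine via uname.
is_apple_silicon() {
  _os="${1:-$(uname -s)}"
  _arch="${2:-$(uname -m)}"
  [ "$_os" = "Darwin" ] && [ "$_arch" = "arm64" ]
}

if is_apple_silicon; then
  echo "macOS on Apple Silicon detected"
else
  echo "This does not look like macOS on Apple Silicon"
fi
```

The optional arguments exist only so the check can be exercised with explicit values; in normal use the function is called without them.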

Installation

The install script (the curl command at the end of this section) installs the following under ~/.venv-vllm-metal, the default location:

  • vllm-metal plugin
  • vllm core
  • Related libraries

After you run source ~/.venv-vllm-metal/bin/activate, the vllm CLI is available and you can use vLLM right away.

For vllm CLI usage, see the official vLLM guide: https://docs.vllm.ai/en/latest/cli/

curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh | bash
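Once the install script has finished, the activation step described above can be sketched as follows. This is an illustration under stated assumptions: `activate_vllm_metal` is a hypothetical helper name, and the path is the default `~/.venv-vllm-metal` location mentioned in this section; adjust it if the environment was installed elsewhere.

```shell
# Hypothetical helper: activates the vllm-metal virtual environment if it
# exists, otherwise reports the problem and returns a non-zero status.
activate_vllm_metal() {
  _venv_dir="${1:-$HOME/.venv-vllm-metal}"  # default install location
  if [ -f "$_venv_dir/bin/activate" ]; then
    # Load the environment into the current shell.
    . "$_venv_dir/bin/activate"
  else
    echo "No virtual environment found at $_venv_dir; run the install script first" >&2
    return 1
  fi
}

# Activate the default environment, then confirm the vllm CLI is on PATH.
if activate_vllm_metal; then
  vllm --help
fi
```

Wrapping the activation in a function keeps the check for a missing environment in one place, so a shell startup file or script can call it without failing noisily when the plugin is not installed.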

About

Community-maintained hardware plugin for vLLM on Apple Silicon
