Gemma-4-E4B-it fails to load #299

@WindChimeRan

Summary

Loading google/gemma-4-E4B-it (or mlx-community/gemma-4-e4b-it-bf16) crashes with ValueError: Received 54 parameters not in model: language_model.model.layers.24.self_attn.k_norm.weight, … The rejected names cover every language_model.model.layers.{24..41}.self_attn.{k_norm,k_proj,v_proj}.weight.

54 = 18 layers × 3 weight names. 18 matches text_config.num_kv_shared_layers: 18 in the HF config — these are KV-sharing layers that should reuse K/V projections from an earlier layer rather than own their own.
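The arithmetic above can be checked with a short sketch. It assumes the KV-sharing layers are the last num_kv_shared_layers layers of the stack, which is consistent with the error listing layers 24 through 41. The function name kv_shared_weight_names is hypothetical and not part of vllm-metal or mlx-lm.

```python
def kv_shared_weight_names(num_hidden_layers=42, num_kv_shared_layers=18):
    """Enumerate the weight names expected to be rejected.

    Assumption: the shared layers are the trailing ones, so the first
    shared index is num_hidden_layers - num_kv_shared_layers.
    """
    first_shared = num_hidden_layers - num_kv_shared_layers
    suffixes = ("k_norm", "k_proj", "v_proj")
    return [
        f"language_model.model.layers.{i}.self_attn.{s}.weight"
        for i in range(first_shared, num_hidden_layers)
        for s in suffixes
    ]


names = kv_shared_weight_names()
print(len(names))   # count of enumerated names
print(names[0])     # first name, layer 24
print(names[-1])    # last name, layer 41
```

With the values from text_config this yields 54 names, starting at layer 24 and ending at layer 41, matching the first and last entries in the error.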

The HF safetensors still ship those K/V/k_norm tensors (apparently for round-trip safety), but vllm-metal's Gemma4 class — which has KV-sharing wired up — has no weight attributes for those slots, so mlx.nn.layers.base.load_weights(strict=True) rejects them.
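One possible direction, sketched here only as an illustration and not as the project's actual fix, is to drop the unused K/V/k_norm tensors for the shared layers before the strict load. The helper drop_kv_shared_weights is hypothetical; it works on a plain name-to-tensor mapping and again assumes the shared layers are the trailing ones. Where such a filter would be hooked in (for example a model's weight-sanitizing step) is left open.

```python
import re

# Matches K/V projection and k_norm weights and captures the layer index.
_KV_KEY = re.compile(
    r"\.layers\.(\d+)\.self_attn\.(k_norm|k_proj|v_proj)\.weight$"
)


def drop_kv_shared_weights(weights, num_hidden_layers, num_kv_shared_layers):
    """Return a copy of `weights` without K/V tensors of KV-shared layers.

    Hypothetical helper: assumes shared layers are the last
    `num_kv_shared_layers` layers of the stack.
    """
    first_shared = num_hidden_layers - num_kv_shared_layers
    kept = {}
    for name, tensor in weights.items():
        match = _KV_KEY.search(name)
        if match and int(match.group(1)) >= first_shared:
            continue  # tensor belongs to a layer that reuses earlier K/V
        kept[name] = tensor
    return kept


# Tiny stand-in for a checkpoint; values would be arrays in practice.
demo = {
    "language_model.model.layers.23.self_attn.k_proj.weight": "keep",
    "language_model.model.layers.24.self_attn.k_proj.weight": "drop",
    "language_model.model.layers.24.self_attn.q_proj.weight": "keep",
}
filtered = drop_kv_shared_weights(demo, 42, 18)
print(sorted(filtered))
```

Filtering by layer index rather than by a hard-coded name list keeps the sketch tied to num_kv_shared_layers from the config, so it would not need changing for a variant with a different layer split.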

The recently-merged multimodal/text-backbone fix is applying correctly (Metal: forcing text-only backbone for model_type=gemma4 (multimodal_mode=auto, cleared multimodal_config)); the failure is unrelated.

Environment

  • vllm-metal: 0.2.0 (release tag v0.2.0-20260424-074018, commit acd70f84, installed via install.sh from main)
  • mlx-lm: 0.31.3
  • macOS 25.4.0, Apple Silicon (M-series, 64 GB unified)
  • Model: google/gemma-4-E4B-it (also reproed on mlx-community/gemma-4-e4b-it-bf16)
  • Architecture (per config.json → text_config): num_hidden_layers: 42, num_kv_shared_layers: 18, head_dim: 256, global_head_dim: 512

Repro

# Install vllm-metal latest
curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh | bash
source ~/.venv-vllm-metal/bin/activate

# Get the model
hf download google/gemma-4-E4B-it --local-dir /path/to/Gemma-4-E4B-it

# Serve
vllm serve /path/to/Gemma-4-E4B-it --port 8004

Verbatim error log

(APIServer pid=11396) INFO 04-25 01:50:24 [model.py:549] Resolved architecture: Gemma4ForConditionalGeneration
(APIServer pid=11396) INFO 04-25 01:50:24 [model.py:1678] Using max model len 4096
(APIServer pid=11396) INFO 04-25 01:50:24 [config.py:104] Gemma4 model has heterogeneous head dimensions (head_dim=256, global_head_dim=512). Forcing TRITON_ATTN backend to prevent mixed-backend numerical divergence.
(APIServer pid=11396) INFO 04-25 01:50:24 [model_adapter.py:156] Metal: forcing text-only backbone for model_type=gemma4 (multimodal_mode=auto, cleared multimodal_config)
(EngineCore pid=11417) INFO 04-25 01:50:30 [model_lifecycle.py:168] Loading model: /path/to/Gemma-4-E4B-it (VLM: False)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] EngineCore failed to start.
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     super().__init__(...)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self._init_executor()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self.driver_worker.load_model()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/worker.py", line 141, in load_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self.model_runner.load_model()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/model_runner.py", line 351, in load_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self._model_lifecycle.load()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/model_lifecycle.py", line 149, in load
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model, tokenizer = self._load_generation_model(model_name, is_vlm)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/model_lifecycle.py", line 191, in _load_generation_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model, tokenizer = mlx_lm_load(...)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 491, in load
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model, config = load_model(model_path, lazy, model_config=model_config)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 415, in load_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model.load_weights(list(weights.items()), strict=strict)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/mlx/nn/layers/base.py", line 185, in load_weights
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     raise ValueError(
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] ValueError: Received 54 parameters not in model:
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.24.self_attn.k_norm.weight,
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.24.self_attn.k_proj.weight,
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.24.self_attn.v_proj.weight,
[... continues for layers 25 through 41 — full list available, omitted for brevity ...]
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.41.self_attn.v_proj.weight.
