## Summary

Loading `google/gemma-4-E4B-it` (or `mlx-community/gemma-4-e4b-it-bf16`) crashes with `ValueError: Received 54 parameters not in model: language_model.model.layers.24.self_attn.k_norm.weight, …` for every `language_model.model.layers.{24..41}.self_attn.{k_norm,k_proj,v_proj}.weight`.

54 = 18 layers × 3 weight names. The 18 matches `text_config.num_kv_shared_layers: 18` in the HF config: these are KV-sharing layers that should reuse the K/V projections of an earlier layer rather than carry their own.
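The arithmetic can be checked directly. Assuming the KV-sharing layers are the last `num_kv_shared_layers` layers (which matches the rejected index range 24..41):

```python
# Values copied from the model's text_config (quoted below under Environment).
num_hidden_layers = 42
num_kv_shared_layers = 18

# If the KV-sharing layers are the trailing block, the affected range is 24..41.
first_shared = num_hidden_layers - num_kv_shared_layers
rejected = [
    f"language_model.model.layers.{i}.self_attn.{name}.weight"
    for i in range(first_shared, num_hidden_layers)
    for name in ("k_norm", "k_proj", "v_proj")
]
print(first_shared, len(rejected))  # -> 24 54
```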
The HF safetensors still ship those K/V/`k_norm` tensors (apparently for round-trip safety), but vllm-metal's Gemma4 class, which has KV-sharing wired up, has no weight attributes for those slots, so `mlx.nn.layers.base.load_weights(strict=True)` rejects them.

The recently merged multimodal/text-backbone fix is applying correctly (`Metal: forcing text-only backbone for model_type=gemma4 (multimodal_mode=auto, cleared multimodal_config)`); the failure is unrelated.
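One possible fix on the vllm-metal side is to drop the unused shared-layer K/V tensors before handing the checkpoint to strict loading. A sketch only: the helper name is mine, and it assumes filtering by layer index is safe because the shared layers never read these tensors.

```python
import re

# Hypothetical helper (not vllm-metal's actual code): drop K/V tensors
# belonging to KV-sharing layers so strict weight loading accepts the
# checkpoint. first_shared_layer would come from
# num_hidden_layers - num_kv_shared_layers (42 - 18 = 24 here).
_KV_WEIGHT = re.compile(
    r"language_model\.model\.layers\.(\d+)\.self_attn\."
    r"(?:k_norm|k_proj|v_proj)\.weight$"
)

def drop_shared_kv_weights(weights, first_shared_layer=24):
    kept = {}
    for name, tensor in weights.items():
        m = _KV_WEIGHT.match(name)
        if m and int(m.group(1)) >= first_shared_layer:
            continue  # unused by the KV-sharing layer; safe to skip
        kept[name] = tensor
    return kept
```

Loading with strict checking disabled would also mask the error, but silently ignoring every unknown tensor is riskier than an explicit, pattern-scoped filter like this.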
## Environment

- vllm-metal: `0.2.0` (release tag `v0.2.0-20260424-074018`, commit `acd70f84`, installed via `install.sh` from `main`)
- mlx-lm: `0.31.3`
- macOS 25.4.0, Apple Silicon (M-series, 64 GB unified memory)
- Model: `google/gemma-4-E4B-it` (also reproduced on `mlx-community/gemma-4-e4b-it-bf16`)
- Architecture (per `config.json` → `text_config`): `num_hidden_layers: 42`, `num_kv_shared_layers: 18`, `head_dim: 256`, `global_head_dim: 512`
## Repro

```bash
# Install latest vllm-metal
curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh | bash
source ~/.venv-vllm-metal/bin/activate

# Get the model
hf download google/gemma-4-E4B-it --local-dir /path/to/Gemma-4-E4B-it

# Serve
vllm serve /path/to/Gemma-4-E4B-it --port 8004
```
## Verbatim error log

```
(APIServer pid=11396) INFO 04-25 01:50:24 [model.py:549] Resolved architecture: Gemma4ForConditionalGeneration
(APIServer pid=11396) INFO 04-25 01:50:24 [model.py:1678] Using max model len 4096
(APIServer pid=11396) INFO 04-25 01:50:24 [config.py:104] Gemma4 model has heterogeneous head dimensions (head_dim=256, global_head_dim=512). Forcing TRITON_ATTN backend to prevent mixed-backend numerical divergence.
(APIServer pid=11396) INFO 04-25 01:50:24 [model_adapter.py:156] Metal: forcing text-only backbone for model_type=gemma4 (multimodal_mode=auto, cleared multimodal_config)
(EngineCore pid=11417) INFO 04-25 01:50:30 [model_lifecycle.py:168] Loading model: /path/to/Gemma-4-E4B-it (VLM: False)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] EngineCore failed to start.
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     super().__init__(...)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in __init__
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self._init_executor()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 52, in _init_executor
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self.driver_worker.load_model()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/worker.py", line 141, in load_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self.model_runner.load_model()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/model_runner.py", line 351, in load_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     self._model_lifecycle.load()
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/model_lifecycle.py", line 149, in load
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model, tokenizer = self._load_generation_model(model_name, is_vlm)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/vllm_metal/v1/model_lifecycle.py", line 191, in _load_generation_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model, tokenizer = mlx_lm_load(...)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 491, in load
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model, config = load_model(model_path, lazy, model_config=model_config)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 415, in load_model
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     model.load_weights(list(weights.items()), strict=strict)
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]   File "/Users/<user>/.venv-vllm-metal/lib/python3.12/site-packages/mlx/nn/layers/base.py", line 185, in load_weights
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108]     raise ValueError(
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] ValueError: Received 54 parameters not in model:
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.24.self_attn.k_norm.weight,
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.24.self_attn.k_proj.weight,
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.24.self_attn.v_proj.weight,
[... continues for layers 25 through 41 — full list available, omitted for brevity ...]
(EngineCore pid=11417) ERROR 04-25 01:50:30 [core.py:1108] language_model.model.layers.41.self_attn.v_proj.weight.
```
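For context, a strict weight load fails whenever the checkpoint contains keys the module never declared as parameters. A toy illustration of that check (deliberately simplified, not mlx's actual implementation):

```python
def check_strict(module_params, checkpoint):
    """Toy version of a strict weight-loading check (illustrative only)."""
    extra = sorted(k for k in checkpoint if k not in module_params)
    if extra:
        raise ValueError(
            f"Received {len(extra)} parameters not in model: " + ", ".join(extra)
        )
    module_params.update(checkpoint)

# A module that declares K/V weights only for non-shared layers rejects
# the checkpoint's extra shared-layer tensors:
params = {"layers.0.self_attn.k_proj.weight": 0}
ckpt = {
    "layers.0.self_attn.k_proj.weight": 0,
    "layers.24.self_attn.k_proj.weight": 0,  # shared layer, never declared
}
try:
    check_strict(params, ckpt)
except ValueError as e:
    print(e)  # -> Received 1 parameters not in model: layers.24.self_attn.k_proj.weight
```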