Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No — using stock LlmInference.createFromOptions() with the pre-built model from litert-community/gemma-4-E2B-it-litert-lm.
OS Platform and Distribution
macOS 15 (Sequoia), Darwin 25.3.0, Apple M4 (10 cores), 16 GB unified memory
Mobile device if the issue happens on mobile device
No response
Browser and version if the issue happens on browser
Google Chrome 146.0.0.0
Programming Language and version
JavaScript (ES Modules, browser)
MediaPipe version
@mediapipe/tasks-genai@0.10.26 (also reproduced on 0.10.27)
Bazel version
No response
Solution
LLM Inference (GenAI)
Android Studio, NDK, SDK versions (if issue is related to building in Android environment)
No response
Xcode & Tulsi version (if issue is related to building for iOS)
No response
Describe the actual behavior
LlmInference.createFromOptions() crashes with RuntimeError: memory access out of bounds in the WASM module approximately 43-63 seconds after the model graph starts. The model file (gemma-4-E2B-it-web.task, 1.93 GB, from litert-community/gemma-4-E2B-it-litert-lm) downloads successfully, the MediaPipe graph starts, LlmConfig is parsed correctly (model_type: 52, stack_size: 35, vocabulary_size: 262144), and Subgroups Enabled! is logged. The crash then occurs deterministically at wasm-function[489]:0x43b55 during what appears to be GPU program build.
The WASM crash is an uncaught exception — the createFromOptions promise never resolves or rejects. A try/catch around the call does not catch it.
This is not a JS-heap out-of-memory condition: at crash time, performance.memory.usedJSHeapSize is ~27 MB (of a 4 GB limit), and the device has 16 GB of unified memory with a valid WebGPU adapter available. (WASM linear memory is a separate question; see the diagnostics note below.)
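Because the returned promise never settles, the failure is only visible in DevTools. One way to surface it on the page, sketched below under the assumption that the internal WASM rejection reaches the global unhandledrejection handler (the DevTools label "Uncaught (in promise)" suggests it does); fileset and llmOptions stand in for the values used in the repro:

```js
// Hypothetical diagnostic (not part of the original repro): surface the
// internal rejection and bound the wait, since createFromOptions() itself
// never resolves or rejects.
window.addEventListener('unhandledrejection', (event) => {
  print('Unhandled rejection: ' + event.reason);
});

// Watchdog: turn the silent hang into a visible failure after 10 minutes.
const watchdog = new Promise((_, reject) =>
  setTimeout(() => reject(new Error('createFromOptions() timed out')), 10 * 60 * 1000)
);
// `fileset` and `llmOptions` are the same values used in index.html below.
const llm = await Promise.race([
  LlmInference.createFromOptions(fileset, llmOptions),
  watchdog,
]);
```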
Describe the expected behaviour
LlmInference.createFromOptions() should resolve with a usable LlmInference instance. The HF model card for litert-community/gemma-4-E2B-it-litert-lm benchmarks this model running on MacBook Pro M4 Max via Chrome WebGPU.
Standalone code/steps you may have used to try to get what you need
Minimal reproduction (2 files, verified to reproduce):
server.py — serves with COOP/COEP headers required for SharedArrayBuffer/WASM threading:
```python
#!/usr/bin/env python3
"""Serve with COOP/COEP headers for WebGPU + SharedArrayBuffer."""
import http.server

class COEPHandler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "credentialless")
        super().end_headers()

http.server.HTTPServer(("", 8080), COEPHandler).serve_forever()
```
index.html:
```html
<!DOCTYPE html>
<html>
<head><title>Gemma 4 E2B MediaPipe Test</title></head>
<body>
<h1>Gemma 4 E2B MediaPipe Test</h1>
<pre id="log">Loading MediaPipe...</pre>
<script type="module">
  const log = document.getElementById('log');
  function print(msg) { log.textContent += '\n' + msg; console.log(msg); }

  try {
    // CDN import works under COEP "credentialless" (verified)
    const { FilesetResolver, LlmInference } = await import(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.26/genai_bundle.mjs'
    );
    print('MediaPipe imported from CDN.');

    print('Resolving WASM fileset...');
    const fileset = await FilesetResolver.forGenAiTasks(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.26/wasm'
    );
    print('Fileset ready.');

    print('deviceMemory: ' + navigator.deviceMemory + ' GB');
    print('WebGPU adapter: ' + !!(await navigator.gpu?.requestAdapter()));

    print('Creating LlmInference (downloads ~1.93 GB, then builds GPU program)...');
    const llm = await LlmInference.createFromOptions(fileset, {
      baseOptions: {
        modelAssetPath: 'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it-web.task'
      },
      maxTokens: 512,
      topK: 40,
      temperature: 0.1,
      randomSeed: 42,
    });
    print('SUCCESS: Model loaded!');

    const response = await llm.generateResponse('What is GDPR?');
    print('Response: ' + response);
  } catch (e) {
    // NOTE: The WASM crash does NOT reach this catch block.
    // The promise never resolves or rejects — the WASM exception is uncaught.
    print('FAILED: ' + e.message);
    print('Stack: ' + e.stack);
  }
</script>
</body>
</html>
```
Steps to reproduce:
```sh
# 1. Save both files to a directory
# 2. Start the server
python3 server.py
# 3. Open Chrome 146 on macOS with Apple M4
# 4. Open DevTools Console (Cmd+Option+I -> Console tab)
# 5. Navigate to http://localhost:8080
# 6. Wait ~3-5 min total (download + GPU program build)
```
Where to look for the result:
- On the page (<pre id="log">): shows SUCCESS: Model loaded! on success, or stays stuck at Creating LlmInference... on crash (the catch block never fires because the WASM exception is uncaught)
- In the DevTools Console: the crash appears as a red Uncaught (in promise) RuntimeError: memory access out of bounds from genai_wasm_internal.wasm
- Console timeline: Graph successfully started running. appears when the download completes and the GPU program build begins; the WASM exception follows ~43s later.
Tested mitigations (none resolved the crash):
- Chrome flag #enable-unsafe-webgpu: Enabled
- Chrome flag #enable-webgpu-developer-features: Enabled
- GPU hardware acceleration: Enabled in chrome://settings/system
- Reduced maxTokens to 64 (ignored; cache_size: 512 is embedded in the .task bundle)
- Upgraded from @mediapipe/tasks-genai@0.10.26 to @0.10.27
- Used LlmInference.createFromModelBuffer() with a ReadableStreamDefaultReader instead of modelAssetPath — same crash at the same address (see the sketch below)
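For reference, a minimal sketch of the streaming path we tested; MODEL_URL is a stand-in for the Hugging Face URL used in index.html, and the two-argument createFromModelBuffer(fileset, reader) form is the one from our test:

```js
// Stream the .task file from JS and hand the reader to MediaPipe,
// instead of letting the WASM side fetch modelAssetPath itself.
// MODEL_URL: stand-in for the gemma-4-E2B-it-web.task URL from index.html.
const response = await fetch(MODEL_URL);
const reader = response.body.getReader(); // ReadableStreamDefaultReader
const llm = await LlmInference.createFromModelBuffer(fileset, reader);
```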
Other info / Complete Logs
Console output from standalone repro (verified to reproduce):
```
Graph successfully started running.
I0409 13:48:37.737999 2106368 llm_gpu_calculator.cc:389] LlmConfig: model_type: 52 text decoder: (batch_size: 1 sequence_size: 0 model_dimension: 1536 hidden_dimension: 6144 head_dimension: 256 number_of_heads: 8 number_of_kv_heads: 1 vocabulary_size: 262144 stack_size: 35 attention_mask_type: 1 lora_rank: max_top_k: 40 cache_size: 512 post_norm_ff: true post_norm_attention: true attention_scale_type: 5 skip_absolute_positional_embeddings: true use_optional_transformer_layers: false selectable_final_ln: std::nullopt num_local_layers_per_global: 4 sliding_window_size: 512 global_rope_wavelength: 1000000.000000 global_rope_scaling: std::nullopt qk_norm: true ngram_vocabulary_size: std::nullopt load_ngrammer_embedding_dynamically: false per_layer_embedding_dimension: 256 load_per_layer_embedding_dynamically: true kv_cache_shared_start_layer: 15 activation_sparsity_stddev_multiplier: std::nullopt residual_adapter_interleave: std::nullopt residual_adapter_bottleneck_dimension: std::nullopt vision_tokens_num: 281 max_num_images: 0 audio_vocab_offset: 518272 audio_vocab_extra_dim: 128 audio_input_embedding_dim: 1536 audio_dual_norm_soft_and_hard_tokens: true bfloat16 fix: false int8_kv_cache: true apply_srq: true layer_skip_start: 0 num_layers_skipped: 0 matform_factor: 0 enable_external_embeddings: false)
I0409 13:48:37.737999 2106368 environment.cc:330] Subgroups Enabled!
Uncaught (in promise) RuntimeError: memory access out of bounds
at wasm-function[489]:0x43b55
at wasm-function[7691]:0xa3d6d7
at wasm-function[1511]:0x13bac4
at wasm-function[7684]:0xa3755d
at wasm-function[3667]:0x43c9e0
at wasm-function[15976]:0x122c1e5
at wasm-function[16953]:0x13ab534
at wasm-function[13767]:0x1079803
at wasm-function[5041]:0x63a57c
at wasm-function[1422]:0x122233
at wasm-function[5168]:0x6a390c
at wasm-function[10382]:0xeb5407
at wrapper (genai_wasm_internal.js:1:78622)
at Object.doRewind (genai_wasm_internal.js:1:80477)
at genai_wasm_internal.js:1:81035
```
Diagnostics at crash time:
- performance.memory.jsHeapSizeLimit: 4.00 GB
- performance.memory.totalJSHeapSize: 0.029 GB
- performance.memory.usedJSHeapSize: 0.027 GB
- navigator.deviceMemory: 16
- navigator.hardwareConcurrency: 10
- navigator.gpu.requestAdapter(): returns a valid adapter
- self.crossOriginIsolated: true
Note: performance.memory measures the JS heap only; WASM linear memory is separate and not covered by this API. The crash could be WASM linear-memory exhaustion, but the deterministic crash address ($func489:0x43b55) across all attempts suggests a code bug rather than a resource limit.
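To rule WASM linear-memory exhaustion in or out, one could sample total renderer memory; a sketch assuming Chrome's performance.measureUserAgentSpecificMemory(), which requires the crossOriginIsolated context this repro already has:

```js
// Unlike performance.memory, this Chrome API accounts for WASM linear
// memory as well as the JS heap. It samples asynchronously, so it brackets
// rather than pinpoints the crash-time state.
if ('measureUserAgentSpecificMemory' in performance) {
  const result = await performance.measureUserAgentSpecificMemory();
  print('Renderer memory: ' + (result.bytes / 1e9).toFixed(2) + ' GB');
}
```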
chrome://gpu summary (full output attached as about-gpu.txt):
- GPU: Apple M4
- Backend: Metal
- WebGPU: Enabled (after turning on GPU acceleration in chrome://settings/system)
- GL implementation: disabled (Metal only)
- WGSL subgroups: Enabled
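The adapter limits relevant to a 1.93 GB model can also be logged from the page; a sketch assuming adapter.info (GPUAdapterInfo) is exposed in this Chrome version:

```js
// Log the WebGPU limits most likely to constrain a ~2 GB model on Metal.
const adapter = await navigator.gpu.requestAdapter();
print('Adapter: ' + adapter.info.vendor + ' / ' + adapter.info.architecture);
print('maxBufferSize: ' + adapter.limits.maxBufferSize);
print('maxStorageBufferBindingSize: ' + adapter.limits.maxStorageBufferBindingSize);
```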
What we tested:
| Test | Method | Result |
| --- | --- | --- |
| modelAssetPath (URL) | createFromOptions({baseOptions: {modelAssetPath: url}}) | Crash at $func489:0x43b55 |
| modelAssetBuffer (stream) | createFromModelBuffer(fileset, reader) | Same crash at same address |
| Standalone HTML repro | server.py + index.html + CDN imports | Same crash (verified) |
| @mediapipe/tasks-genai@0.10.26 | npm version | Crash |
| @mediapipe/tasks-genai@0.10.27 | npm version | Same crash |
| maxTokens: 64 | Reduced from 512 | Ignored (cache_size baked into .task), same crash |
| #enable-unsafe-webgpu flag | Chrome flag | No effect |
| Gemma 3 270m control | gemma3-270m-it-q8-web.task (263 MB), same device, same code path | SUCCESS — loads, generates, closes cleanly |
The Gemma 3 270m control test confirms the MediaPipe WASM runtime, WebGPU adapter, and Metal shader compilation all work correctly on this M4 device. The crash is specific to the Gemma 4 E2B model.
Related issues searched (via gh search issues --repo google-ai-edge/mediapipe):
- RangeError: Array buffer allocation failed on 8 GB devices: different error type (ArrayBuffer vs WASM OOB), different model, different device RAM (8 GB vs 16 GB), but the same symptom of createFromOptions() failing for Gemma E2B-class models.
- Failed to build program executable - Out of host memory on Android Adreno GPU: different platform but a similar failure point (GPU program build).
- #6160: modelAssetPath method of loading in LlmInference.createFromOptions results in inefficient loading and crashes. Closed as stale without resolution. We tested modelAssetBuffer too; same crash, ruling out the loading method.
- LlmInference model not closing (resource leak): not directly related but the same component.
- A createFromOptions progress-reporting issue. Related: our progress wrapper (fetch intercept) doesn't fire because the model download happens inside WASM, not via JS fetch().

Searched terms: "memory access out of bounds", "LlmInference", "gemma-4 web", "gemma E2B web.task", "llm_gpu_calculator", "shader compilation", "Metal Apple", "web.task crash". No existing issue matches this combination.
Question:
Is gemma-4-E2B-it-web.task expected to work on Apple M4 (non-Max, 10 GPU cores) via Chrome WebGPU? The model card only benchmarks M4 Max (which has 40 GPU cores and more unified memory bandwidth). If M4 non-Max is not a supported target for this model size, could this be documented on the model card?