RuntimeError: memory access out of bounds loading gemma-4-E2B-it-web.task on Chrome 146 / macOS / Apple M4 #6270

Description

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No — using stock LlmInference.createFromOptions() with the pre-built model from litert-community/gemma-4-E2B-it-litert-lm.

OS Platform and Distribution

macOS 15 (Sequoia), Darwin 25.3.0, Apple M4 (10 cores), 16 GB unified memory

Mobile device if the issue happens on mobile device

No response

Browser and version if the issue happens on browser

Google Chrome 146.0.0.0

Programming Language and version

JavaScript (ES Modules, browser)

MediaPipe version

@mediapipe/tasks-genai@0.10.26 (also reproduced on 0.10.27)

Bazel version

No response

Solution

LLM Inference (GenAI)

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

LlmInference.createFromOptions() crashes with RuntimeError: memory access out of bounds in the WASM module approximately 43-63 seconds after the model graph starts. The model file (gemma-4-E2B-it-web.task, 1.93 GB, from litert-community/gemma-4-E2B-it-litert-lm) downloads successfully, the MediaPipe graph starts, LlmConfig is parsed correctly (model_type: 52, stack_size: 35, vocabulary_size: 262144), and Subgroups Enabled! is logged. The crash then occurs deterministically at wasm-function[489]:0x43b55 during what appears to be the GPU program build step.

The WASM crash is an uncaught exception — the createFromOptions promise never resolves or rejects. A try/catch around the call does not catch it.

This is NOT an out-of-memory condition: at crash time, performance.memory.usedJSHeapSize is 20 MB (of 4 GB limit), and the device has 16 GB total RAM with WebGPU adapter available.
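
Because the failure is an uncaught WASM exception rather than a rejected promise, try/catch cannot observe it; a global handler plus a watchdog timer can at least detect it programmatically. A minimal sketch (the handler wiring and timeout value are my own additions, not part of the repro; fileset and options are those from the repro below):

// Global handlers may surface what try/catch around createFromOptions cannot.
window.addEventListener('error', (e) => console.log('window error: ' + e.message));
window.addEventListener('unhandledrejection', (e) => console.log('unhandled rejection: ' + e.reason));

// Watchdog: treat a createFromOptions call that never settles as a failure.
function withTimeout(promise, ms) {
  return Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('timed out after ' + ms + ' ms')), ms)),
  ]);
}

// 10 min covers the ~1.93 GB download plus GPU program build on this machine.
const llm = await withTimeout(
  LlmInference.createFromOptions(fileset, options),
  10 * 60 * 1000
);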

Describe the expected behavior

LlmInference.createFromOptions() should resolve with a usable LlmInference instance. The HF model card for litert-community/gemma-4-E2B-it-litert-lm benchmarks this model running on MacBook Pro M4 Max via Chrome WebGPU.

Standalone code/steps you may have used to try to get what you need

Minimal reproduction (2 files, verified to reproduce):

server.py — serves with COOP/COEP headers required for SharedArrayBuffer/WASM threading:

#!/usr/bin/env python3
"""Serve with COOP/COEP headers for WebGPU + SharedArrayBuffer."""
import http.server

class COEPHandler(http.server.SimpleHTTPRequestHandler):
    def end_headers(self):
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "credentialless")
        super().end_headers()

http.server.HTTPServer(("", 8080), COEPHandler).serve_forever()
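
(Design note: the credentialless COEP value, rather than require-corp, still makes the page crossOriginIsolated, which SharedArrayBuffer and WASM threading require, while relaxing the embedding rules for cross-origin responses that lack Cross-Origin-Resource-Policy headers.)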

index.html:

<!DOCTYPE html>
<html>
<head><title>Gemma 4 E2B MediaPipe Test</title></head>
<body>
<h1>Gemma 4 E2B MediaPipe Test</h1>
<pre id="log">Loading MediaPipe...</pre>
<script type="module">
const log = document.getElementById('log');
function print(msg) { log.textContent += '\n' + msg; console.log(msg); }

try {
  // CDN import works under COEP "credentialless" (verified)
  const { FilesetResolver, LlmInference } = await import(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.26/genai_bundle.mjs'
  );
  print('MediaPipe imported from CDN.');

  print('Resolving WASM fileset...');
  const fileset = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai@0.10.26/wasm'
  );
  print('Fileset ready.');

  print('deviceMemory: ' + navigator.deviceMemory + ' GB');
  print('WebGPU adapter: ' + !!(await navigator.gpu?.requestAdapter()));
  print('Creating LlmInference (downloads ~1.93 GB, then builds GPU program)...');

  const llm = await LlmInference.createFromOptions(fileset, {
    baseOptions: {
      modelAssetPath: 'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it-web.task'
    },
    maxTokens: 512,
    topK: 40,
    temperature: 0.1,
    randomSeed: 42,
  });
  print('SUCCESS: Model loaded!');
  const response = await llm.generateResponse('What is GDPR?');
  print('Response: ' + response);
} catch (e) {
  // NOTE: The WASM crash does NOT reach this catch block.
  // The promise never resolves or rejects — the WASM exception is uncaught.
  print('FAILED: ' + e.message);
  print('Stack: ' + e.stack);
}
</script>
</body>
</html>

Steps to reproduce:

# 1. Save both files to a directory
# 2. Start the server
python3 server.py

# 3. Open Chrome 146 on macOS with Apple M4
# 4. Open DevTools Console (Cmd+Option+I -> Console tab)
# 5. Navigate to http://localhost:8080
# 6. Wait ~3-5 min total (download + GPU program build)

Where to look for the result:

  • On the page (<pre id="log">): shows SUCCESS: Model loaded! on success, or stays stuck at Creating LlmInference... on crash (the catch block never fires because the WASM exception is uncaught)
  • In DevTools Console: the crash appears as a red Uncaught (in promise) RuntimeError: memory access out of bounds from genai_wasm_internal.wasm
  • Console timeline: Graph successfully started running. appears when download completes and GPU program build begins. The WASM exception follows ~43s later.

Tested mitigations (none resolved the crash):

  • Chrome flag #enable-unsafe-webgpu: Enabled
  • Chrome flag #enable-webgpu-developer-features: Enabled
  • GPU hardware acceleration: Enabled in chrome://settings/system
  • Reduced maxTokens to 64 (ignored; cache_size: 512 is embedded in .task bundle)
  • Upgraded from @mediapipe/tasks-genai@0.10.26 to @0.10.27
  • Used LlmInference.createFromModelBuffer() with ReadableStreamDefaultReader instead of modelAssetPath — same crash at same address (see the sketch below)
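
For reference, the streaming variant from the last bullet looked roughly like this (reader plumbing per the 0.10.x API; fileset is from the repro above; treat details as approximate):

// Stream the .task bundle manually instead of letting MediaPipe fetch it.
const resp = await fetch(
  'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it-web.task'
);
const reader = resp.body.getReader();  // ReadableStreamDefaultReader<Uint8Array>

// Crashes at the same address (wasm-function[489]:0x43b55) as modelAssetPath.
const llm = await LlmInference.createFromModelBuffer(fileset, reader);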

Other info / Complete Logs

Console output from standalone repro (verified to reproduce):

Graph successfully started running.
I0409 13:48:37.737999 2106368 llm_gpu_calculator.cc:389] LlmConfig:  model_type: 52 text decoder: (batch_size: 1 sequence_size: 0 model_dimension: 1536 hidden_dimension: 6144 head_dimension: 256 number_of_heads: 8 number_of_kv_heads: 1 vocabulary_size: 262144 stack_size: 35 attention_mask_type: 1 lora_rank:  max_top_k: 40 cache_size: 512 post_norm_ff: true post_norm_attention: true attention_scale_type: 5 skip_absolute_positional_embeddings: true use_optional_transformer_layers: false selectable_final_ln: std::nullopt num_local_layers_per_global: 4 sliding_window_size: 512 global_rope_wavelength: 1000000.000000 global_rope_scaling: std::nullopt qk_norm: true ngram_vocabulary_size: std::nullopt load_ngrammer_embedding_dynamically: false per_layer_embedding_dimension: 256 load_per_layer_embedding_dynamically: true kv_cache_shared_start_layer: 15 activation_sparsity_stddev_multiplier: std::nullopt residual_adapter_interleave: std::nullopt residual_adapter_bottleneck_dimension: std::nullopt vision_tokens_num: 281 max_num_images: 0 audio_vocab_offset: 518272 audio_vocab_extra_dim: 128 audio_input_embedding_dim: 1536 audio_dual_norm_soft_and_hard_tokens: true bfloat16 fix: false int8_kv_cache: true apply_srq: true layer_skip_start: 0 num_layers_skipped: 0 matform_factor: 0 enable_external_embeddings: false)
I0409 13:48:37.737999 2106368 environment.cc:330] Subgroups Enabled!
Uncaught (in promise) RuntimeError: memory access out of bounds
    at wasm-function[489]:0x43b55
    at wasm-function[7691]:0xa3d6d7
    at wasm-function[1511]:0x13bac4
    at wasm-function[7684]:0xa3755d
    at wasm-function[3667]:0x43c9e0
    at wasm-function[15976]:0x122c1e5
    at wasm-function[16953]:0x13ab534
    at wasm-function[13767]:0x1079803
    at wasm-function[5041]:0x63a57c
    at wasm-function[1422]:0x122233
    at wasm-function[5168]:0x6a390c
    at wasm-function[10382]:0xeb5407
    at wrapper (genai_wasm_internal.js:1:78622)
    at Object.doRewind (genai_wasm_internal.js:1:80477)
    at genai_wasm_internal.js:1:81035

Diagnostics at crash time:

  • performance.memory.jsHeapSizeLimit: 4.00 GB
  • performance.memory.totalJSHeapSize: 0.029 GB
  • performance.memory.usedJSHeapSize: 0.027 GB
  • navigator.deviceMemory: 16
  • navigator.hardwareConcurrency: 10
  • navigator.gpu.requestAdapter(): returns valid adapter
  • self.crossOriginIsolated: true
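
For reproducibility, a console snippet along these lines gathers the numbers above (performance.memory is non-standard and Chrome-only; the DevTools console supports top-level await):

// Run in the DevTools console at crash time.
const toGB = (b) => (b / 2 ** 30).toFixed(3) + ' GB';
console.log('jsHeapSizeLimit:', toGB(performance.memory.jsHeapSizeLimit));
console.log('totalJSHeapSize:', toGB(performance.memory.totalJSHeapSize));
console.log('usedJSHeapSize:', toGB(performance.memory.usedJSHeapSize));
console.log('deviceMemory:', navigator.deviceMemory);
console.log('hardwareConcurrency:', navigator.hardwareConcurrency);
console.log('WebGPU adapter:', !!(await navigator.gpu?.requestAdapter()));
console.log('crossOriginIsolated:', self.crossOriginIsolated);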

Note: performance.memory measures JS heap only. WASM linear memory is separate and not measured by this API. The crash could potentially be WASM linear memory exhaustion, but the deterministic crash address ($func489:0x43b55) across all attempts suggests a code bug rather than a resource limit.
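
One way to test the linear-memory hypothesis: performance.measureUserAgentSpecificMemory() is available here (it requires crossOriginIsolated, which is true on this page), and Chrome's estimate includes WebAssembly memory. A sketch, assuming Chrome attributes WASM memory as documented:

// Sample periodically while the model loads; steady growth toward a limit
// just before the crash would point at linear-memory exhaustion rather
// than a code bug.
async function sampleMemory() {
  if (!performance.measureUserAgentSpecificMemory) return;  // Chrome-only API
  const { bytes, breakdown } = await performance.measureUserAgentSpecificMemory();
  console.log('total memory:', (bytes / 2 ** 30).toFixed(2), 'GB');
  console.log(breakdown);  // per-scope attribution
}
setInterval(sampleMemory, 5000);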

chrome://gpu summary (full output attached as about-gpu.txt):

  • GPU: Apple M4
  • Backend: Metal
  • WebGPU: Enabled (after turning on GPU acceleration in chrome://settings/system)
  • GL implementation: disabled (Metal only)
  • WGSL subgroups: Enabled

What we tested:

Test                            Method                                                      Result
modelAssetPath (URL)            createFromOptions({baseOptions: {modelAssetPath: url}})    Crash at $func489:0x43b55
modelAssetBuffer (stream)       createFromModelBuffer(fileset, reader)                      Same crash at same address
Standalone HTML repro           server.py + index.html + CDN imports                        Same crash (verified)
@mediapipe/tasks-genai@0.10.26  npm version                                                 Crash
@mediapipe/tasks-genai@0.10.27  npm version                                                 Same crash
maxTokens: 64                   Reduced from 512                                            Ignored (cache_size baked into .task), same crash
#enable-unsafe-webgpu flag      Chrome flag                                                 No effect
Gemma 3 270m control            gemma3-270m-it-q8-web.task (263 MB, same device/code path)  SUCCESS — loads, generates, closes cleanly

The Gemma 3 270m control test confirms the MediaPipe WASM runtime, WebGPU adapter, and Metal shader compilation all work correctly on this M4 device. The crash is specific to the Gemma 4 E2B model.
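
For completeness, the control run differed from the repro only in the model asset; a sketch (CONTROL_MODEL_URL is a placeholder for wherever gemma3-270m-it-q8-web.task is hosted, which is omitted here; fileset and print are from the repro above):

// Identical code path to the Gemma 4 E2B repro; only the .task URL changes.
const llm = await LlmInference.createFromOptions(fileset, {
  baseOptions: { modelAssetPath: CONTROL_MODEL_URL },  // gemma3-270m-it-q8-web.task, 263 MB
  maxTokens: 512,
  topK: 40,
  temperature: 0.1,
  randomSeed: 42,
});
print('Response: ' + await llm.generateResponse('What is GDPR?'));  // succeeds
llm.close();  // closes cleanly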

Related issues searched (via gh search issues --repo google-ai-edge/mediapipe):

Searched terms: "memory access out of bounds", "LlmInference", "gemma-4 web", "gemma E2B web.task", "llm_gpu_calculator", "shader compilation", "Metal Apple", "web.task crash". No existing issue matches this combination.

Question:

Is gemma-4-E2B-it-web.task expected to work on Apple M4 (non-Max, 10 GPU cores) via Chrome WebGPU? The model card only benchmarks M4 Max (which has 40 GPU cores and more unified memory bandwidth). If M4 non-Max is not a supported target for this model size, could this be documented on the model card?

Labels

os:macOS, platform:javascript, platform:web, task:LLM inference, type:bug
