
fix: MLLM vision models hallucinate and ignore instructions in BatchedEngine #54

Open

janhilgard wants to merge 1 commit into waybarrios:main from janhilgard:fix/mllm-vision-hallucination

Conversation

@janhilgard (Collaborator)

Summary

  • Fix MLLM models (Qwen3-VL, etc.) ignoring system messages, hallucinating random content, and not stopping at EOS in BatchedEngine (continuous batching) mode
  • Four root causes identified and fixed across chat template, EOS handling, sampling, and KV cache propagation

Problem

When running vision models via BatchedEngine with --continuous-batching, the model would:

  1. Ignore system messages ("Return only JSON") and generate random code/text
  2. Not stop generating (missing EOS token detection)
  3. Hallucinate unrelated content instead of describing the image
  4. Work correctly only in SimpleEngine mode

Root Causes & Fixes

1. Chat template dropped system messages and multi-turn context

_apply_chat_template() used mlx_vlm.apply_chat_template() which extracted only the last user message's text. All system prompts, formatting instructions, and conversation history were lost.

Fix: Use tokenizer.apply_chat_template() with the full message structure, preserving system messages, multi-turn history, and image placeholders.
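A minimal, self-contained sketch of the difference (names here are illustrative, not the repository's actual helpers; `new_prompt` stands in for what `tokenizer.apply_chat_template()` renders for a Qwen-style template):

```python
# A system message plus a multimodal user message, in OpenAI-style format.
messages = [
    {"role": "system", "content": "Return only JSON."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What color is this image?"},
    ]},
]

def old_prompt(messages):
    """Lossy: keeps only the last user message's text parts."""
    last = messages[-1]["content"]
    return " ".join(p["text"] for p in last if p.get("type") == "text")

def new_prompt(messages):
    """Preserving: renders every message, including the system prompt and
    the image placeholder (stand-in for tokenizer.apply_chat_template())."""
    rendered = []
    for m in messages:
        c = m["content"]
        if isinstance(c, list):
            c = "".join("<image>" if p["type"] == "image" else p["text"]
                        for p in c)
        rendered.append(f"<|im_start|>{m['role']}\n{c}<|im_end|>")
    return "\n".join(rendered)

print(old_prompt(messages))  # system prompt and image placeholder are gone
print(new_prompt(messages))  # full structure survives
```

With the old path, "Return only JSON." never reaches the model, which explains the ignored-instructions symptom directly.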

2. Qwen3 EOS token fix missing for MLLM path

The eos_token = "<|im_end|>" fix existed in _start_llm() but was absent from _start_mllm(). Without the correct EOS token, the model wouldn't stop generating.

Fix: Apply the same Qwen3 EOS token fix in _start_mllm().
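A hedged sketch of the idea (class and method names are stand-ins for the engine's actual startup paths); factoring the override into a shared helper keeps `_start_llm()` and `_start_mllm()` from drifting apart again:

```python
QWEN3_EOS = "<|im_end|>"  # Qwen3 chat models end turns with this token

class EngineSketch:
    def __init__(self):
        # Stand-in tokenizer with a wrong default EOS token.
        self.tokenizer = type("Tok", (), {"eos_token": "</s>"})()

    def _fix_qwen3_eos(self, model_name: str) -> None:
        """Shared helper applied by both startup paths."""
        if "qwen3" in model_name.lower():
            self.tokenizer.eos_token = QWEN3_EOS

    def _start_llm(self, model_name: str) -> None:
        self._fix_qwen3_eos(model_name)

    def _start_mllm(self, model_name: str) -> None:
        # Previously missing here: without it, generation never matched
        # the real stop token and ran to the length limit.
        self._fix_qwen3_eos(model_name)

engine = EngineSketch()
engine._start_mllm("Qwen3-VL-8B")
print(engine.tokenizer.eos_token)  # <|im_end|>
```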

3. top_p not forwarded in SimpleEngine MLLM path

chat() and stream_chat() in MLXMultimodalLM lacked a top_p parameter, and SimpleEngine didn't pass it for MLLM branches.

Fix: Add top_p parameter to MLLM chat()/stream_chat() signatures and forward it through to mlx_vlm.generate()/stream_generate().
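The plumbing can be sketched as follows (a hypothetical stand-in: `generate` here just echoes its sampling parameters in place of `mlx_vlm.generate()`):

```python
def generate(prompt: str, temperature: float = 0.7, top_p: float = 1.0) -> dict:
    # Stand-in for mlx_vlm.generate(); returns the params it actually received.
    return {"prompt": prompt, "temperature": temperature, "top_p": top_p}

class MLXMultimodalLMSketch:
    def chat(self, prompt: str, temperature: float = 0.7,
             top_p: float = 1.0) -> dict:
        # Before the fix the signature had no top_p, so a user-specified
        # value was silently dropped and the default was always used.
        return generate(prompt, temperature=temperature, top_p=top_p)

out = MLXMultimodalLMSketch().chat("describe the image", top_p=0.9)
print(out["top_p"])  # 0.9 — the user's value now reaches generation
```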

4. BatchedEngine KV cache not populated during vision encoding (critical)

_run_vision_encoding() called self.model(input_ids, **kwargs) without a cache parameter. The KV states from the VLM forward pass were discarded. Subsequent generation used an empty BatchKVCache, so the model had no context from the prompt or image — causing pure hallucination.

Additionally, input_ids from prepare_inputs had shape (1, N), and tolist() on it returned [[...]], making len(ids) == 1 instead of N, which broke padding calculations.

Fix:

  • Create per-request KVCache and pass it to self.model(input_ids, cache=per_cache, **kwargs)
  • After vision encoding all requests, merge per-request caches via BatchKVCache.merge()
  • Squeeze the 2D (1, N) input_ids down to 1D so length and padding computations see N tokens rather than 1
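The flow above can be sketched with minimal stand-ins (the real code uses mlx's KVCache/BatchKVCache and `self.model(...)`; everything here is illustrative):

```python
class KVCacheSketch:
    """Per-request cache; counts tokens in place of real KV tensors."""
    def __init__(self):
        self.tokens_seen = 0

class BatchKVCacheSketch:
    @staticmethod
    def merge(caches):
        merged = KVCacheSketch()
        merged.tokens_seen = sum(c.tokens_seen for c in caches)
        return merged

def run_model(input_ids, cache):
    # Stand-in for self.model(input_ids, cache=per_cache, **kwargs):
    # the forward pass now writes its KV states into the supplied cache
    # instead of discarding them.
    cache.tokens_seen += len(input_ids)

def run_vision_encoding(batch_input_ids):
    per_caches = []
    for ids in batch_input_ids:
        if isinstance(ids[0], list):  # shape (1, N): tolist() gave [[...]]
            ids = ids[0]              # squeeze, else len(ids) == 1, not N
        cache = KVCacheSketch()       # per-request cache
        run_model(ids, cache)
        per_caches.append(cache)
    # Merge per-request caches so batched decode starts with full context.
    return BatchKVCacheSketch.merge(per_caches)

merged = run_vision_encoding([[[1, 2, 3, 4]], [[5, 6]]])
print(merged.tokens_seen)  # 6 — prompt/image context survives into decode
```

The key invariant: the cache handed to the first decode step must already contain the KV states produced while encoding the prompt and image, or the model generates from nothing.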

Files changed

| File | Changes |
| --- | --- |
| `vllm_mlx/engine/batched.py` | Chat template rewrite + Qwen3 EOS fix for MLLM |
| `vllm_mlx/engine/simple.py` | Forward top_p in MLLM chat/stream branches |
| `vllm_mlx/models/mllm.py` | Add top_p to chat() and stream_chat() |
| `vllm_mlx/mllm_batch_generator.py` | KV cache propagation + input_ids shape fix |

Test plan

  • pytest tests/test_mllm.py — 15/15 passed
  • uvx black — all files formatted
  • BatchedEngine: red solid image → "red" (was random hallucination)
  • BatchedEngine: blue solid image → "blue"
  • BatchedEngine: system message "Return JSON" → {"color": "red"}
  • BatchedEngine: OCR image with "Hello World" → "Hello World"
  • SimpleEngine: same tests pass
  • EOS: finish_reason: "stop" (was "length")

🤖 Generated with Claude Code

@janhilgard force-pushed the fix/mllm-vision-hallucination branch from 7047ed6 to 7eb93a3 on February 13, 2026 at 08:43
SimpleEngine MLLM paths and mllm.py chat()/stream_chat() methods
were missing top_p forwarding, causing generation to always use
the default value instead of the user-specified one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard force-pushed the fix/mllm-vision-hallucination branch from 7eb93a3 to ef1f3e9 on February 13, 2026 at 08:44
