clip : refactor set input for cgraph + fix qwen2.5vl input #13136

ngxson · 2025-04-27T15:01:02Z

Refactor the clip_image_encode input handling to use a more fail-proof mechanism. Basically, this refactoring flattens the complicated nested if...else conditions into a switch statement.

This also fixes the issue described in #12402 (comment)
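To illustrate the flattening described above, here is a minimal standalone sketch, not the actual clip.cpp code. The enum `proj_kind`, the function `required_inputs`, and all input names are hypothetical placeholders chosen for this example; the real projector types and graph inputs differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical projector kinds (assumption: the real enum in clip.cpp has other names and more entries).
enum class proj_kind { mlp, resampler, qwen2vl, qwen25vl };

// Returns the names of the graph inputs to fill for a given projector.
// One case per projector replaces a chain of nested if...else checks.
// All input names below are placeholders for this sketch.
std::vector<std::string> required_inputs(proj_kind kind) {
    switch (kind) {
        case proj_kind::mlp:
            return {"patches"};
        case proj_kind::resampler:
            return {"positions", "pos_embed"};
        case proj_kind::qwen2vl:
            return {"positions"};
        case proj_kind::qwen25vl:
            // Kept separate from qwen2vl so each model's inputs are listed in one place.
            return {"positions", "window_idx", "inv_window_idx", "window_mask"};
    }
    assert(false && "unhandled projector kind");
    return {};
}
```

With one case per projector, a missing or mixed-up branch is easier to spot than in nested conditions, and compilers can usually warn when a new enum value is left unhandled.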

Test results:

OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
OK:   llama-mtmd-cli guinmoon/MobileVLM-3B-GGUF:Q4_K_M
OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K
OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
OK:   llama-qwen2vl-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
OK:   llama-qwen2vl-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M

ngxson · 2025-04-27T15:03:38Z

examples/llava/clip.cpp

+
+    auto set_input_f32 = [&get_inp_tensor](const char * name, std::vector<float> & values) {
+        ggml_tensor * cur = get_inp_tensor(name);
+        GGML_ASSERT(ggml_nelements(cur) == (int64_t)values.size());


This check prevents mistakes like the one in the pixtral model, where the input std::vector was larger than the tensor it was copied into.
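The effect of that size check can be sketched in isolation. This is a simplified stand-in, not the code from the PR: `fake_tensor` is a hypothetical replacement for `ggml_tensor`, and the sketch throws an exception where the real code uses `GGML_ASSERT` (which aborts), so that the failure can be observed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for ggml_tensor: a fixed-size float buffer.
struct fake_tensor {
    std::vector<float> data;
    int64_t nelements() const { return (int64_t) data.size(); }
};

// Copies values into the tensor only when the element counts match exactly.
// Assumption: throwing replaces GGML_ASSERT here so the mismatch is recoverable in a test.
void set_input_f32(fake_tensor & t, const std::vector<float> & values) {
    if (t.nelements() != (int64_t) values.size()) {
        throw std::invalid_argument("input size does not match tensor size");
    }
    std::copy(values.begin(), values.end(), t.data.begin());
}
```

Passing a vector with more elements than the tensor holds is rejected up front instead of writing past the end of the buffer.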

clip : refactor set input for cgraph

2b8460b

ngxson requested a review from ggerganov April 27, 2025 15:01

github-actions bot added the examples label Apr 27, 2025

ngxson commented Apr 27, 2025

ngxson added 3 commits April 27, 2025 17:10

more strict assert

6bac190

minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere

ea24eb2

split qwen2 and qwen2.5 code blocks

a264a25

ngxson changed the title ~~clip : refactor set input for cgraph~~ clip : refactor set input for cgraph + fix qwen2.5vl input Apr 27, 2025

ngxson mentioned this pull request Apr 27, 2025

mtmd : add qwen2vl and qwen2.5vl #13141

Open

ggerganov approved these changes Apr 28, 2025

minor style fix

02195d5

ngxson merged commit 5fa9e63 into ggml-org:master Apr 28, 2025
47 checks passed
