clip : refactor set input for cgraph + fix qwen2.5vl input #13136


Merged
merged 5 commits into from
Apr 28, 2025


ngxson
Collaborator

@ngxson ngxson commented Apr 27, 2025

Refactor how clip_image_encode sets its inputs, using a more fail-proof mechanism. Basically, this refactoring flattens the complicated nested if...else conditions into a switch statement.
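
To illustrate the idea of flattening nested conditions into a switch, here is a minimal sketch. It is not the code from this PR: the `proj_kind` enum, the `required_inputs` function, and the input names are hypothetical and chosen only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical projector kinds. The real enum in clip.cpp has different
// names and more entries; these exist only for this sketch.
enum class proj_kind { mlp, resampler, window_attn };

// Returns the names of the extra graph inputs that each (hypothetical)
// projector kind needs. One case per kind replaces a chain of nested
// if/else checks on model flags.
std::vector<std::string> required_inputs(proj_kind kind) {
    switch (kind) {
        case proj_kind::mlp:
            return {};  // no extra inputs in this sketch
        case proj_kind::resampler:
            return {"positions", "pos_embed"};
        case proj_kind::window_attn:
            return {"positions", "window_idx", "window_mask"};
    }
    return {};  // only reached for values outside the enum
}
```

Because the switch has no `default` branch, compilers such as GCC and Clang can warn (with `-Wswitch`, enabled by `-Wall`) when a newly added enum value is left unhandled, which is one way a switch can be harder to get wrong than nested conditions.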

This also fixes the issue described in #12402 (comment)

Test results:

OK:   llama-mtmd-cli ggml-org/SmolVLM-500M-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/SmolVLM2-2.2B-Instruct-GGUF:Q4_K_M
OK:   llama-mtmd-cli ggml-org/SmolVLM2-500M-Video-Instruct-GGUF:Q8_0
OK:   llama-mtmd-cli ggml-org/gemma-3-4b-it-GGUF:Q4_K_M
OK:   llama-mtmd-cli guinmoon/MobileVLM-3B-GGUF:Q4_K_M
OK:   llama-mtmd-cli THUDM/glm-edge-v-5b-gguf:Q4_K_M
OK:   llama-mtmd-cli second-state/Llava-v1.5-7B-GGUF:Q2_K
OK:   llama-mtmd-cli cjpais/llava-1.6-mistral-7b-gguf:Q3_K
OK:   llama-mtmd-cli ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M
OK:   llama-mtmd-cli second-state/MiniCPM-Llama3-V-2_5-GGUF:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-V-2_6-gguf:Q2_K
OK:   llama-mtmd-cli openbmb/MiniCPM-o-2_6-gguf:Q4_0
OK:   llama-qwen2vl-cli bartowski/Qwen2-VL-2B-Instruct-GGUF:Q4_K_M
OK:   llama-qwen2vl-cli ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:Q4_K_M

@ngxson ngxson requested a review from ggerganov April 27, 2025 15:01

auto set_input_f32 = [&get_inp_tensor](const char * name, std::vector<float> & values) {
    ggml_tensor * cur = get_inp_tensor(name);
    GGML_ASSERT(ggml_nelements(cur) == (int64_t)values.size());
Collaborator Author

@ngxson ngxson Apr 27, 2025

This check prevents mistakes like the one in the pixtral model, where the input std::vector is larger than the tensor.
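
A minimal, self-contained sketch of the same kind of size check, written without ggml so it can be compiled on its own. The `fake_tensor` type is a hypothetical stand-in for a graph input tensor, and the sketch throws an exception where the PR uses `GGML_ASSERT`:

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a graph input tensor: a fixed-size float buffer.
struct fake_tensor {
    std::vector<float> data;
};

// Copies `values` into `dst` only when the element counts match exactly.
// A mismatch (for example, a vector larger than the tensor) is rejected
// before any data is written. This sketch throws so the behaviour is easy
// to test; the PR aborts via GGML_ASSERT instead.
void set_input_f32(fake_tensor & dst, const std::vector<float> & values) {
    if (dst.data.size() != values.size()) {
        throw std::invalid_argument("input size does not match tensor size");
    }
    std::copy(values.begin(), values.end(), dst.data.begin());
}
```

With this shape, passing a four-element vector to a three-element tensor fails immediately at the call site instead of writing past the end of the tensor buffer.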

@ngxson ngxson changed the title clip : refactor set input for cgraph clip : refactor set input for cgraph + fix qwen2.5vl input Apr 27, 2025
@ngxson ngxson merged commit 5fa9e63 into ggml-org:master Apr 28, 2025
47 checks passed
2 participants