
Conversation

@iosub (Owner) commented Dec 5, 2025

No description provided.

dhiltgen and others added 30 commits November 20, 2025 07:36
This model lacks the metadata for the projector type.
There were a few Markdown typos in one FAQ answer; it now renders as a proper ASCII table.
The cuda_jetpack libs will enumerate discrete GPUs on SBSA systems,
which leads to runtime failures from missing kernels. This fix
requires an exact match to enable jetpacks instead of relying on
enumeration to filter out supported libraries.
While processing the response stream during a chat or generation, any error that occurs is parsed and returned to the user. The issue with the existing code was that it assumed the response would be valid JSON, which is not a safe assumption and caused cryptic error messages to be displayed due to parsing failures:
`invalid character 'i' looking for beginning of value`

This change updates the stream function to return the raw error string if it can't be parsed as JSON. This should help with debugging by making sure the actual error reaches the user.
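
A minimal sketch of that fallback in Go; the function and field names here are illustrative, not Ollama's actual stream code:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// parseStreamError tries to decode an error body as JSON and falls
// back to the raw string when decoding fails, so the real error
// reaches the user instead of a cryptic JSON parse failure.
func parseStreamError(body []byte) error {
	var resp struct {
		Error string `json:"error"`
	}
	if err := json.Unmarshal(body, &resp); err == nil && resp.Error != "" {
		return errors.New(resp.Error)
	}
	return errors.New(string(body))
}

func main() {
	fmt.Println(parseStreamError([]byte(`{"error":"model not found"}`)))
	fmt.Println(parseStreamError([]byte("internal server error"))) // not JSON: returned verbatim
}
```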
If the user has somehow installed another GGML-based app that places a
ggml-base lib somewhere in their PATH, we can experience runtime problems
due to incompatibilities. This change adds a warning message if we detect
a ggml-base outside of our install location, to aid in troubleshooting.
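
A rough sketch of what such a detection could look like in Go; `installDir`, the glob pattern, and the log wording are all assumptions for illustration:

```go
package main

import (
	"log/slog"
	"os"
	"path/filepath"
	"strings"
)

// warnForeignGGML scans each PATH entry for a ggml-base library that
// does not live under our install directory and logs a warning.
func warnForeignGGML(installDir string) {
	for _, dir := range filepath.SplitList(os.Getenv("PATH")) {
		matches, _ := filepath.Glob(filepath.Join(dir, "*ggml-base*"))
		for _, m := range matches {
			if !strings.HasPrefix(m, installDir) {
				slog.Warn("ggml-base found outside install location; may cause incompatibilities", "path", m)
			}
		}
	}
}

func main() { warnForeignGGML("/usr/local/lib/ollama") }
```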
This change:

* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling
* includes a new ministral parser for parsing reasoning and tool calling

---------

Co-authored-by: jmorganca <[email protected]>
Model eviction happens when we have at least one other model
loaded and are unable to load all layers into VRAM. However, on
CPU-only systems we can never load layers into VRAM, so this
constantly triggered eviction.

Fixes ollama#13227
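
A minimal sketch of the guard, with hypothetical names; `shouldEvict` and its parameters are not Ollama's real scheduler API:

```go
// shouldEvict reports whether evicting another model could help fit
// this one. On CPU-only systems no layers ever go to VRAM, so failing
// to place them must not trigger eviction.
func shouldEvict(numGPUs, gpuLayers, totalLayers int) bool {
	if numGPUs == 0 {
		return false // CPU-only: eviction frees no VRAM, so it cannot help
	}
	return gpuLayers < totalLayers // could not fit every layer in VRAM
}
```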
Added Vulkan SDK installation instructions and environment variable setup for building with Vulkan support.
Add Multi-dimensional Rotary Position Embedding (M-RoPE) support for
Qwen2-VL and Qwen3-VL vision-language models.

Problem: Ollama set only 1 position per token, but Qwen3-VL's M-RoPE
expects 4 positions with 2D spatial encoding for images.

Changes:
- llama/llama.go: NewBatchMRoPE(), AddImageMRoPE(), NEmbdInp(), UsesMRoPE()
- runner/llamarunner/runner.go: M-RoPE batch handling, numTokens vs numPos
- runner/llamarunner/image.go: BatchSize 8192 for M-RoPE models
- runner/llamarunner/cache.go: Clear KV cache for image prompts
- llama/patches/0032: Fix n_embd vs n_embd_inp for vision embeddings

Tested with Qwen3-VL 2B and 8B split models.
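
A sketch of the 4-position layout for an h × w grid of image patches, assuming positions are stored as four parallel sections (temporal, height, width, spare) as in llama.cpp's M-RoPE batches; the function name is illustrative:

```go
// mropePositions builds the four position sections for one image.
// pos[0*n+i] stays constant (the temporal base) for every patch,
// while the height and width sections carry the 2D spatial encoding.
func mropePositions(temporalBase, h, w int32) []int32 {
	n := h * w
	pos := make([]int32, 4*n)
	for y := int32(0); y < h; y++ {
		for x := int32(0); x < w; x++ {
			i := y*w + x
			pos[0*n+i] = temporalBase // constant per image
			pos[1*n+i] = y            // patch row
			pos[2*n+i] = x            // patch column
			pos[3*n+i] = 0            // unused section
		}
	}
	return pos
}
```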
We now do a deeper probe of CUDA devices to verify that the library version has
the correct compute capability coverage for the device. Because ROCm also
interprets the CUDA env var and uses it to filter AMD devices, setting it can
cause problems in mixed-vendor systems, so we normally avoid it. Without
setting it for this deeper probe, however, each CUDA library subprocess
discovers all CUDA GPUs, and on systems with many GPUs this can hit timeouts.
The fix is to turn on the CUDA visibility env var just for this deeper-probe use case.
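
A sketch of scoping the variable to the probe subprocess only; `probeDevice`, the binary path, and the flag are placeholders, not Ollama's actual discovery code:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// probeDevice runs the deeper CUDA probe in a child process with
// CUDA_VISIBLE_DEVICES set only for that child, so the parent (and
// ROCm, which also reads the variable) never sees it.
func probeDevice(probeBinary string, deviceIndex int) ([]byte, error) {
	cmd := exec.Command(probeBinary, "--verify-compute-capability")
	cmd.Env = append(os.Environ(), fmt.Sprintf("CUDA_VISIBLE_DEVICES=%d", deviceIndex))
	return cmd.Output()
}

func main() {
	out, err := probeDevice("/usr/local/lib/ollama/cuda-probe", 0)
	fmt.Println(string(out), err)
}
```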
This fixes a bug where disabling thinking on deepseek-v3.1 did not stop the model from thinking.

When thinking is not defined, it should not be sent to the server, since doing so causes error responses in some cases where the model does not support thinking. However, if it is explicitly defined as false, it should still be sent.
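
A minimal sketch of that three-state pattern with a hypothetical request type: with `encoding/json`, `omitempty` drops a nil pointer entirely but still serializes `"think":false` when the pointer is non-nil:

```go
type chatRequest struct {
	Model string `json:"model"`
	// Think is a three-state flag: nil means "not specified" and is
	// omitted from the JSON entirely; a non-nil pointer is always
	// sent, even when it points at false.
	Think *bool `json:"think,omitempty"`
}
```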
* Revert "vulkan: temporary cary of vulkan fixes (ollama#12971)"

This reverts commit 3a9e8e9.

* ggml update to b7087

* fix argsort on metal

* update to b7108

* fix bakllava regression

This model lacks the metadata for the projector type.

* update to b7209

* fix TopK perf

* only build arm code on arm
* cmd/bench: support writing benchmark output to file

This changes Ollama to allow the bench command to write benchmark
results to a user-specified output file instead of stdout when the
--output flag is provided.

---------

Co-authored-by: Patrick Devine <[email protected]>
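A minimal sketch of the `--output` handling described above; everything beyond the flag semantics is illustrative:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// writeBenchResults sends results to the --output file when one was
// given, and to stdout otherwise.
func writeBenchResults(outputPath, results string) error {
	var out io.Writer = os.Stdout
	if outputPath != "" {
		f, err := os.Create(outputPath)
		if err != nil {
			return err
		}
		defer f.Close()
		out = f
	}
	_, err := fmt.Fprintln(out, results)
	return err
}

func main() { _ = writeBenchResults("", "tokens/s: 123.4") }
```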
This change adds the ability for `ollama create` to convert models that use
the DeepSeek2 architecture (specifically DeepSeekV3 and DeepSeek-R1).
We currently use cache padding of 32 when not using flash attention
and 256 with flash attention, which is based on the historic alignment
requirements of these kernels. The restrictions have since been
loosened, but there are still performance benefits, such as better
CUDA graph reuse.

Since the requirement is no longer kernel-specific, set the padding
uniformly to 256, as llama.cpp does.
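
A minimal sketch of the round-up, assuming a simple helper (names are illustrative):

```go
const kvCachePadding = 256

// padKVLen rounds a context length up to the padding unit; stable,
// uniformly padded shapes are what enables better CUDA graph reuse.
func padKVLen(n int) int {
	return ((n + kvCachePadding - 1) / kvCachePadding) * kvCachePadding
}
```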
Although the vision components of multimodal models typically already
call the optimized nn.Attention, the call is converted into non-fused
operations. That is because the backend-specific fused kernels may
have requirements, such as padding, and that padding is performed by
the cache, which vision encoders don't use.

This implements a fallback path in the backend, softening the
requirements into optimizations. In turn, this allows flash attention
to be used for vision encoders, saving a significant amount of VRAM
and improving performance.
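
A sketch of the softened requirement; `Tensor`, `Backend`, and their methods are entirely hypothetical stand-ins, not Ollama's real interfaces:

```go
// Tensor and Backend are illustrative stand-ins for this sketch.
type Tensor interface {
	Matmul(Tensor) Tensor
	Transpose() Tensor
	Scale(float64) Tensor
	Softmax() Tensor
}

type Backend interface {
	CanFuseAttention(q, k, v Tensor) bool
	FlashAttention(q, k, v Tensor, scale float64) Tensor
}

// attention prefers the fused kernel when the backend accepts these
// inputs, and otherwise falls back to explicit scaled dot-product
// attention built from non-fused operations.
func attention(b Backend, q, k, v Tensor, scale float64) Tensor {
	if b.CanFuseAttention(q, k, v) {
		return b.FlashAttention(q, k, v, scale)
	}
	return q.Matmul(k.Transpose()).Scale(scale).Softmax().Matmul(v)
}
```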
… when using cloud models (ollama#13279)

---------

Co-authored-by: Pogosyan Sos <[email protected]>
Co-authored-by: Patrick Devine <[email protected]>
Copilot AI review requested due to automatic review settings December 5, 2025 09:37
Copilot AI left a comment


Pull request overview

This PR implements a cleanup and refactoring effort focused on multi-dimensional RoPE (M-RoPE) functionality, along with significant code organization improvements. The changes primarily involve removing deprecated logging infrastructure, adding new model implementations, and reorganizing code structure.

Key Changes:

  • Removed deprecated verbosity threshold logic from CLIP logging system
  • Added support for multiple new model architectures (Qwen3VL, CogVLM, Janus Pro, LightOnOCR)
  • Refactored recurrent memory context storage from separate vectors to paired vector structure
  • Added new model implementation files and vocabulary preprocessing types

Reviewed changes

Copilot reviewed 159 out of 340 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| llama/llama.cpp/tools/mtmd/clip-impl.h | Added new projector types, tensor name definitions, and cleaned up logging macros |
| llama/llama.cpp/src/unicode.cpp | Added AFMOE digit handling with custom split logic |
| llama/llama.cpp/src/llama-memory-recurrent.h | Refactored memory storage from separate vectors to paired structure |
| llama/llama.cpp/src/llama-vocab.h | Added new vocabulary preprocessing types |
| llama/llama.cpp/src/llama.go | Updated imports to include models package |
| llama/llama.cpp/src/models/*.cpp | Added 40+ new model implementation files |
| llama/llama.cpp/src/models/models.go | Added Go package for C++ model bindings |


@iosub iosub merged commit ed97f73 into main Dec 5, 2025
1 of 15 checks passed
iosub added a commit that referenced this pull request Dec 9, 2025
- Fix critical M-RoPE bug: pos[0] must be constant (temporalBase) for all image tokens
- Add model path validation before array access (comment ollama#2)
- Extract patchEmbedShape() helper to avoid code duplication (comment ollama#3)
- Refactor tensor aliasing to declarative struct-based approach (comment ollama#6)
- Optimize repetition detection with periodic long pattern checks (comment ollama#9)
- Clarify fallback logic documentation for vision projection (comment ollama#8)
- Fix misleading pflag comment (comment ollama#4)
- Consolidate duplicate Contiguous comments (comment ollama#7)
- Add safety comment for pointer comparison in cache (comment #1)
iosub added a commit that referenced this pull request Dec 13, 2025
iosub added a commit that referenced this pull request Dec 13, 2025