@SzymonPrajs SzymonPrajs commented Dec 20, 2025

Fix the Metal buffer range length (use the element count) and release transient Metal buffers after the set/get tensor sync, which avoids unnecessary writes and lingering resources. Behavior is unchanged; the change only prevents overruns and leaks.

Tests:
./ci/run.sh ./tmp/results ./tmp/mnt
./build/bin/llama-perplexity --model models/functiongemma-270m-it-Q4_K_M.gguf -f models-mnt/wikitext/wikitext-2-raw/wiki.test.raw -c 512 -b 256 --chunks 1 -ngl 99
./build/bin/llama-bench --model models/functiongemma-270m-it-Q4_K_M.gguf -ngl 99

AI: bug identified with Codex.

@SzymonPrajs SzymonPrajs marked this pull request as draft December 20, 2025 11:50
@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Dec 20, 2025
@SzymonPrajs SzymonPrajs marked this pull request as ready for review December 20, 2025 12:15
Labels

Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)), ggml (changes relating to the ggml tensor library for machine learning)
