Releases · igardev/llama.cpp

11 Aug 06:01

79c1160

b6123 Latest

cuda: refactored ssm_scan and use CUB (#13291)

* cuda: refactored ssm_scan to use CUB

* fixed compilation error when not using CUB

* assign L to constant and use size_t instead of int

* deduplicated functions

* change min blocks per mp to 1

* Use cub load and store warp transpose

* suppress clang warning

Assets 15

| Asset | SHA-256 | Size | Uploaded |
| --- | --- | --- | --- |
| cudart-llama-bin-win-cuda-12.4-x64.zip | 8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6 | 373 MB | 2025-08-11T06:01:23Z |
| llama-b6123-bin-macos-arm64.zip | 8096652c1515906d12313ecde01bb6405d19f9a9dc51c8e16eed3690bb1f0094 | 10.8 MB | 2025-08-11T06:01:31Z |
| llama-b6123-bin-macos-x64.zip | 57a489a905afc6e17f81f0307ce99771ab022df3e39aec5f0bcf2f37e603fd31 | 27.6 MB | 2025-08-11T06:01:32Z |
| llama-b6123-bin-ubuntu-vulkan-x64.zip | 8d56dff245defd87f6483cd3d1320c1a7eb402c9eab7b43ecdec4e020a26cf0c | 21.5 MB | 2025-08-11T06:01:33Z |
| llama-b6123-bin-ubuntu-x64.zip | b249a09b74a496b69cabe46286c684ae1dc4e0b031f2720aeea1cf737b1b7198 | 12.7 MB | 2025-08-11T06:01:34Z |
| llama-b6123-bin-win-cpu-arm64.zip | 889cff5ac7578a2799c5ba52eaa5d1b78b8d6ab6528ea90273af99231e743091 | 11 MB | 2025-08-11T06:01:35Z |
| llama-b6123-bin-win-cpu-x64.zip | 15883ed2896cb9649de5cc73361ea8da9db88c75178aa6a574fa69ddac1a2ada | 13.9 MB | 2025-08-11T06:01:36Z |
| llama-b6123-bin-win-cuda-12.4-x64.zip | 8a18447131b6ed59e1ad24e1348130493a9fd388663d927f3a5e99a8c157ceff | 139 MB | 2025-08-11T06:01:37Z |
| llama-b6123-bin-win-hip-radeon-x64.zip | 9061f486cccb1dde2c064aa064d0afa231f6d1db46832d97884035ea746e5a00 | 287 MB | 2025-08-11T06:01:41Z |
| llama-b6123-bin-win-opencl-adreno-arm64.zip | 555865456a780eca9287e205c2d7a81359574f3fae1de99e2e692610392da9a5 | 11.4 MB | 2025-08-11T06:01:48Z |
| Source code (zip) | | | 2025-08-09T18:29:43Z |
| Source code (tar.gz) | | | 2025-08-09T18:29:43Z |

23 May 22:33

github-actions

b5470

b775345

ci : enable winget package updates (#13734)

Assets 18

22 May 09:32

github-actions

b5453

6b56a64

SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587)

Currently, when SYCL runs on a CUDA backend, running
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` causes
two operations to throw an exception from blocking waits
during queue recording.

* `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074

We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458)
method for checking if a graph can be used, even if enabled. I've taken a
similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking
if a graph can be used for the operations even if a user has asked for it to be
enabled.
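
The approach described above can be illustrated with a minimal sketch. It is not the code from the PR: the `op_kind` enum, the `node` struct, and the functions `graph_compatible` and `should_use_graph` are hypothetical names invented here, and the real check in `ggml-sycl.cpp` operates on ggml's own graph structures. The sketch only assumes what the description states: a graph is rejected when any node uses an operation that performs a blocking wait (`CONCAT` or `MUL_MAT_ID`), regardless of the user's setting.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the ggml types.
enum class op_kind { ADD, MUL_MAT, CONCAT, MUL_MAT_ID };

struct node {
    std::string name;
    op_kind     op;
};

// Returns true only if no node uses an operation that performs a
// blocking wait while the queue is being recorded.
bool graph_compatible(const std::vector<node> & nodes) {
    for (const node & n : nodes) {
        switch (n.op) {
            case op_kind::CONCAT:
            case op_kind::MUL_MAT_ID:
                return false; // unsupported under graph recording
            default:
                break;
        }
    }
    return true;
}

// A graph is used only when the user has not disabled it AND every node
// passes the compatibility check, so the check can override the user's request.
bool should_use_graph(bool disabled_by_user, const std::vector<node> & nodes) {
    return !disabled_by_user && graph_compatible(nodes);
}
```

With this shape, a graph containing a `CONCAT` node falls back to the non-graph path even when graphs are enabled, which avoids the exception instead of surfacing it to the caller.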

Assets 20

05 May 04:30

github-actions

b5280

27aa259

mtmd : add C public API (#13184)

* init

* wip

* working version

* add mtmd::bitmaps

* add test target

* rm redundant define

* test: mtmd_input_chunks_free

* rm outdated comment

* fix merging issue

* explicitly create mtmd::input_chunks

* mtmd_input_chunk_copy

* add clone()

* add const to various places

* add warning about breaking changes

* helper: use mtmd_image_tokens_get_n_pos

Assets 21
