Releases: ngxson/llama.cpp
b6075
b6074
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)
b6073
cuda: make im2col a little faster (#15025)
b6071
llama : enable LLAMA_SET_ROWS=1 by default (#14959) ggml-ci
b6067
chat : fix multiple tool_calls on hermes-2-pro (#14962)
b6066
vulkan: coopmat2 mul_mat optimizations (#14934)
- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it interacts with split_k
- Allow larger/non-power-of-two split_k, and make the splits a multiple of 256
- Use split_k == 3 when >1/2 and <=2/3 of the SMs would have been used
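The split_k rule in the commit message above can be sketched as follows. This is an illustrative reconstruction, not the actual shader dispatch code: the function name, the `num_tiles`/`num_sms` inputs, and every branch other than the documented split_k == 3 case are assumptions.

```python
def choose_split_k(num_tiles: int, num_sms: int) -> int:
    """Illustrative sketch of the split_k heuristic from #14934.

    Idea: if an unsplit launch would keep only part of the GPU busy,
    split the K dimension so more workgroups can run concurrently.
    Only the split_k == 3 band (>1/2 and <=2/3 of SMs occupied) is
    stated in the commit message; the other branches are placeholders.
    """
    occupancy = num_tiles / num_sms
    if occupancy > 1.0:
        return 1  # GPU already full without splitting
    if occupancy > 2 / 3:
        return 2  # placeholder: modest split
    if occupancy > 1 / 2:
        return 3  # the documented non-power-of-two case
    return 4      # placeholder: low occupancy, split more aggressively

print(choose_split_k(40, 68))  # ~0.59 of SMs used unsplit -> 3
```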
b6064
vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015)
b6063
model : support Qwen3-Embedding (#15023)
b6062
server: enable token array inputs for OAI API (#15001)
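A minimal sketch of what the change in #15001 enables: the server's OpenAI-compatible completions endpoint can accept a pre-tokenized prompt (an array of token ids) in place of a string. The payload below is constructed offline; the token ids are placeholders, not real vocabulary ids, and the localhost URL in the comment assumes a locally running llama-server.

```python
import json

# Completion request with a token array instead of a text prompt.
# The ids here are placeholders for illustration only.
payload = {
    "prompt": [1, 15043, 3186],  # token ids, not a string
    "max_tokens": 16,
}

body = json.dumps(payload)
print(body)
# Sending it would look roughly like (assuming llama-server on port 8080):
#   curl http://localhost:8080/v1/completions \
#        -H "Content-Type: application/json" -d "$body"
```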
b6061
vulkan: optimizations for direct convolution (#14933)
* vulkan: optimizations for direct convolution
  - Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill the GPU. The new size should be amenable to using coopmat, too.
  - Fix shmem bank conflicts. 16B padding should work with coopmat.
  - Some explicit loop unrolling.
  - Skip math/stores work for parts of the tile that are OOB.
  - Apply fastdiv opt.
  - Disable shuffles for NV.
* Three tile sizes for CONV_2D, and a heuristic to choose between them
* reallow collectives for pre-Turing
* make SHMEM_PAD a spec constant
* fixes for Intel perf - no shmem padding, placeholder shader core count
* shader variants with/without unrolling
* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <[email protected]>