Releases: ngxson/llama.cpp

b6075

02 Aug 18:03
5c0eb5e
opencl: fix adreno compiler detection logic (#15029)

b6074

02 Aug 15:02
03d4698
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)

b6073

02 Aug 14:52
3303c19
cuda: make im2col a little faster (#15025)

b6071

02 Aug 14:42
a4569c4
llama : enable LLAMA_SET_ROWS=1 by default (#14959)

ggml-ci

b6067

02 Aug 10:22
f738989
chat : fix multiple tool_calls on hermes-2-pro (#14962)

b6066

02 Aug 09:44
4cb208c
vulkan: coopmat2 mul_mat optimizations (#14934)

- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how that
  choice interacts with split_k
- Allow larger and non-power-of-two split_k values, and make the splits a multiple of 256
- Use split_k==3 when more than 1/2 and at most 2/3 of the SMs would have been used
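split_k divides the K (reduction) dimension of a matrix multiply across several workgroups whose partial results are then summed. The Python sketch below illustrates only the split_k==3 rule and the 256-aligned split described above. The function names, the "workgroups vs. SMs" inputs, and the fallback to split_k==1 in every other case are assumptions made for illustration, not the actual ggml-vulkan logic.

```python
def choose_split_k(num_workgroups: int, num_sms: int) -> int:
    """Hypothetical heuristic covering only the split_k == 3 rule."""
    # "more than 1/2 and at most 2/3 of the SMs would have been used",
    # written with integer arithmetic to avoid floating-point edge cases.
    if 2 * num_workgroups > num_sms and 3 * num_workgroups <= 2 * num_sms:
        return 3
    return 1  # assumption: no split otherwise (the real code weighs more cases)


def split_k_ranges(k: int, split_k: int, align: int = 256):
    """Split [0, k) into at most split_k chunks whose size is a multiple of align."""
    if k <= 0 or split_k <= 0:
        raise ValueError("k and split_k must be positive")
    chunk = -(-k // split_k)                    # ceil(k / split_k)
    chunk = -(-chunk // align) * align          # round up to a multiple of align
    # The last chunk may be shorter; small k can yield fewer than split_k chunks.
    return [(start, min(start + chunk, k)) for start in range(0, k, chunk)]


if __name__ == "__main__":
    print(choose_split_k(60, 100))      # 60 of 100 SMs busy -> 3
    print(split_k_ranges(4096, 3))      # [(0, 1536), (1536, 3072), (3072, 4096)]
```

Rounding each split up to a multiple of 256 keeps every chunk boundary aligned, at the cost of the last chunk sometimes being shorter or the total number of chunks dropping below split_k.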

b6064

02 Aug 09:18
ec0b188
vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015)

b6063

02 Aug 09:07
339bd02
model : support Qwen3-Embedding (#15023)

b6062

02 Aug 08:32
f906275
server: enable token array inputs for OAI API (#15001)
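This change lets OpenAI-compatible requests carry pre-tokenized input instead of a text string. Below is a minimal Python sketch of building such a request. It assumes llama-server is listening on its default `http://localhost:8080` and that `/v1/completions` accepts a list of integer token IDs as `prompt`, as the OpenAI Completions API allows. The helper name and the token IDs are hypothetical, and real IDs depend on the loaded model's vocabulary.

```python
import json
import urllib.request


def build_completion_request(token_ids, max_tokens=16, base_url="http://localhost:8080"):
    """Hypothetical helper: build (but do not send) a completion request from token IDs."""
    # Assumption: "prompt" may be a list of ints rather than a string.
    payload = {"prompt": [int(t) for t in token_ids], "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_completion_request([1, 15043, 3186])  # made-up token IDs
    print(req.full_url, req.data)
    # To actually send it against a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```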

b6061

02 Aug 08:14
a9f7541
vulkan: optimizations for direct convolution (#14933)

* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math and store work for the parts of the tile that are out of bounds (OOB).
- Apply fastdiv opt.
- Disable shuffles for NV.

* Three tile sizes for CONV_2D, and a heuristic to choose between them

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for Intel perf: no shmem padding, placeholder shader core count

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <[email protected]>
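The "fastdiv opt" listed above replaces an integer division by a runtime-constant divisor with a multiply-high, an add, and a shift, using a multiplier precomputed on the host. The Python sketch below shows one common formulation (Granlund–Montgomery style, with shift = ceil(log2 d)). The helper names are hypothetical, and the sketch is not claimed to match the shader code exactly.

```python
from typing import Tuple


def init_fastdiv(d: int) -> Tuple[int, int]:
    """Hypothetical helper: precompute (multiplier, shift) so that
    fastdiv(n, multiplier, shift) == n // d for every 32-bit unsigned n."""
    if not 1 <= d < 2**32:
        raise ValueError("divisor must be a non-zero 32-bit unsigned integer")
    shift = 0
    while shift < 32 and (1 << shift) < d:
        shift += 1  # shift = ceil(log2(d))
    multiplier = ((1 << 32) * ((1 << shift) - d)) // d + 1  # always fits in 32 bits
    return multiplier, shift


def fastdiv(n: int, multiplier: int, shift: int) -> int:
    """Divide a 32-bit unsigned n by the divisor encoded in (multiplier, shift)."""
    hi = (n * multiplier) >> 32  # high half of the 32x32 -> 64-bit product
    # Python ints never overflow; in 32-bit GPU arithmetic, hi + n may wrap
    # once n reaches 2**31, so a shader version must bound n or handle the carry.
    return (hi + n) >> shift


if __name__ == "__main__":
    mp, sh = init_fastdiv(7)
    print(fastdiv(100, mp, sh))  # 14
```

The multiplier and shift are computed once per divisor (for example, per tensor dimension), so the per-element cost in the shader drops from a hardware divide to one multiply-high, one add, and one shift.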