Releases · Vithulep/llama.cpp

06 Aug 07:38

2241453

b6098 Latest

CANN: add support for ACL Graph (#15065)

* feat(cann): add optional support for ACL Graph execution

This commit adds support for executing ggml computational graphs using
Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be
enabled at compile time using the CMake option:

    -DUSE_CANN_GRAPH=ON

By default, ACL graph execution is **disabled**, and the fallback path
uses node-by-node execution.
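
As a rough illustration of how a compile-time toggle like this usually selects between the two paths, here is a minimal sketch. It is not the CANN backend's code: `Node`, `compute_graph`, and the `trace` vector are hypothetical names invented for the example, and only the `USE_CANN_GRAPH` macro name comes from the commit message (a later commit in this release renames it to USE_ACL_GRAPH).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node; not the real ggml type.
struct Node {
    std::string name;
};

// Runs the graph and returns the name of the path that was taken.
// `trace` records what was dispatched so the behavior can be inspected.
std::string compute_graph(const std::vector<Node>& nodes,
                          std::vector<std::string>& trace) {
    assert(trace.empty());  // the sketch expects a fresh trace per call
#ifdef USE_CANN_GRAPH
    // Graph mode: the whole graph is submitted as one unit.
    trace.push_back("graph:" + std::to_string(nodes.size()));
    return "acl-graph";
#else
    // Default fallback: dispatch each node individually.
    for (const Node& n : nodes) {
        trace.push_back("node:" + n.name);
    }
    return "node-by-node";
#endif
}
```

Built without the macro defined, the function takes the node-by-node branch, matching the default described above. Defining the macro at compile time (which is what a CMake option of this kind typically arranges) switches it to the single-submission branch.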

Key additions:
- CMake option to toggle graph mode
- Graph capture and execution logic using
- Tensor property matching to determine whether graph update is required
- Safe fallback and logging if the environment variable LLAMA_SET_ROWS
  is unset or invalid
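
The tensor property matching item can be sketched as follows: keep a snapshot of each node's properties from the last capture, compare it with the incoming graph, and re-capture only on a mismatch. This is a minimal sketch under stated assumptions, not the backend's implementation. `TensorProps`, `GraphCache`, `graph_update_required`, and `run` are hypothetical names, and the set of compared fields (data pointer, op, shape, strides) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-node snapshot; the real backend may compare other fields.
struct TensorProps {
    const void*            data;  // buffer address
    int                    op;    // operation id
    std::array<int64_t, 4> ne;    // shape (elements per dimension)
    std::array<size_t, 4>  nb;    // strides in bytes
};

inline bool same_props(const TensorProps& a, const TensorProps& b) {
    return a.data == b.data && a.op == b.op && a.ne == b.ne && a.nb == b.nb;
}

// Hypothetical cache of the last captured graph plus counters for inspection.
struct GraphCache {
    std::vector<TensorProps> captured;
    int captures = 0;
    int replays  = 0;
};

// True when the cached capture no longer matches the incoming graph.
bool graph_update_required(const GraphCache& cache,
                           const std::vector<TensorProps>& current) {
    if (cache.captured.size() != current.size()) return true;
    for (size_t i = 0; i < current.size(); ++i) {
        if (!same_props(cache.captured[i], current[i])) return true;
    }
    return false;
}

// Re-captures only when needed, then replays the cached graph.
void run(GraphCache& cache, const std::vector<TensorProps>& current) {
    assert(!current.empty());
    if (graph_update_required(cache, current)) {
        cache.captured = current;  // stands in for a real graph capture
        ++cache.captures;
    }
    ++cache.replays;               // stands in for launching the graph
}
```

Running the same graph repeatedly pays the capture cost once, which is the repetitive-execution case the commit message targets. Any change in shape, strides, op, or buffer address forces a fresh capture.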

This prepares the backend for performance improvements in repetitive graph
execution scenarios on Ascend devices.

Signed-off-by: noemotiovon <[email protected]>

* Fix review comments

Signed-off-by: noemotiovon <[email protected]>

* rename USE_CANN_GRAPH to USE_ACL_GRAPH

Signed-off-by: noemotiovon <[email protected]>

* fix typo

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-08-06T07:38:32Z

llama-b6098-bin-macos-arm64.zip
sha256:c331d902cf0acf7fd67f0467c70e5fb1e22f30be81b4141a88362420df3a503c
10.7 MB 2025-08-06T07:38:41Z

llama-b6098-bin-macos-x64.zip
sha256:60bc607f1c12fae301c20ca7350b4cd513a0c9e4a9ee0b08ab0aa13d131101d1
27.4 MB 2025-08-06T07:38:42Z

llama-b6098-bin-ubuntu-vulkan-x64.zip
sha256:d85e844e83c221e94e7acf57674a190838b03449c4259d9e1e7c81533d58d0ee
21.4 MB 2025-08-06T07:38:43Z

llama-b6098-bin-ubuntu-x64.zip
sha256:e10ed83c6f8b75ee36951d83d78c951af2bd6f521d102c32d9e1500ebd9d1aee
12.7 MB 2025-08-06T07:38:44Z

llama-b6098-bin-win-cpu-arm64.zip
sha256:66837944d88bd0410a4a69c8be1dbef996a807679b2655849dc5afa9b70129f6
10.9 MB 2025-08-06T07:38:45Z

llama-b6098-bin-win-cpu-x64.zip
sha256:530caa01a3c90adb6eea8f1d86d7da6e22f318fe5c754710993ea8842f37db03
13.8 MB 2025-08-06T07:38:46Z

llama-b6098-bin-win-cuda-12.4-x64.zip
sha256:25d7a3daa219b58adb444ba4ba410dbb443abef8ba3b5b5f6c84a5eceb2588f5
135 MB 2025-08-06T07:38:47Z

llama-b6098-bin-win-hip-radeon-x64.zip
sha256:67282ec05f70dd5a390fcde1fee08ac33520f75190432aa4756dd7a60d07fd3b
286 MB 2025-08-06T07:38:51Z

llama-b6098-bin-win-opencl-adreno-arm64.zip
sha256:b62385432cbb54dce0363562ce985cb954aae6b6b77dcec577828693036f778f
11.3 MB 2025-08-06T07:38:57Z

Source code (zip)
2025-08-06T06:12:42Z

Source code (tar.gz)
2025-08-06T06:12:42Z

11 Jun 09:39

github-actions

b5631

1f7d50b

vulkan: Track descriptor pools/sets per-context (#14109)

Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context, since none of it is specific to pipelines anymore. The context has
a single vector of pools and a single vector of sets, plus one counter to track
requests and one counter to track use.
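
The bookkeeping described here (one vector of pools, one vector of sets, a request counter and a use counter, all owned by the context) can be sketched without any Vulkan calls. This is an illustrative sketch only, not the ggml-vulkan code: `Context`, `request_sets`, `ensure_allocated`, `next_set`, `reset`, and `SETS_PER_POOL` are hypothetical, and plain integers stand in for `VkDescriptorPool` and `VkDescriptorSet` handles. Only `MAX_PARAMETER_COUNT == 8` is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t MAX_PARAMETER_COUNT = 8;  // bindings per set, per the commit
constexpr size_t   SETS_PER_POOL       = 4;  // hypothetical pool capacity

using PoolHandle = uint64_t;  // stand-in for VkDescriptorPool
using SetHandle  = uint64_t;  // stand-in for VkDescriptorSet

// All tracking lives in the context; nothing is stored per pipeline.
struct Context {
    std::vector<PoolHandle> pools;
    std::vector<SetHandle>  sets;
    size_t sets_requested = 0;  // how many sets the upcoming work needs
    size_t sets_used      = 0;  // how many have been handed out so far
};

// Pipelines only declare how many sets they will need.
void request_sets(Context& ctx, size_t n) { ctx.sets_requested += n; }

// Grows the pools until every requested set exists. Because every pipeline
// shares one layout, any set can serve any pipeline.
void ensure_allocated(Context& ctx) {
    while (ctx.sets.size() < ctx.sets_requested) {
        ctx.pools.push_back(ctx.pools.size() + 1);  // fake pool creation
        for (size_t i = 0; i < SETS_PER_POOL; ++i) {
            ctx.sets.push_back(ctx.sets.size() + 1);  // fake set allocation
        }
    }
}

// Hands out the next unused set.
SetHandle next_set(Context& ctx) {
    assert(ctx.sets_used < ctx.sets.size());
    return ctx.sets[ctx.sets_used++];
}

// Resets the counters; already allocated sets stay available for reuse.
void reset(Context& ctx) {
    ctx.sets_requested = 0;
    ctx.sets_used      = 0;
}
```

With a shared layout, a request is just a counter increment, and allocation can be deferred until the total is known. That is one plausible reason for splitting request tracking from use tracking.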

Assets 15

10 Jun 06:51

github-actions

b5618

1f63e75

metal : use less stack memory in FA kernel (#14088)

* metal : use less stack memory in FA kernel

ggml-ci

* cont : fix BF16 variant

Assets 15
