Releases: Vithulep/llama.cpp
Releases · Vithulep/llama.cpp
b6098
CANN: add support for ACL Graph (#15065)
* feat(cann): add optional support for ACL Graph execution
This commit adds support for executing ggml computational graphs using
Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be
enabled at compile time using the CMake option:
-DUSE_CANN_GRAPH=ON
By default, ACL graph execution is **disabled**, and the fallback path
uses node-by-node execution.
Key additions:
- CMake option to toggle graph mode
- Graph capture and execution logic using
- Tensor property matching to determine whether graph update is required
- Safe fallback and logging if the environment variable LLAMA_SET_ROWS
is unset or invalid
This prepares the backend for performance improvements in repetitive graph
execution scenarios on Ascend devices.
Signed-off-by: noemotiovon <[email protected]>
* Fix review comments
Signed-off-by: noemotiovon <[email protected]>
* remane USE_CANN_GRAPH to USE_ACL_GRAPH
Signed-off-by: noemotiovon <[email protected]>
* fix typo
Signed-off-by: noemotiovon <[email protected]>
---------
Signed-off-by: noemotiovon <[email protected]>
b5631
vulkan: Track descriptor pools/sets per-context (#14109) Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context - none of it is specific to pipelines anymore. It has a single vector of pools and vector of sets, and a single counter to track requests and a single counter to track use.
b5618
metal : use less stack memory in FA kernel (#14088) * metal : use less stack memory in FA kernel ggml-ci * cont : fix BF16 variant