noemotiovon
Collaborator

What does this PR do?

Implement an LRU cache for ACL graphs in the CANN backend.

  • Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
  • Load graphs on demand and evict the least recently used graph when the cache capacity is exceeded.
  • Update the push, move_to_front, and clear methods to manage cached graphs.
  • Reuse cached graphs to reduce graph reconstruction overhead in the CANN backend.
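
For readers unfamiliar with the pattern, the behaviour listed above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `cached_graph`, its integer `key`, the `graph_lru_cache` type, and the `find` helper are hypothetical stand-ins, and only the method names push, move_to_front, and clear are taken from the description. The real ggml_cann_graph presumably stores a captured ACL graph and matches on graph properties rather than a single integer.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <utility>

// Hypothetical stand-in for ggml_cann_graph. An integer key replaces
// whatever properties the real backend uses to recognise a graph.
struct cached_graph {
    int key;
};

// Simplified LRU cache: the most recently used graph sits at the front
// of the list, the least recently used graph at the back.
struct graph_lru_cache {
    using entry_list = std::list<std::unique_ptr<cached_graph>>;

    explicit graph_lru_cache(std::size_t cap) : capacity(cap) {
        assert(cap > 0);
    }

    // Insert a graph as most recently used; evict the least recently
    // used graph first if the cache is already full.
    void push(std::unique_ptr<cached_graph> graph) {
        if (entries.size() >= capacity) {
            entries.pop_back();
        }
        entries.push_front(std::move(graph));
    }

    // Mark an existing entry as most recently used without copying it.
    void move_to_front(entry_list::iterator it) {
        entries.splice(entries.begin(), entries, it);
    }

    // Hypothetical lookup helper: returns the matching graph and
    // refreshes its position, or nullptr on a cache miss.
    cached_graph * find(int key) {
        for (auto it = entries.begin(); it != entries.end(); ++it) {
            if ((*it)->key == key) {
                move_to_front(it);
                return entries.front().get();
            }
        }
        return nullptr;
    }

    // Drop every cached graph.
    void clear() {
        entries.clear();
    }

    std::size_t capacity;
    entry_list  entries;
};
```

In this sketch a caller would try `find` first and only build and `push` a new graph on a miss, so repeated graph shapes are reused instead of being rebuilt. A linear scan is assumed to be acceptable because the capacity is expected to be small.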

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Ascend NPU (issues specific to Ascend NPUs) labels on Sep 5, 2025
github-actions bot added the documentation (improvements or additions to documentation) label on Sep 8, 2025
Labels
Ascend NPU (issues specific to Ascend NPUs), documentation (improvements or additions to documentation), ggml (changes relating to the ggml tensor library for machine learning)