kleidiai: add support for get_rows #14676
Conversation
static inline float compute_fp16_to_fp32(ggml_fp16_t h) {
    static_assert(sizeof(ggml_fp16_t) == sizeof(__fp16), "ggml_fp16_t and __fp16 must be the same size");
    __fp16 tmp;
    memcpy(&tmp, &h, sizeof(ggml_fp16_t));
    return (float)tmp;
}
Can't we use ggml_fp16_to_fp32() instead of introducing this function?
Yes, good point — I'll update the patch to use ggml_fp16_to_fp32() instead.
Since this is in the CPU backend, it could also use the potentially more efficient ggml_cpu_fp16_to_fp32.
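For illustration, a minimal sketch of what the replacement could look like using the public ggml_fp16_to_fp32() helper. The function name below is hypothetical, and the exact signature of the CPU-specific variant mentioned above is not assumed here.

#include "ggml.h"

// Illustrative only: convert fp16 values with the existing public helper
// instead of a locally defined compute_fp16_to_fp32(). The CPU backend's
// ggml_cpu_fp16_to_fp32 variant may be faster; its signature is not assumed here.
static void convert_fp16_to_fp32_row(const ggml_fp16_t * src, float * dst, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = ggml_fp16_to_fp32(src[i]);
    }
}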
General advice is to try to keep the implementation more generic - it seems to focus a lot on Q4_0. Adding more asserts for the current underlying assumptions will help long term in case we add support for other types.
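As an illustration of that advice, a minimal sketch of making the current assumptions explicit; the function name, placement, and the contiguity check are hypothetical and not taken from the patch.

#include "ggml.h"

// Hypothetical guard: state the Q4_0-only assumption explicitly so that adding
// other quantization types later trips an assert instead of silently miscomputing.
static void kleidiai_assert_supported_weights(const struct ggml_tensor * w) {
    GGML_ASSERT(w->type == GGML_TYPE_Q4_0); // only Q4_0 weights are handled for now
    GGML_ASSERT(ggml_is_contiguous(w));     // assumed: the packed-weight path needs contiguous data
}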
Another important thing we should improve soon is to add support for testing extra buffer types in test-backend-ops (see ggml-org/whisper.cpp#3223 (comment)). Without such tests it is very difficult to verify that these changes do not break something.
I've updated the patch to address all review comments. However, I noticed that three CI tests are currently failing due to what appear to be unrelated infrastructure issues.
The build failures are unrelated.
All the review comments have been addressed; just checking if there's anything else needed from my side. Thanks!
* origin/master: (49 commits)
  ci : correct label refactor->refactoring (ggml-org#14832)
  CUDA: fix quantized KV cache + multiple sequences (ggml-org#14822)
  tests : add non-cont K,V FA tests
  memory : handle saving/loading null layers in recurrent memory (ggml-org#14675)
  ggml: fix loongarch quantize_row_q8_1 error (ggml-org#14827)
  CANN: weight format to NZ for Ascend310P3 (ggml-org#14407)
  CUDA: add fused rms norm (ggml-org#14800)
  ggml : model card yaml tab->2xspace (ggml-org#14819)
  vulkan: fix rms_norm_mul to handle broadcasting dim0 (ggml-org#14817)
  llama : add model type detection for rwkv7 7B&14B (ggml-org#14816)
  imatrix: add option to display importance score statistics for a given imatrix file (ggml-org#12718)
  Mtmd: add a way to select device for vision encoder (ggml-org#14236)
  cuda : implement bf16 cpy ops and enable bf16 cont (ggml-org#14763)
  opencl: remove unreachable `return` (ggml-org#14806)
  server : allow setting `--reverse-prompt` arg (ggml-org#14799)
  cuda: remove linking to cublasLt (ggml-org#14790)
  opencl: fix `im2col` when `KW!=KH` (ggml-org#14803)
  opencl: add conv2d kernel (ggml-org#14403)
  sycl: Fix im2col (ggml-org#14797)
  kleidiai: add support for get_rows (ggml-org#14676)
  ...
* kleidiai: add support for get_rows
* apply fixes based on code review
* apply more fixes based on code review
This patch adds support for KleidiAI acceleration of the Q4_0 matrix multiplication operation in cases where the weight tensor is shared with the get_rows operator. A typical use case is in whisper.cpp, where such weight sharing occurs between get_rows and matmul.
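To make the sharing pattern concrete, here is a minimal sketch of a graph in which one Q4_0 weight tensor feeds both get_rows and mul_mat; the function name and the tensor shapes are illustrative only and not taken from whisper.cpp or the patch.

#include "ggml.h"

// Sketch: one Q4_0 weight tensor is consumed by both get_rows (an embedding-style
// lookup) and mul_mat, which is the sharing pattern this patch targets.
static struct ggml_cgraph * build_shared_weight_graph(struct ggml_context * ctx) {
    // Q4_0 weights: 4096 columns (a multiple of the Q4_0 block size of 32), 128 rows.
    struct ggml_tensor * w   = ggml_new_tensor_2d(ctx, GGML_TYPE_Q4_0, 4096, 128);
    struct ggml_tensor * ids = ggml_new_tensor_1d(ctx, GGML_TYPE_I32, 8);       // row indices to gather
    struct ggml_tensor * x   = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4096, 8); // activations

    struct ggml_tensor * rows = ggml_get_rows(ctx, w, ids); // shares w ...
    struct ggml_tensor * y    = ggml_mul_mat (ctx, w, x);   // ... with the matmul

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, rows);
    ggml_build_forward_expand(gf, y);
    return gf;
}

In graphs like this, both operators have to understand the same (possibly repacked) weight layout, which is the situation the patch addresses.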