
Conversation

taronaeo (Collaborator) commented on Sep 5, 2025

Closes #15721
Supersedes #15739

This pull request drops support for the NNPA Vector Intrinsics, as the required maintenance cost does not justify the performance improvement for FP32 ↔ FP16 conversion.

Tested with both -fa off and -fa on, and verified that inference is correct in both modes.

For future reference for IBMers who want to bring this acceleration back:

  1. The NNPA Vector Intrinsics implementation for both FP32 → FP16 and FP16 → FP32 conversion is correct.
  2. Enabling Flash Attention (-fa on, enabled by default) somehow causes tensor data to become invalid, i.e. -inf and nan. Make sure the data is clean before judging whether the conversion implementation is correct; a minimal sanity-check sketch follows this list. See: ggml-cpu: fixes instability in NNPA Vector Intrinsics #15739 (comment)
  3. The function that calls the FP32 ↔ FP16 conversion with invalid data is most likely ggml_compute_forward_dup_f32.
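
As a reference for point 2, here is a minimal sketch (not part of this PR; the helper name debug_check_f32_tensor is made up for illustration) of the kind of sanity check meant above: scan the F32 source buffer for nan/±inf before concluding that the NNPA conversion itself is wrong. It only assumes the public ggml tensor API (ggml_nelements, tensor->data, tensor->name).

```c
// Hypothetical debugging helper, not part of ggml: scan an F32 tensor for
// nan/inf before blaming the FP32 <-> FP16 conversion path. In our testing the
// invalid values were already present in the source data handed to the
// conversion, e.g. via ggml_compute_forward_dup_f32.
#include <math.h>
#include <stdio.h>

#include "ggml.h"

static int64_t debug_check_f32_tensor(const struct ggml_tensor * t) {
    if (t->type != GGML_TYPE_F32) {
        return 0; // this sketch only checks F32 tensors
    }

    const float * data = (const float *) t->data;
    const int64_t n    = ggml_nelements(t);

    int64_t n_bad = 0;
    for (int64_t i = 0; i < n; ++i) {
        if (isnan(data[i]) || isinf(data[i])) {
            if (n_bad < 8) { // report only the first few offenders
                fprintf(stderr, "%s: bad value %f at index %lld in tensor '%s'\n",
                        __func__, data[i], (long long) i, t->name);
            }
            n_bad++;
        }
    }
    return n_bad; // 0 means the source data is clean
}
```

If this returns a non-zero count for the source tensor, the problem is upstream of the conversion and the intrinsics implementation should not be blamed.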

github-actions bot added the documentation (Improvements or additions to documentation) and ggml (changes relating to the ggml tensor library for machine learning) labels on Sep 5, 2025
taronaeo merged commit 186415d into ggml-org:master on Sep 6, 2025
48 checks passed
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
Successfully merging this pull request may close these issues.

Eval bug: ggml-cpu Conversion FP32<->FP16 Using GGML_NNPA Stop Inferencing Correctly After b6324