[CPU] Fix SDPA node attention mask precision handling for bf16/f16 inference #33132
base: master
Conversation
Use the actual attention mask input precision instead of the compute precision (bf16/f16) to fix LFM2-350M output corruption when running with low precision on Xeon platforms.
Pull request overview
This PR fixes precision handling for attention masks in the Scaled Dot Product Attention (SDPA) implementation when running with BF16/F16 inference precision on CPU. The issue caused output corruption in LFM2-350M model on Xeon platforms.
Key Changes:
- Uses actual attention mask input precision instead of assuming compute precision (bf16/f16)
- Fixes pointer arithmetic to use byte-based strides for attention masks (see the sketch after this list)
- Adds comprehensive test coverage for stateful SDPA with boolean masks in BF16 precision
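A minimal sketch (not the PR's actual code) of the byte-based stride idea: offsets into the mask buffer are scaled by the mask's own element size, which may be 1 byte (boolean), 2 bytes (bf16/f16), or 4 bytes (f32), rather than by the kernel's compute precision. All names below are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Returns a pointer to mask[b][h][m][0]. Strides are given in elements;
// elem_size is the byte width of the mask's *actual* precision
// (1 for boolean/u8, 2 for bf16/f16, 4 for f32).
static const void* mask_row_ptr(const void* base,
                                size_t b, size_t h, size_t m,
                                size_t stride_b, size_t stride_h, size_t stride_m,
                                size_t elem_size) {
    const auto* bytes = static_cast<const uint8_t*>(base);
    return bytes + (b * stride_b + h * stride_h + m * stride_m) * elem_size;
}

int main() {
    // A 2 x 1 x 4 x 8 boolean mask stored as one byte per element.
    uint8_t mask[2 * 1 * 4 * 8] = {};
    const void* p = mask_row_ptr(mask, /*b=*/1, /*h=*/0, /*m=*/2,
                                 /*stride_b=*/32, /*stride_h=*/32, /*stride_m=*/8,
                                 /*elem_size=*/sizeof(uint8_t));
    // (1*32 + 0*32 + 2*8) * 1 = 48 bytes from the start of the buffer.
    std::cout << (static_cast<const uint8_t*>(p) - mask) << "\n";
    return 0;
}
```

Advancing the base pointer as if every mask element were bf16-sized, regardless of the mask's real type, would land on the wrong bytes whenever the mask is boolean or f32, which is consistent with the corruption described above.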
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| stateful_sdpa_bool_mask.cpp | Adds new test case validating SDPA with boolean masks in BF16 inference mode |
| scaled_attn.cpp | Fixes attention mask precision detection and pointer arithmetic for both ONEDNN and ACL kernel implementations |
zhangYiIntel left a comment:
LGTM
@@ -0,0 +1,152 @@
#include <gtest/gtest.h>
Could we reuse existing tests to cover the boolean attn_mask? The only change here is the attn_mask type.
Please also add a copyright header at the top of the file.
> Could we reuse existing tests to cover the boolean attn_mask? The only change here is the attn_mask type.
Previously I tried to extend the existing SDPA tests to cover the changed logic, but those tests are stateless and get converted to Snippets (Subgraph) during inference, which bypasses the changed code path, so I created new stateful tests to cover the changes.
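For readers who want to reproduce the scenario outside the test suite, here is a rough sketch, assuming the public OpenVINO 2.x C++ API and hypothetical shapes, of compiling an SDPA with a boolean attention mask under a bf16 inference-precision hint on CPU. It omits the stateful (ReadValue/Assign) part that the new test actually exercises and only illustrates the mask-precision mismatch the PR targets.

```cpp
#include <openvino/openvino.hpp>
#include <openvino/op/scaled_dot_product_attention.hpp>

int main() {
    using namespace ov;
    // Query/key/value in f32; the attention mask is boolean, i.e. its precision
    // differs from the bf16 compute precision requested via the hint below.
    auto q = std::make_shared<op::v0::Parameter>(element::f32, PartialShape{1, 8, 16, 64});
    auto k = std::make_shared<op::v0::Parameter>(element::f32, PartialShape{1, 8, 16, 64});
    auto v = std::make_shared<op::v0::Parameter>(element::f32, PartialShape{1, 8, 16, 64});
    auto mask = std::make_shared<op::v0::Parameter>(element::boolean, PartialShape{1, 1, 16, 16});

    auto sdpa = std::make_shared<op::v13::ScaledDotProductAttention>(q, k, v, mask, /*causal=*/false);
    auto model = std::make_shared<Model>(OutputVector{sdpa}, ParameterVector{q, k, v, mask});

    Core core;
    auto compiled = core.compile_model(model, "CPU",
                                       hint::inference_precision(element::bf16));
    auto request = compiled.create_infer_request();
    // ... fill the input tensors, run inference, and compare against an f32 reference run.
    return 0;
}
```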
Details:
Tickets: