[CPU] Fix SDPA node attention mask precision handling for bf16/f16 inference #33132
base: master
Conversation
Use the actual attention mask input precision instead of the compute precision (bf16/f16) to fix LFM2-350M output corruption when running with low precision on Xeon platforms.
Pull request overview
This PR fixes precision handling for attention masks in the Scaled Dot Product Attention (SDPA) implementation when running with BF16/F16 inference precision on CPU. The issue caused output corruption in LFM2-350M model on Xeon platforms.
Key Changes:
- Uses actual attention mask input precision instead of assuming compute precision (bf16/f16)
- Fixes pointer arithmetic to use byte-based strides for attention masks (see the sketch after this list)
- Adds comprehensive test coverage for stateful SDPA with boolean masks in BF16 precision
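A minimal sketch (not the PR's actual code) of the byte-based stride idea: offsets into the mask buffer are scaled by the mask's own element size, which may be 1 byte (boolean), 2 bytes (bf16/f16), or 4 bytes (f32), rather than by the kernel's compute precision. All names below are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Returns a pointer to mask[b][h][m][0]. Strides are given in elements;
// elem_size is the byte width of the mask's *actual* precision
// (1 for boolean/u8, 2 for bf16/f16, 4 for f32).
static const void* mask_row_ptr(const void* base,
                                size_t b, size_t h, size_t m,
                                size_t stride_b, size_t stride_h, size_t stride_m,
                                size_t elem_size) {
    const auto* bytes = static_cast<const uint8_t*>(base);
    return bytes + (b * stride_b + h * stride_h + m * stride_m) * elem_size;
}

int main() {
    // A 2 x 1 x 4 x 8 boolean mask stored as one byte per element.
    uint8_t mask[2 * 1 * 4 * 8] = {};
    const void* p = mask_row_ptr(mask, /*b=*/1, /*h=*/0, /*m=*/2,
                                 /*stride_b=*/32, /*stride_h=*/32, /*stride_m=*/8,
                                 /*elem_size=*/sizeof(uint8_t));
    // (1*32 + 0*32 + 2*8) * 1 = 48 bytes from the start of the buffer.
    std::cout << (static_cast<const uint8_t*>(p) - mask) << "\n";
    return 0;
}
```

Advancing the base pointer as if every mask element were bf16-sized, regardless of the mask's real type, would land on the wrong bytes whenever the mask is boolean or f32, which is consistent with the corruption described above.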
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| stateful_sdpa_bool_mask.cpp | Adds new test case validating SDPA with boolean masks in BF16 inference mode |
| scaled_attn.cpp | Fixes attention mask precision detection and pointer arithmetic for both ONEDNN and ACL kernel implementations |
zhangYiIntel left a comment:
LGTM
@@ -0,0 +1,152 @@
#include <gtest/gtest.h>
Could we reuse existing tests to cover the boolean attn_mask? The only change here is the attn_mask type.
Please also add a copyright header at the top of the file.
> Could we reuse existing tests to cover the boolean attn_mask? The only change here is the attn_mask type.
Previously I tried to extend the existing SDPA tests to cover the changed logic, but those tests are stateless and get converted to Snippets (Subgraph) during inference, which bypasses the changed code path, so I created new stateful tests to cover the changes.
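For readers who want to reproduce the scenario outside the test suite, here is a rough sketch, assuming the public OpenVINO 2.x C++ API and hypothetical shapes, of compiling an SDPA with a boolean attention mask under a bf16 inference-precision hint on CPU. It omits the stateful (ReadValue/Assign) part that the new test actually exercises and only illustrates the mask-precision mismatch the PR targets.

```cpp
#include <openvino/openvino.hpp>
#include <openvino/op/scaled_dot_product_attention.hpp>

int main() {
    using namespace ov;
    // Query/key/value in f32; the attention mask is boolean, i.e. its precision
    // differs from the bf16 compute precision requested via the hint below.
    auto q = std::make_shared<op::v0::Parameter>(element::f32, PartialShape{1, 8, 16, 64});
    auto k = std::make_shared<op::v0::Parameter>(element::f32, PartialShape{1, 8, 16, 64});
    auto v = std::make_shared<op::v0::Parameter>(element::f32, PartialShape{1, 8, 16, 64});
    auto mask = std::make_shared<op::v0::Parameter>(element::boolean, PartialShape{1, 1, 16, 16});

    auto sdpa = std::make_shared<op::v13::ScaledDotProductAttention>(q, k, v, mask, /*causal=*/false);
    auto model = std::make_shared<Model>(OutputVector{sdpa}, ParameterVector{q, k, v, mask});

    Core core;
    auto compiled = core.compile_model(model, "CPU",
                                       hint::inference_precision(element::bf16));
    auto request = compiled.create_infer_request();
    // ... fill the input tensors, run inference, and compare against an f32 reference run.
    return 0;
}
```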
Details:
Tickets: