[append attention] clean code #7062
“liuruian” does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you already signed the CLA but the status is still pending? Let us recheck it.

Thanks for your contribution!
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7062 +/- ##
==========================================
Coverage ? 73.19%
==========================================
Files ? 401
Lines ? 56573
Branches ? 8942
==========================================
Hits ? 41410
Misses ? 12219
Partials ? 2944
Flags with carried forward coverage won't be shown.
  smem_t k_smem(smem + num_frags_x * 16 * HEAD_DIM * sizeof(T)),
      v_smem(smem + (num_frags_x + NUM_WARP_KV * num_frags_z) * 16 * HEAD_DIM *
                        sizeof(T));
  static_assert(num_rows_per_block == num_frags_x * 16);
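num_rows_per_block should equal NUM_WARP_Q * num_frags_x * 16, where 16 is the m dimension of one tensor core mma. The NUM_WARP_Q factor was left out here because NUM_WARP_Q was originally 1; if you are asserting, it can be included.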
An assert for NUM_WARP_Q == 1 has been added at the start of the function.
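The relation discussed in this thread can be sketched on the host as a pair of compile-time checks. This is only an illustration, not the kernel's actual code: `kMmaRowsM`, `rows_per_block`, and `tile_shape_ok` are hypothetical names introduced here, and the value 16 is taken from the reviewer's remark about the mma m dimension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: rows covered by one tensor core mma in the m dimension.
constexpr uint32_t kMmaRowsM = 16;

// Hypothetical helper: number of Q rows one block (CTA) covers in the general case.
constexpr uint32_t rows_per_block(uint32_t num_warp_q, uint32_t num_frags_x) {
  return num_warp_q * num_frags_x * kMmaRowsM;
}

// Hypothetical compile-time check mirroring the two asserts from the review thread.
template <uint32_t NUM_WARP_Q, uint32_t num_frags_x, uint32_t num_rows_per_block>
constexpr bool tile_shape_ok() {
  // The simplified form num_frags_x * 16 is only valid when a single warp covers Q.
  static_assert(NUM_WARP_Q == 1, "simplified row count assumes NUM_WARP_Q == 1");
  // General relation; it reduces to num_frags_x * 16 under the assert above.
  static_assert(num_rows_per_block == rows_per_block(NUM_WARP_Q, num_frags_x),
                "num_rows_per_block must match the tile shape");
  return true;
}

// Example instantiation with illustrative values: 1 warp, 2 fragments, 32 rows.
static_assert(tile_shape_ok<1, 2, 32>(), "example tile shape");
```

Keeping the NUM_WARP_Q == 1 check next to the simplified row count means a later change to the warp layout fails at compile time instead of silently mis-sizing the tile.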
      tid % 16, tid / 16);  // 16 * 16

  const uint32_t q_end =
      min(q_len, div_up((tile_id + 1) * num_rows_per_block, GROUP_SIZE));
Has this been tested with some boundary cases to confirm that the offset really never exceeds div_up((tile_id + 1) * num_rows_per_block, GROUP_SIZE)?
Because each CTA reads at most num_rows_per_block Q rows of head_dim, it is enough to check that the offset does not exceed q_len.
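The boundary question in this thread can be checked on the host with a small brute-force sketch. It assumes that row `r` of the flattened tile maps to query index `r / GROUP_SIZE`; that mapping is an assumption made here for illustration and is not confirmed by the diff. The function name `tile_rows_within_q_end` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ceiling division, matching the div_up used in the diff.
constexpr uint32_t div_up(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Hypothetical check: returns true if every row of tile `tile_id` that maps to
// a valid query index (below q_len) also maps to an index below q_end.
bool tile_rows_within_q_end(uint32_t tile_id, uint32_t num_rows_per_block,
                            uint32_t group_size, uint32_t q_len) {
  const uint32_t q_end =
      std::min(q_len, div_up((tile_id + 1) * num_rows_per_block, group_size));
  const uint32_t row_begin = tile_id * num_rows_per_block;
  for (uint32_t r = 0; r < num_rows_per_block; ++r) {
    // Assumed mapping from flattened row to query index.
    const uint32_t q_idx = (row_begin + r) / group_size;
    // A row inside q_len must never land at or beyond q_end.
    if (q_idx < q_len && q_idx >= q_end) return false;
  }
  return true;
}
```

Under that mapping, the last row of the tile has query index floor(((tile_id + 1) * num_rows_per_block - 1) / GROUP_SIZE), which is always strictly less than div_up((tile_id + 1) * num_rows_per_block, GROUP_SIZE). The tile therefore cannot exceed the second bound by construction, which is consistent with the reply that only q_len needs checking.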
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
Tag list: [[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]
Run pre-commit before commit.
For the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.