[BugFix] support moe for sm103 #7238
Thanks for your contribution!
fastdeploy-bot
left a comment
🤖 AI Code Review | 2026-04-08 13:56 CST
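📋 Review Summary
PR overview: Support compiling the MoE GEMM kernels for the SM103 architecture
Scope of changes: custom_ops/gpu_ops/cutlass_kernels/moe_gemm/
Impact tag: [OP]
Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fused_moe_gemm_kernels_template.h:712 | Compile-time and runtime architecture checks cover different ranges |
Overall assessment
The PR aims to support compilation for the SM103 architecture, but the compile-time check (`__CUDA_ARCH__ < 1100`) and the runtime check (`sm_ < 104`) cover different ranges. The two ranges should be unified so that compile-time and runtime behavior agree.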
      } else if (sm_ >= 75 && sm_ < 80) {
        dispatch_moe_gemm_to_cutlass_macro(cutlass::arch::Sm75);
    - } else if (sm_ >= 80 && sm_ < 101) {
    + } else if (sm_ >= 80 && sm_ < 104) {
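🔴 Bug: Inconsistent architecture ranges. The compile-time check (`__CUDA_ARCH__ < 1100`) admits architectures up to SM109, but the runtime check (`sm_ < 104`) admits only up to SM103.
As a result, code compiled for SM104 through SM109 would throw an error at runtime.
Cause:
- The `__CUDA_ARCH__` macro is encoded as `major*100 + minor*10`, so 1100 corresponds to SM110.
- `sm_` is obtained from `getSMVersion()` and is encoded as `major*10 + minor`, so 104 corresponds to SM104.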
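The mismatch can be illustrated with a small host-side sketch. All function names here are hypothetical and are not part of FastDeploy or CUTLASS. The `__CUDA_ARCH__` encoding (`major*100 + minor*10`) follows the CUDA convention, the `major*10 + minor` encoding is the one the review attributes to `getSMVersion()`, and only the upper bound of the compile-time guard is modeled because that is all the review quotes. Compute capability 10.5 is used purely as an illustrative value inside the gap between the two guards.

```cpp
// Hypothetical helpers that model the two encodings discussed in the review.

// Value the __CUDA_ARCH__ macro takes for a given compute capability.
constexpr int cuda_arch_value(int major, int minor) {
  return major * 100 + minor * 10;
}

// Value the review attributes to getSMVersion() for the same capability.
constexpr int sm_version_value(int major, int minor) {
  return major * 10 + minor;
}

// Upper bound of the compile-time guard quoted in the review.
constexpr bool passes_compile_time_guard(int major, int minor) {
  return cuda_arch_value(major, minor) < 1100;
}

// Runtime dispatch branch from the diff: sm_ >= 80 && sm_ < 104.
constexpr bool passes_runtime_guard(int major, int minor) {
  return sm_version_value(major, minor) >= 80 &&
         sm_version_value(major, minor) < 104;
}

// SM103 satisfies both guards.
static_assert(cuda_arch_value(10, 3) == 1030, "SM103 in __CUDA_ARCH__ form");
static_assert(sm_version_value(10, 3) == 103, "SM103 in sm_ form");
static_assert(passes_compile_time_guard(10, 3) && passes_runtime_guard(10, 3),
              "SM103 is accepted at compile time and at runtime");

// A capability in the gap is compiled in but rejected by the runtime branch.
static_assert(passes_compile_time_guard(10, 5) && !passes_runtime_guard(10, 5),
              "10.5 passes the compile-time guard only");
```

Because every check is a `static_assert`, compiling the snippet is enough to confirm the arithmetic.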
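Suggested fix:
Unify the architecture range. There are two options:
- If the goal is to support up to SM103:
  - Compile time: change to `< 1040`
  - Runtime: keep `< 104`
- If the goal is to support up to SM109:
  - Compile time: keep `< 1100`
  - Runtime: change to `< 110`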
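One way to keep the two bounds from drifting apart again is to derive both from a single constant. The following is only a sketch of that idea, not the change made in this PR: the constant and function names are hypothetical, the value 104 corresponds to the first option (support up to SM103), and the compile-time lower bound of 800 is an assumption that mirrors the runtime branch `sm_ >= 80`.

```cpp
// Hypothetical single source of truth: exclusive upper bound in the
// major*10 + minor encoding used by the runtime check.
constexpr int kMaxSmExclusive = 104;  // assumed goal: support up to SM103

// The same bound in the __CUDA_ARCH__ encoding (major*100 + minor*10),
// which is always ten times the runtime encoding.
constexpr int kMaxCudaArchExclusive = kMaxSmExclusive * 10;

// Runtime-side range check, mirroring the dispatch branch in the diff.
constexpr bool runtime_supported(int sm) {
  return sm >= 80 && sm < kMaxSmExclusive;
}

// Compile-time-side range check; the lower bound 800 is an assumption.
constexpr bool compile_time_supported(int cuda_arch) {
  return cuda_arch >= 800 && cuda_arch < kMaxCudaArchExclusive;
}

static_assert(kMaxCudaArchExclusive == 1040,
              "compile-time bound follows the runtime bound");
static_assert(runtime_supported(103) && compile_time_supported(1030),
              "SM103 is accepted by both checks");
static_assert(!runtime_supported(104) && !compile_time_supported(1040),
              "SM104 is rejected by both checks");
```

Switching to the second option would then mean changing only `kMaxSmExclusive` to 110.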
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7238 +/- ##
==========================================
Coverage ? 73.91%
==========================================
Files ? 383
Lines ? 53510
Branches ? 8377
==========================================
Hits ? 39550
Misses ? 11195
Partials ? 2765
Motivation
Support compiling MoE for SM103.
Modifications
Support compilation for SM103.
Usage or Command
no
Accuracy Tests
no
Checklist
- [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- `pre-commit` before commit.
- `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.