[XPU][OP] Add build_sampling_params kernel for MTP speculative decoding #8032
Clarity256 wants to merge 3 commits into
Add a new XPU custom operator `build_sampling_params` that constructs sampling parameters (top_p, top_k, topp_seed) on device for MTP speculative decoding verification. This replaces the previous Python-level `padding_sampling_params` approach with a more efficient XPU kernel implementation that supports CUDAGraph capture.

Key components:
- XPU kernel implementation (build_sampling_params.xpu)
- C++ wrapper and op registration
- Plugin header declaration
- Unit tests with comprehensive coverage
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #8032 +/- ##
==========================================
Coverage ? 67.84%
==========================================
Files ? 470
Lines ? 66111
Branches ? 10187
==========================================
Hits ? 44855
Misses ? 18390
Partials ? 2866
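CI report generated from the following code (updated every 30 minutes):
1 Required jobs: 8/10 passed
2 Failure details
🔴 Approval: requires approval (confidence: high). This job needs manual approval; CI continues only after the approval is completed.
🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: flaky issue (confidence: medium)
Error type: flaky issue | Confidence: medium
Key logs:
Suggested fix:
Related changes: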
Align the per-position seed offset stride with the Python padding_sampling_params implementation it replaces: XPU requires a stride of 32 (not 4) so that the generated topp_seed sequence matches the original reference. Update both the kernel and CPU wrapper, and the unit test reference accordingly.
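The stride change can be pictured with a small pure-Python sketch. It is not the PR's kernel or wrapper code: the function name `build_topp_seed`, the wrap-around bound `MAX_SEED`, and the rule that a request with a non-zero encoder length produces a single position are assumptions made for illustration. Only the stride of 32 comes from the commit message above.

```python
# Illustrative only: names, the wrap-around bound, and the expansion rule are
# assumptions, not the PR's actual code.
SEED_STRIDE = 32          # stride named in the commit message
MAX_SEED = 2**63 - 2      # hypothetical wrap-around bound


def build_topp_seed(infer_seed, seq_lens_this_time, seq_lens_encoder,
                    stride=SEED_STRIDE, max_seed=MAX_SEED):
    """Expand one base seed per request into one seed per output position."""
    topp_seed = []
    for seed, n_tokens, enc_len in zip(infer_seed, seq_lens_this_time, seq_lens_encoder):
        if enc_len > 0:
            # Assumed: a prefill (encoder) request yields a single position
            # and keeps its base seed.
            topp_seed.append(seed % max_seed)
            continue
        # Assumed: a decode request yields one position per token it handles
        # this step; position i uses base seed + i * stride, wrapped at max_seed.
        for pos in range(n_tokens):
            topp_seed.append((seed + pos * stride) % max_seed)
    return topp_seed


seeds = build_topp_seed(
    infer_seed=[100, 7, MAX_SEED - 1],
    seq_lens_this_time=[3, 5, 2],
    seq_lens_encoder=[0, 5, 0],
)
print(seeds)
```

Under these assumptions the first request produces seeds 100, 132, 164, the prefill request keeps seed 7, and the last request wraps from `MAX_SEED - 1` to 31. With a stride of 4 instead of 32 the sequence would differ, which is why the kernel, the CPU wrapper, and the test reference have to agree on the same value.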
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-06-12 18:35:56
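📋 Review Summary
PR overview: Adds the XPU build_sampling_params custom op, which uses a device-side kernel to generate the sampling parameters and update infer_seed.
Scope of changes: op registration, plugin wrapper/kernel, and XPU unit tests under custom_ops/xpu_ops.
Impact tags: [XPU] [OP] [Speculative Decoding]
Issues
No new blocking issues found. PR convention issues are reported in the section below and are not repeated here.
Status of Historical Findings
| Finding | Issue | Status |
|---|---|---|
| F1 | The new op is not yet wired into the actual XPU sampler path. | |
📝 PR Convention Check
The title contains two tags, [XPU][OP]; FastDeploy conventions require the title to contain exactly one official tag. The description structure matches the template.
Suggested title (can be copied directly):
[XPU] Add build_sampling_params kernel for MTP speculative decoding
Overall Assessment
This round focused on the new op registration, XPU plugin build discovery, the seed semantics of the kernel/wrapper versus padding_sampling_params, and the XPU sampler call path. No blocking issues were found in the newly added files themselves; the historical integration issue remains unresolved, and the title tags still need to be reduced to one per the conventions.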
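Motivation
While enabling CUDAGraph for XPU MTP speculative decoding, the existing `padding_sampling_params` (a Python-side CPU implementation) causes host-device synchronization and cannot be captured by CUDAGraph. This PR adds the `build_sampling_params` XPU custom operator, which constructs the sampling parameters (top_p, top_k, topp_seed) and updates `infer_seed` in place entirely on the device, clearing the way for subsequent CUDAGraph capture.
Modifications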
- custom_ops/xpu_ops/src/ops/mtp/build_sampling_params.cc: adds the Paddle custom operator entry point and registers the build_sampling_params op.
- custom_ops/xpu_ops/src/plugin/include/xpu/plugin.h: declares the build_sampling_params C interface.
- custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/build_sampling_params.xpu: adds the XPU3 kernel implementation.
- custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/build_sampling_params.cpp: adds the CPU wrapper and the XPU3 wrapper.
- custom_ops/xpu_ops/test/test_build_sampling_params.py: adds unit tests covering pure decoder, pure encoder, mixed, single-request, and seed wrap-around scenarios.
Usage or Command
cd custom_ops/xpu_ops/test && python test_build_sampling_params.py
Accuracy Tests
The unit tests compare against the Python reference implementation (the original padding_sampling_params logic); all cases pass.
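As a rough illustration of what a reference comparison of this kind covers, the sketch below expands per-request `top_p` values into per-position values for the scenario types named in the test description. The helper `expand_per_request`, the case table, and the expansion rule (one position for a request with a non-zero encoder length, `seq_lens_this_time` positions otherwise) are assumptions, not code taken from the PR or from `padding_sampling_params`.

```python
# Hypothetical sketch; not the PR's test file or reference implementation.

def expand_per_request(values, seq_lens_this_time, seq_lens_encoder):
    """Repeat each request's value once per output position (assumed rule)."""
    expanded = []
    for value, n_tokens, enc_len in zip(values, seq_lens_this_time, seq_lens_encoder):
        # Assumed: a prefill request samples one token, while a decode request
        # samples one token per position it handles this step.
        repeats = 1 if enc_len > 0 else n_tokens
        expanded.extend([value] * repeats)
    return expanded


# scenario name -> (top_p, seq_lens_this_time, seq_lens_encoder)
CASES = {
    "pure_decoder": ([0.9, 0.5], [2, 3], [0, 0]),
    "pure_encoder": ([0.9, 0.5], [8, 4], [8, 4]),
    "mixed": ([0.9, 0.5, 0.7], [3, 6, 1], [0, 6, 0]),
    "single": ([0.8], [4], [0]),
}

results = {name: expand_per_request(*args) for name, args in CASES.items()}
for name, expanded in results.items():
    print(name, expanded)
```

A real test would feed the same inputs to the device op and check that its outputs match the reference element by element; the table form keeps each scenario's inputs next to its name.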
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]
pre-commit before commit.
release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.