[CUDAGraph] Remove CUDAGraph replay after capture and use the same device context in CUDA Graph #75954
Your PR was submitted successfully. Thank you for contributing to this open-source project!
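The title should state what was done, i.e. removing the replay after CUDAGraph capture and using the same device context in CUDA Graph. The description should then note that this adds support for FD's CUDA Graph mode based on subgraph splitting.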
// for cudagraph op
if (op->GetParentOp()->isa<paddle::dialect::CudaGraphOp>()) {
  VLOG(4) << "CudaGraphOp detected, using original device context";
The comment should say that this is an op inside a CUDAGraphOp, and that it must use the same device context.
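To illustrate the point of this review comment, here is a minimal, self-contained sketch of the selection rule: an op whose parent is a CUDA graph op keeps the original device context, so everything captured into the graph shares one context. All names below (`DeviceContext`, `Op`, `InsideCudaGraphOp`, `SelectContext`) are hypothetical stand-ins rather than Paddle types, and the idea that other ops may receive a different context from the executor is an assumption made only for illustration.

```cpp
#include <string>

// Hypothetical stand-in for a device context (not a Paddle type).
struct DeviceContext {
  std::string name;
};

// Hypothetical stand-in for an IR operation with an optional parent op.
struct Op {
  std::string name;
  const Op* parent = nullptr;     // enclosing op, if any
  bool is_cuda_graph_op = false;  // true for the CUDA graph container op
};

// Returns true when `op` is directly nested inside a CUDA graph op.
// Unlike the snippet under review, this also guards against a null parent.
bool InsideCudaGraphOp(const Op& op) {
  return op.parent != nullptr && op.parent->is_cuda_graph_op;
}

// Ops inside a CUDA graph op keep the original context so that all captured
// work uses the same device context. For other ops this sketch assumes the
// executor may hand out a different context (`alternate`).
const DeviceContext& SelectContext(const Op& op,
                                   const DeviceContext& original,
                                   const DeviceContext& alternate) {
  if (InsideCudaGraphOp(op)) {
    return original;
  }
  return alternate;
}
```

Under these assumptions, calling `SelectContext` on an op nested in a CUDA graph op returns `original`, while a top-level op gets `alternate`.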
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             develop    #75954   +/-   ##
===========================================
  Coverage           ?   100.00%
===========================================
  Files              ?         1
  Lines              ?         3
  Branches           ?         0
===========================================
  Hits               ?         3
  Misses             ?         0
  Partials           ?         0
LGTM
PR Category
Operator Mechanism
PR Types
Improvements
Description
Supports FastDeploy inference with SOT + CUDAGraph and subgraph splitting enabled: PaddlePaddle/FastDeploy#4386