[SOT][CUDAGraph] Add support for custom all-reduce operators under SOT mode #4386
@@ -1575,6 +1575,7 @@ def _update_chunked_prefill(self, tasks):
self.proposer.update_task_chunk_prefill(task)
task.chunk_idx += 1

@sot_warmup_guard(True)
Review comment: What custom all-reduce problem is deferring the SOT warm-up meant to avoid?
Reply: See FastDeploy/fastdeploy/distributed/custom_all_reduce/custom_all_reduce.py, lines 208 to 222 in 5abf597.

def capture_model(self) -> None:
"""
Trigger CUDA Graph capture for all shapes in cuda graph capture list
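The excerpt does not show how `sot_warmup_guard` is defined. As a hedged illustration only, the sketch below shows one way a boolean-parameterized guard decorator could gate whether a method such as `capture_model` runs. The flag `SOT_WARMUP_ENABLED`, the class `ToyRunner`, the method `eager_only_step`, and the "skip and return None" behavior are all hypothetical assumptions, not FastDeploy's implementation.

```python
import functools
from typing import Any, Callable, List, Optional

# Hypothetical switch standing in for whatever config decides that SOT
# warm-up should run. This is NOT a real FastDeploy setting.
SOT_WARMUP_ENABLED = True


def sot_warmup_guard(controller: bool) -> Callable:
    """Hypothetical guard: the wrapped method runs only when `controller`
    equals SOT_WARMUP_ENABLED; otherwise the call is skipped."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped method's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
            if controller == SOT_WARMUP_ENABLED:
                return func(*args, **kwargs)
            return None  # skipped: guard condition not met

        return wrapper

    return decorator


class ToyRunner:
    """Toy stand-in for a model runner; records which steps actually ran."""

    def __init__(self) -> None:
        self.log: List[str] = []

    @sot_warmup_guard(True)
    def capture_model(self) -> None:
        # Placeholder for warm-up plus CUDA Graph capture.
        self.log.append("capture_model")

    @sot_warmup_guard(False)
    def eager_only_step(self) -> None:
        # Hypothetical step that should only run when warm-up is disabled.
        self.log.append("eager_only_step")


runner = ToyRunner()
runner.capture_model()
runner.eager_only_step()
print(runner.log)  # ['capture_model']
```

Placing the guard on `capture_model` ties the warm-up decision to the point where graph capture happens. The reason the PR defers warm-up for custom all-reduce is only referenced, not quoted, in the thread, so this sketch does not model it.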
Review comment: The static graph also uses custom all-reduce, right?
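Reply: Yes. The static graph currently supports both Custom AllReduce and Paddle AllReduce. However, Custom AllReduce requires adding the argument --max-num-batched-tokens 500 (the value just needs to stay within 500). We will keep investigating the exact cause.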
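Until the root cause is found, a launcher could reject configurations that break the reported workaround. The sketch below is a hypothetical startup check, not an existing FastDeploy API. `LaunchConfig`, its field names, and `check_custom_all_reduce_config` are all illustrative. The reply recommends `--max-num-batched-tokens 500`, so the sketch treats 500 as an inclusive upper bound.

```python
from dataclasses import dataclass

# Upper bound taken from the review thread's recommended flag value.
CUSTOM_ALL_REDUCE_TOKEN_LIMIT = 500


@dataclass
class LaunchConfig:
    """Hypothetical subset of launch options; field names are illustrative."""

    use_static_graph: bool
    use_custom_all_reduce: bool
    max_num_batched_tokens: int


def check_custom_all_reduce_config(cfg: LaunchConfig) -> None:
    """Raise ValueError if static graph + Custom AllReduce is combined with a
    token budget above the limit reported in the thread (cause still unknown)."""
    if (
        cfg.use_static_graph
        and cfg.use_custom_all_reduce
        and cfg.max_num_batched_tokens > CUSTOM_ALL_REDUCE_TOKEN_LIMIT
    ):
        raise ValueError(
            f"max_num_batched_tokens={cfg.max_num_batched_tokens} exceeds "
            f"{CUSTOM_ALL_REDUCE_TOKEN_LIMIT}; lower it when Custom AllReduce "
            "is used with the static graph."
        )


# The recommended setting passes; Paddle AllReduce is not constrained.
check_custom_all_reduce_config(LaunchConfig(True, True, 500))
check_custom_all_reduce_config(LaunchConfig(True, False, 8192))
```

Keeping the limit in one named constant makes it easy to relax once the underlying issue is understood.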