Add support to multiple process groups by syncing across ranks #151
@shengfukevin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
minor comments.
This pull request was exported from Phabricator. Differential Revision: D60788539
Summary: Add support to multiple process groups by syncing across ranks.
Pull Request resolved: #151
Test Plan: /usr/local/fbcode/platform010/bin/mpirun -np 2 path-to/comm_replay.par --trace-path param_bench/fb/integration_tests/resnet-2gpu --trace-type et
Differential Revision: D60788539
Pulled By: shengfukevin
8e3bdb7 to 727f5fd
727f5fd to 573f91a
@GSSBMW, I converted the dictionary loaded from JSON from a "str"-to-"str" mapping into an "int"-to-"str" mapping and kept the sort code. Please review it. Thanks.
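For readers following the thread, here is a minimal sketch of the kind of conversion described in the comment above. It is not the code from this diff: the function name `load_pg_map`, the sample JSON, and the group names are all hypothetical, and the only assumption taken from the comment is that keys arrive as strings from JSON and are sorted after being converted to integers.

```python
import json


def load_pg_map(raw_json: str) -> dict[int, str]:
    """Hypothetical helper: parse a JSON object and return it keyed by int."""
    # JSON object keys are always strings, so json.loads yields a str -> str dict.
    str_keyed = json.loads(raw_json)
    # Convert each key to int so the sort below is numeric, not lexicographic.
    int_keyed = {int(key): value for key, value in str_keyed.items()}
    # Keep the sort: rebuilding the dict from sorted items fixes the iteration order.
    return dict(sorted(int_keyed.items()))


# Illustrative input only; the ids and names are made up.
raw = '{"10": "pg_b", "2": "pg_a", "1": "default"}'
pg_map = load_pg_map(raw)
print(list(pg_map))  # [1, 2, 10]
```

With string keys the same sort would order the ids as "1", "10", "2", which is why converting before sorting matters when the ids are meant to be compared numerically.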
Summary: Add support to multiple process groups by syncing across ranks.
Pull Request resolved: #151
Test Plan: /usr/local/fbcode/platform010/bin/mpirun -np 2 path-to/comm_replay.par --trace-path param_bench/fb/integration_tests/resnet-2gpu --trace-type et
Reviewed By: briancoutinho
Differential Revision: D60788539
Pulled By: shengfukevin
573f91a to 457854c
@shengfukevin merged this pull request in c466b60.
Summary
Add support to multiple process groups by syncing across ranks.
Test Plan
$ comm_replay --trace-type et --trace-path /home/sanshang/021_debug/000_code/param/trace/traces_megatronlm_gpt_43B_32ranks_pytnightly0703/execution_trace