FSDP2 tutorial #3358
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3358
Note: Links to docs will display an error until the docs builds have been completed.
1 active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
1 new failure: as of commit 87b76bd with merge base 85739f5, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
FSDP2 tutorial is ready for review @AlannaBurke @svekars
The link failure for "https://docs.pytorch.org/tutorials/intermediate/FSDP1_tutorial.html" is expected. The link will work once FSDP1_tutorial lands in this PR.
# initialize the process group
dist.init_process_group("nccl", rank=rank, world_size=world_size)

``fully_shard`` register forward/backward hooks to all-gather parameters before computation, and reshard parameters after computation. To overlap all-gathers with computation, FSDP2 offers **implicit prefetching** that works out of the box with the training loop above and **explicit prefetching** for advanced users to control all-gather schedules manually.
register -> registers
reshard -> reshards
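For readers of this thread, the hook behavior described in the quoted paragraph (all-gather parameters before computation, reshard afterwards) can be illustrated with a small single-process toy. This is only a sketch under stated assumptions: `ToyShardedLayer` and `run_forward` are hypothetical names, every rank's shard is simulated in one process, and nothing here uses or reproduces FSDP2's actual implementation. In particular, the toy runs serially, whereas the prefetching mentioned in the paragraph exists to overlap all-gathers with computation.

```python
from typing import List, Optional, Tuple


class ToyShardedLayer:
    """Hypothetical toy: a weight vector split into per-rank shards.

    All ranks are simulated locally; this is not FSDP2's implementation.
    """

    def __init__(self, name: str, weight: List[float], world_size: int) -> None:
        assert len(weight) % world_size == 0, "toy assumes an evenly divisible weight"
        chunk = len(weight) // world_size
        self.name = name
        self.shards = [weight[i * chunk:(i + 1) * chunk] for i in range(world_size)]
        self.full: Optional[List[float]] = None  # unsharded weight, only alive around compute

    def all_gather(self, events: List[str]) -> None:
        # Plays the role of the pre-forward hook: concatenate every rank's shard.
        self.full = [w for shard in self.shards for w in shard]
        events.append(f"all_gather:{self.name}")

    def reshard(self, events: List[str]) -> None:
        # Plays the role of the post-forward hook: free the unsharded copy.
        self.full = None
        events.append(f"reshard:{self.name}")

    def forward(self, x: float, events: List[str]) -> float:
        self.all_gather(events)
        y = x * sum(self.full)  # stand-in for the layer's real computation
        events.append(f"compute:{self.name}")
        self.reshard(events)
        return y


def run_forward(layers: List[ToyShardedLayer], x: float) -> Tuple[float, List[str]]:
    """Run the layers in order and record the order of hook events."""
    events: List[str] = []
    for layer in layers:
        x = layer.forward(x, events)
    return x, events


layers = [
    ToyShardedLayer("layer0", [1.0, 2.0, 3.0, 4.0], world_size=2),
    ToyShardedLayer("layer1", [0.5, 0.5], world_size=2),
]
out, events = run_forward(layers, 2.0)
print(out)
print(events)
```

The recorded event list shows each layer's unsharded weight existing only between its own all-gather and reshard, which is the memory-saving property the hooks provide.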
**Author**: `Hamid Shojanazeri <https://github.com/HamidShojanazeri>`__, `Yanli Zhao <https://github.com/zhaojuanmao>`__, `Shen Li <https://mrshenli.github.io/>`__

.. note::
   |edit| FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
- |edit| FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
+ FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
The FSDP2 tutorial replaces the FSDP1 tutorial in place (intermediate_source/FSDP_tutorial.rst).
The FSDP1 tutorial is renamed to intermediate_source/FSDP1_tutorial.rst, and the FSDP2 tutorial links to it.
The code for this tutorial is committed to pytorch examples: https://github.com/pytorch/examples/tree/main/distributed/FSDP2