Enabled high-performance Automatic Tensor Parallelism (auto TP) for the MoE models on multiple GPUs/HPUs #6964
Open
gyou2021 wants to merge 64 commits into deepspeedai:master from gyou2021:autoTP_Qwen2Moe_DeepSeekv2
Changes from 62 commits
Commits
c9b12af
Reduced the experts allreduce number per layer to ONCE for the Qwen2-…
gyou2021 590ea36
Fixed format
gyou2021 889c275
Removed print
gyou2021 2ec6c34
Fix a bug about set.
gyou2021 504d696
Add the missing view operations from sequence parallel(async). (#6750)
inkcherry c266dc9
Update `torch.norm` to `torch.linalg.norm` and `torch.linalg.vector_n…
loadams ae12993
Using explicit GPU upcast for ZeRO-Offload (#6962)
xylian86 deb09a3
Update version.txt after 0.16.3 release (#6965)
loadams 128d436
Precisely track nvme optimizer offload (#6963)
tjruwase 864472b
Update build_win.bat script to exclue GDS op as it lacks Windows supp…
loadams 1ac398c
Add CUDA 12.8 support and comment on CUDA 12.7 (#6975)
loadams eda53d8
Update torch versions to support 2.6 (#6977)
loadams 112a7c6
generalize deepspeed linear and implement it for non cuda systems (#6…
oelayan7 7d2c5fe
Update recommended Windows whl building versions (#6983)
loadams f1d326c
Title: Fix setup_env_ranks to Properly Set Environment Variables Inst…
fabiosanger 46545d7
Specify torchvision in nv-ds-chat workflow (prevents errors with torc…
loadams af1ba94
Remove assumption that padding only occurs on last rank (#6974)
xylian86 e235921
Use ds-specific module id to avoid conflicts (#6847)
tjruwase f5e9796
Update A6000 workflows to use newer docker container - 24.09 vs 24.03…
loadams 07634b9
Allow NVIDIA Blackwell (#6991)
fabiendupont 0e57fa0
Update GH org references (#6998)
tjruwase e86c0c3
Update CNAME
loadams 0d7f0eb
Update CNAME
loadams cd8a988
[XPU] max1100 workflow update for docker and softwares (#7003)
Liangliang-Ma 18c712f
autotp training(fix dco) (#7004)
inkcherry c5bf6f6
import triton files when triton is supported and installed (#6989)
oelayan7 590de5f
Update A6000 tests transformers version (#7016)
loadams 693c39f
Fix ds-chat CI regression (#7015)
tjruwase 322a05a
[Ulysses tutorial] typos (#7024)
stas00 8869d78
fix hostname -I for macOS #6497 (#6990)
fitzjalen e4d03af
Update workflows to cuda 12.4 (#7000)
loadams 8c6251d
[ROCm] Enable fp_quantizer on ROCm (#7027)
rraminen e3e179c
add gds chinese blog (#7034)
GuanhuaWang fd2787b
Add chinese blog for deepspeed windows, and fix format (#7035)
hwchen2017 ba8ef57
AIO on ROCM (#7023)
jomayeri f4b0f58
Control trace cache warnings (#7039)
tjruwase 3ca3e2f
Update CUDA compute capability to support Blackwell (#7047)
hwchen2017 5612778
Update setup.py handling of ROCm cupy (#7051)
loadams af8c190
nv-ds-chat breaks with latest transformers (#7052)
loadams 225471a
Rename aio_thread_count to intra_op_parallelism (#7056)
tjruwase 1df293a
add autoTP training zero2 tests (#7049)
inkcherry 94abf68
Fix, bf16 optimizer remove dup loop (#7054)
wukong1992 4a4ff9b
Update version.txt after 0.16.4 release (#7063)
loadams e5eda47
fix an outdated doc wrt CUDA_VISIBLE_DEVICES (#7058)
stas00 675ec9a
Tecorigin sdaa accelerator (#6903)
siqi654321 81c1fee
Handle special case of libuv for Windows (#7064)
loadams 17f544c
Update README with info on newest accelerator (#7065)
loadams 20fd872
Bug Fix for offload_states API (#7050)
U-rara 0b289a2
Fix TOCTOU issues, switch to fstat (#7067)
loadams 4a86d02
config torch to avoid graph breaks caused by logger (#6999)
ShellyNR 594b5bb
Fix meta load tensor imcompatible issue (#7073)
Yejing-Lai a843e39
Replace calls to `python setup.py sdist` with `python -m build --sdis…
loadams 4cbc52c
Revert "Handle special case of libuv for Windows (#7064)" (#7076)
loadams 586e436
Add DeepseekV3 AutoTP. (#7045)
Yejing-Lai 5e379ad
Improve inference tutorial docs (#7083)
loadams 13bf866
Added support for the environment variable DS_MOE_EXPERTS_REDUCE_ONCE…
gyou2021 d5115be
Changed env variable name to 'DS_MOE_TP_SINGLE_ALLREDUCE'
gyou2021 f0044cb
Pin transformers version on tests that use latest. (#7085)
loadams 16ad5fd
Update README.md with ICS '23 MoE paper link (#7087)
siddharth9820 47d4420
Update parallelism for nv-torch-latest/nightly tests due to more GPUs…
loadams b3c64dd
Remove workflows for very old torch versions (#7090)
loadams 9b1fe98
Fixed conflicts
gyou2021 6b96dd9
Update auto_tp.py
gyou2021 e7883e7
Merge branch 'master' into autoTP_Qwen2Moe_DeepSeekv2
hwchen2017
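The commits above mention reducing the experts' allreduce count per layer to once and an environment variable named `DS_MOE_TP_SINGLE_ALLREDUCE`. The sketch below is not the PR's code; it only illustrates, with plain Python lists standing in for tensors and ranks, why summing each rank's partial expert outputs locally and then doing a single allreduce gives the same result as one allreduce per expert. The helper names (`single_allreduce_enabled`, `all_reduce_sum`, `moe_layer`) are hypothetical, and treating the value `"1"` as "enabled" is an assumption, since the page does not show how the variable is parsed.

```python
import os


def single_allreduce_enabled(env=os.environ):
    # Assumption: "1" enables the behaviour. The real parsing in the PR is not shown here.
    return env.get("DS_MOE_TP_SINGLE_ALLREDUCE", "0") == "1"


def all_reduce_sum(per_rank_values):
    # Simulated all-reduce: every rank receives the element-wise sum over ranks.
    total = [sum(column) for column in zip(*per_rank_values)]
    return [list(total) for _ in per_rank_values]


def moe_layer(partials, reduce_once):
    # partials[rank][expert] is that rank's partial output vector for one expert
    # (each rank holds a shard of every expert under tensor parallelism).
    # Returns (per-rank layer outputs, number of all-reduce calls issued).
    num_ranks = len(partials)
    num_experts = len(partials[0])
    dim = len(partials[0][0])
    calls = 0
    if reduce_once:
        # Sum the expert partials locally, then communicate once.
        local = [
            [sum(partials[r][e][i] for e in range(num_experts)) for i in range(dim)]
            for r in range(num_ranks)
        ]
        out = all_reduce_sum(local)
        calls = 1
    else:
        # Baseline: one all-reduce per expert, accumulated afterwards.
        out = [[0] * dim for _ in range(num_ranks)]
        for e in range(num_experts):
            reduced = all_reduce_sum([partials[r][e] for r in range(num_ranks)])
            calls += 1
            for r in range(num_ranks):
                out[r] = [a + b for a, b in zip(out[r], reduced[r])]
    return out, calls


# Two ranks, three experts, vectors of length two (integers keep the comparison exact).
partials = [
    [[1, 2], [3, 4], [5, 6]],
    [[10, 20], [30, 40], [50, 60]],
]
per_expert_out, per_expert_calls = moe_layer(partials, reduce_once=False)
single_out, single_calls = moe_layer(partials, reduce_once=True)
```

Because the reduction is a sum, both paths give identical outputs; the single-allreduce path just issues fewer collective calls. Gating weights are omitted here, but since weighting is linear the same argument would apply if each partial were scaled before the local sum.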