Pull requests: NVIDIA/Megatron-LM
Enabling variable_seq_lengths when encoder has Different TP Size
#1470
opened Mar 12, 2025 by
xiaojunjie
fix(moe): the missing argument 'router_dtype' of _DeepepManager.__init__
#1463
opened Mar 11, 2025 by
AsakusaRinne
[Bug Fix] fix p2p communication order error and stuck problems when pp 2 and vpp 2 with remove pad
#1451
opened Mar 5, 2025 by
ETOgaosion
Replace deprecated numpy.product with numpy.prod to ensure compatibility with NumPy >=2.0
#1440
opened Feb 27, 2025 by
mustious
fix a bug in load balancing loss aggregation when recompute is turned on
#1433
opened Feb 26, 2025 by
lyuwen
fix: return float instead of tensor from get_rotary_seq_len
#1419
opened Feb 20, 2025 by
jasonchiu-codeium
Fix document regarding GQA (--group-query-attention) argument
#1401
opened Feb 12, 2025 by
eagle705
Fix issue in converting Mixtral 8x7B checkpoints from HF to MCore and update doc
#1397
opened Feb 11, 2025 by
yeahdongcn
support bf16 dtype for optimizer states using precision-aware optimizer in TransformerEngine
#1390
opened Feb 8, 2025 by
XiaobingSuper
Draft
add moe_router_device_choice_method argument to choose method …
#1381
opened Feb 6, 2025 by
bzantium
fix typo
stale
No activity in 60 days on issue or PR
#1352
opened Jan 10, 2025 by
Jintao-Huang
fix param overwrite problem in saver_mcore
stale
#1351
opened Jan 9, 2025 by
Force1ess
fix bugs of data preprocessing with multiple json keys
stale
#1337
opened Dec 25, 2024 by
junjzhang