Conversation

RaymondLi0
Collaborator

No description provided.

@RaymondLi0 RaymondLi0 changed the base branch from multi-query-attention to before-merge June 20, 2023 20:12
@RaymondLi0 RaymondLi0 changed the base branch from before-merge to multi-query-attention June 20, 2023 20:12
Mcore Bot and others added 28 commits August 11, 2025 04:11
feat(MoE): support CP and recompute for MTP

See merge request ADLR/megatron-lm!3330
Disallow expandable segments for cudagraphs entirely

See merge request ADLR/megatron-lm!3806
MXFP8 DP AG overlap enablement

See merge request ADLR/megatron-lm!3710
Update README

See merge request ADLR/megatron-lm!3406
Move FullCudaGraphWrapper implementation to Megatron Core.

See merge request ADLR/megatron-lm!3808
Fixes and updates for external cudagraph

See merge request ADLR/megatron-lm!3631
build: Bump TE

See merge request ADLR/megatron-lm!3799
Debug distributed checkpoint for Transformer Engine fused MLP

See merge request ADLR/megatron-lm!3606
Add argument to control collnet enablement

See merge request ADLR/megatron-lm!3812
Dynamic Backend Inference MLA

See merge request ADLR/megatron-lm!3569
Adding support for multiple validation sets

See merge request ADLR/megatron-lm!3422
Fix log prob calculation, pipeline parallelism, and sequence parallelism for dynamic engine

See merge request ADLR/megatron-lm!3718
deepakn94 and others added 30 commits September 9, 2025 20:59
Fix bug in param_norm computation where some ranks might call collective and some might not

See merge request ADLR/megatron-lm!3918
Fix BERT + virtual pipeline parallelism

See merge request ADLR/megatron-lm!3993
Dynamic inference functional tests | Cuda graphs.

See merge request ADLR/megatron-lm!3620

Create inference graphs immediately but defer training graph creation until create_cudagraphs

See merge request ADLR/megatron-lm!3965
Set mimo_vlm and gpt_dynamic_inference tests to be flaky

See merge request ADLR/megatron-lm!3995
Signed-off-by: oliver könig <[email protected]>
chore: Upgrade dependencies (2025-09-15)

See merge request ADLR/megatron-lm!3998
Co-authored-by: Jon Barker <[email protected]>
Co-authored-by: Robert Kirby <[email protected]>
Co-authored-by: Vitaly Kurin <[email protected]>
Co-authored-by: Helen Ngo <[email protected]>
Co-authored-by: Mcore Bot <[email protected]>
Co-authored-by: Keshav Santhanam <[email protected]>
…fline implementation

Co-authored-by: Chenhan Yu <[email protected]>
Co-authored-by: Oliver Koenig <[email protected]>
Co-authored-by: Ye Yu <[email protected]>
… fully_shard_model and fully_shard_optimizer.

Co-authored-by: Mcore Bot <[email protected]>