Commit ecac477
Merge remote-tracking branch 'nvidia/main' (#7)
* ci: Move test optimizer into its own bucket (NVIDIA#1909)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Use matrix for approval-bot
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Update function name
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Adjust approval-bot for copy-pr-bot
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Parametrize workflow
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Parametrize workflow
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Remove attribute
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Update container image tag to use GitHub SHA
* chore: Remove file
* ci: Fix approval bot
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Configure cherrypick bot (NVIDIA#1925)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ci approve dev (NVIDIA#1933)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Update nightly schedule (NVIDIA#1934)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Bump pre-flight for runs on main/dev (NVIDIA#1935)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Allow skipping on main (NVIDIA#1936)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/ci/pr template community bot (NVIDIA#1937)
* ci: More granular unit tests buckets (NVIDIA#1932)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Add sequence packing to RL (NVIDIA#1911)
Add sequence packing to RL
* chore: Update template (NVIDIA#1939)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* chore: Add description about who can merge (NVIDIA#1940)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/ci/fix main on eos (NVIDIA#1938)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/ci/internal mrs (NVIDIA#1942)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Fix branch of approval bot (NVIDIA#1944)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Approvalbot for other branches (NVIDIA#1947)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(fix): Approval bot (NVIDIA#1949)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(fix): Approval gate
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Approval gate rule
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Update golden values nightly
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Approval gate
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Approval bot
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Sync branches
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Smaller image
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Better output
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: sync branches
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Fix sync bot
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Finalize
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Finalize
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/ci/sync branches (NVIDIA#1956)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Increase time limit for main tests
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/ci/add milestone (NVIDIA#1951)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Remove M-FSDP testing under LTS environment (NVIDIA#1959)
* ci: Run on push to release branch (NVIDIA#1960)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Add golden values for inference
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Fix typo in rl section of CODEOWNERS (NVIDIA#1968)
* ci: Update copyright checker (NVIDIA#1973)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/ci/auto reminder GitHub (NVIDIA#1955)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Update secret
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(fix): `Run tests` label (NVIDIA#1970)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Disable tests again
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Add merge-group to copyright check
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Copyright check on merge-queue
Signed-off-by: oliver könig <okoenig@nvidia.com>
* zarr soft deprecation (NVIDIA#2004)
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Make `get_asyncio_loop` safe to use repeatedly (NVIDIA#1990)
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Update symmetric registration interface to sync-up with upstream pytorch change (NVIDIA#1924)
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun <kyeg9404@gmail.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* chore: Update codeowners (NVIDIA#2012)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Deduplicate dynamic engine + coordinator. (NVIDIA#1981)
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Safely access state dict args in load ckpt (NVIDIA#1957)
Signed-off-by: Maanu Grover <maanug@nvidia.com>
* Allow mixed-batch sampling in dynamic inference (NVIDIA#1927)
* Stop Nemo_CICD_Test from failing in forks (NVIDIA#2024)
* Clean up dynamic inference step (NVIDIA#1992)
Co-authored-by: Lawrence McAfee <lmcafee@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* ci: Auto-update copy-pr-bot vetters (NVIDIA#1850)
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com>
* Have datasets account for tokenizers which incorrectly define PAD (NVIDIA#2017)
* ci: Enable integration tests (NVIDIA#2023)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci: Fix build-push-wheel workflow (NVIDIA#2022)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* chore: Update tooling for interactive jobs (NVIDIA#2032)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* revert(hotfix): ci: trustees_override (NVIDIA#2041)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* add missing warnings import in model parallel config (NVIDIA#2039)
Signed-off-by: ykarnati <ykarnati@nvidia.com>
* Reduce-scatter implementation with FP32 accumulation (NVIDIA#1967)
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
* ci(fix): Workflows on `main` (NVIDIA#2045)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* build: Bump modelopt (NVIDIA#2046)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Remove TestCaptureFreezeGC unit test. (NVIDIA#1978)
* ci: Add multi-approval action (NVIDIA#2051)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Repair codeowners file
* ci(hotfix): Set docs allowed to fail
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/ci/test iteration time (NVIDIA#2067)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Remove performance for ckpt-resume
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Allow inference test throughput to vary by 10% (NVIDIA#2070)
* ci(hotfix): Inference test pipeline
Signed-off-by: oliver könig <okoenig@nvidia.com>
* chore: Fix autoformatter (NVIDIA#2073)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Remove iteration-time from t5
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): disable inference test
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Disable inference test
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Bypass approvalbot in merge-queue (NVIDIA#2082)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Enable merge-group for approval bot
Signed-off-by: oliver könig <okoenig@nvidia.com>
* chore: Update local tooling (NVIDIA#2066)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Add extra RL files (NVIDIA#2077)
Co-authored-by: Robert Kirby <rkirby@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Prevent summary jobs from running in forks (NVIDIA#2083)
Co-authored-by: oliver könig <okoenig@nvidia.com>
* ci: Fix test scope (NVIDIA#2091)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* ci(hotfix): Remove publish workflows
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Refactor the attention metadata into separate classes (NVIDIA#2001)
Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Guard against incorrectly using MoE prefill graphs (NVIDIA#2030)
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Revert "Refactor the attention metadata into separate classes (NVIDIA#2001)"
This reverts commit a652e2c.
* Run mr-slim tests in lightweight-mode (NVIDIA#2106)
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
* Inference | Lazy compile UVM allocator. (NVIDIA#1977)
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* chore: Reenable trustees (NVIDIA#2108)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Revert "Inference | Lazy compile UVM allocator. (NVIDIA#1977)"
This reverts commit 7487c53.
* ci(fix): Changeset of copyright checker (NVIDIA#2110)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Ko3n1g/chore/update release settings (NVIDIA#2097)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Remove unnecessary check on rotary_pos_cos (NVIDIA#2003)
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
* (Reverted) Inference | Lazy compile UVM allocator. (NVIDIA#2125)
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Refactor Attention Metadata to Separate Classes (NVIDIA#2112)
Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Refactor model_provider to model_builder format for ModelOpt examples (NVIDIA#2107)
* wandb Inference stats logging (NVIDIA#2026)
Co-authored-by: root <root@gpu-h100-0058.cm.cluster>
Co-authored-by: William Dykas <wdykas@cw-pdx-cs-001-vscode-02.cm.cluster>
Co-authored-by: root <root@gpu-h100-0220.cm.cluster>
* Make `PipelineParallelLayout` always return str from ` __repr__` (NVIDIA#2055)
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
* Add flash_attn_3 as first option for FA3 import (NVIDIA#2010)
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
* Add debugging hint for case when cudagraphs are created but no matching runner is found (NVIDIA#2129)
* ci: LTS container (NVIDIA#2133)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Revert "ci: LTS container (NVIDIA#2133)"
This reverts commit eb48e81.
* Fix param init (NVIDIA#2033)
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Hotfix to unit tests on hopper FA3 (NVIDIA#2143)
* Add BytesIO to safe_globals (NVIDIA#2074)
* add deprecation warning for legacy tokenizer system (NVIDIA#2145)
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* replay: ci: Bump LTS container (NVIDIA#2157)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Hotfix to unit tests on hopper FA3 (bis) (NVIDIA#2179)
* Fix has_modelopt_state() for native Torch checkpoint format (NVIDIA#2160)
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
* chore: Remove codeowners (NVIDIA#2175)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Fix FP8 inference with sequence parallelism (NVIDIA#2009)
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
* Replace ModelOpt generation server (NVIDIA#2147)
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
* Add hybrid model support for dynamic inference engine (NVIDIA#1907)
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <tene@nvidia.com>
* Async task and event loop safety in Megatron Core (NVIDIA#2025)
Co-authored-by: Robert Kirby <ArEsKay3@users.noreply.github.com>
* Rename skip_prompt_log_probs (NVIDIA#2181)
* Dynamic inference context | UVM only. (NVIDIA#1983)
Co-authored-by: Robert Kirby <rkirby@nvidia.com>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <tene@nvidia.com>
* Update copy-pr-bot.yaml [skip ci]
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Revert "Dynamic inference context | UVM only. (NVIDIA#1983)"
This reverts commit d6979d6.
* ci: Run `auto-update-copy-pr-bot` only on forks (NVIDIA#2191)
Signed-off-by: oliver könig <okoenig@nvidia.com>
* Inference throughput tests: refactor goldens to be in list format (NVIDIA#2072)
* Enable TE custom quantization recipe (NVIDIA#2005)
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Signed-off-by: root <Evgeny>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: root <Evgeny>
* Remove redundant logits calculations in gpt_model
---------
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Youngeun <kyeg9404@gmail.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: ykarnati <ykarnati@nvidia.com>
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Signed-off-by: root <Evgeny>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <34819528+tdene@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Lawrence McAfee <85179052+lmcafee-nvidia@users.noreply.github.com>
Co-authored-by: Mcore Bot <mcore-bot@nvidia.com>
Co-authored-by: Maanu Grover <109391026+maanug-nv@users.noreply.github.com>
Co-authored-by: Lawrence McAfee <lmcafee@nvidia.com>
Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com>
Co-authored-by: Yashaswi Karnati <144376261+yashaswikarnati@users.noreply.github.com>
Co-authored-by: Deepak Narayanan <2724038+deepakn94@users.noreply.github.com>
Co-authored-by: helen ngo <helenn@nvidia.com>
Co-authored-by: Robert Kirby <rkirby@nvidia.com>
Co-authored-by: kanz-nv <kanz@nvidia.com>
Co-authored-by: Siddharth Singh <136645615+sidsingh-nvidia@users.noreply.github.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: wdykas <73254672+wdykas@users.noreply.github.com>
Co-authored-by: root <root@gpu-h100-0058.cm.cluster>
Co-authored-by: William Dykas <wdykas@cw-pdx-cs-001-vscode-02.cm.cluster>
Co-authored-by: root <root@gpu-h100-0220.cm.cluster>
Co-authored-by: Ananth Subramaniam <ansubramania@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Teodor-Dumitru Ene <tene@nvidia.com>
Co-authored-by: Robert Kirby <ArEsKay3@users.noreply.github.com>
Co-authored-by: Evgeny Tsykunov <e.tsykunov@gmail.com>
1 parent 052780e commit ecac477
File tree
573 files changed
+57090
-111590
lines changed
- .github
- actions
- workflows
- .gitlab/stages
- docker
- examples
- inference/gpt
- post_training/modelopt
- rl
- environment_configs
- environments
- countdown
- math
- megatron
- core
- datasets
- dist_checkpointing/strategies
- distributed
- fsdp/src
- megatron_fsdp
- inference
- contexts
- attention_context
- engines
- text_generation_controllers
- models
- gpt
- mamba
- ssm
- transformer
- moe
- post_training
- rl
- agent
- inference
- server
- agent
- inference
- training
- tokenizer
- tests
- functional_tests
- python_test_utils
- shell_test_utils
- test_cases
- bert
- bert_mcore_tp1_pp2
- bert_mcore_tp1_pp4_vp2
- bert_mcore_tp2_pp2_local_spec
- bert_mcore_tp2_pp2_resume_torch_dist_local_spec
- bert_mcore_tp2_pp2_resume_torch_dist
- bert_mcore_tp2_pp2
- bert_mcore_tp4_pp1
- bert_release
- gpt
- gpt3_15b_8t_release_sm
- gpt3_15b_8t_release
- gpt3_7b_tp1_pp4_memory_speed
- gpt3_7b_tp4_pp1_memory_speed
- gpt3_mcore_te_tp1_pp1_dist_optimizer_no_mmap_bin_files
- gpt3_mcore_te_tp1_pp1_resume_torch_dist_dist_optimizer
- gpt3_mcore_te_tp1_pp1_resume_torch_dist_uniform_full_recompute
- gpt3_mcore_te_tp1_pp2_resume_torch_dist_rope_embeddings_interleaved_no_fusion
- gpt3_mcore_te_tp1_pp2_resume_torch_dist_rope_embeddings
- gpt3_mcore_te_tp1_pp4_resume_torch_dist_disable_bias_linear
- gpt3_mcore_te_tp1_pp4_resume_torch_dist_persistent_disable_bias_linear
- gpt3_mcore_te_tp1_pp4_resume_torch_dist_swiglu
- gpt3_mcore_te_tp1_pp4_resume_torch_dist_untie_embeddings_and_outputs
- gpt3_mcore_te_tp1_pp4_vp1_dist_optimizer_overlap_grad_reduce_param_gather_overlap_optimizer
- gpt3_mcore_te_tp1_pp4_vp1_resume_torch_decoupled_lr
- gpt3_mcore_te_tp1_pp4_vp1_resume_torch_dist_calculate_per_token_loss
- gpt3_mcore_te_tp1_pp4_vp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce_param_gather
- gpt3_mcore_te_tp1_pp4_vp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce_untied
- gpt3_mcore_te_tp1_pp4_vp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce
- gpt3_mcore_te_tp1_pp4_vp1_resume_torch_dist_tunable_overlap
- gpt3_mcore_te_tp1_pp4_vp1_tunable_overlap
- gpt3_mcore_te_tp1_pp4_vp1_uneven_pipeline
- gpt3_mcore_te_tp1_pp4_vp1
- gpt3_mcore_te_tp1_pp4_vp2_account_for_embedding_loss_in_pipeline_split
- gpt3_mcore_te_tp2_pp1_resume_torch_dist_cp2_nondeterministic
- gpt3_mcore_te_tp2_pp1_resume_torch_dist_multi_dist_optimizer_instances
- gpt3_mcore_te_tp2_pp2_cp2_calculate_per_token_loss_nondeterministic
- gpt3_mcore_te_tp2_pp2_cp2_calculate_per_token_loss
- gpt3_mcore_te_tp2_pp2_cp2_etp4_calculate_per_token_loss_dp_last
- gpt3_mcore_te_tp2_pp2_cp2_etp4_calculate_per_token_loss_nondeterministic_dp_last
- gpt3_mcore_te_tp2_pp2_cp2_etp4_dp_last
- gpt3_mcore_te_tp2_pp2_cp2_etp4_nondeterministic_dp_last
- gpt3_mcore_te_tp2_pp2_cp2_nondeterministic
- gpt3_mcore_te_tp2_pp2_cp2
- gpt3_mcore_te_tp2_pp2_cross_entropy_loss_fusion
- gpt3_mcore_te_tp2_pp2_mla
- gpt3_mcore_te_tp2_pp2_resume_torch_dist_cp2_nondeterministic
- gpt3_mcore_te_tp2_pp2_resume_torch_dist_cross_entropy_loss_fusion
- gpt3_mcore_te_tp2_pp2_resume_torch_dist_ddp_average_in_collective
- gpt3_mcore_te_tp2_pp2_resume_torch_dist_defer_embedding_wgrad_compute
- gpt3_mcore_te_tp2_pp2_resume_torch_dist_no_create_attention_mask_in_dataloader
- gpt3_mcore_te_tp2_pp2_resume_torch_dist_reshard_1x4xNone
- gpt3_mcore_te_tp2_pp2_resume_torch_dist
- gpt3_mcore_te_tp2_zp_z3_resume_fsdp_dtensor
- gpt3_mcore_te_tp4_pp1_dist_optimizer_overlap_grad_reduce_param_gather
- gpt3_mcore_te_tp4_pp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce_param_gather
- gpt3_mcore_te_tp4_pp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce
- gpt3_mcore_te_tp4_pp1_resume_torch_dist_qk_layernorm_test_mode
- gpt3_mcore_te_tp4_pp2_resume_torch_dist_reshard_8x1xNone
- gpt3_mcore_tp1_pp1_resume_torch_dist_dist_optimizer_overlap_grad_reduce_param_gather
- gpt3_mcore_tp1_pp2_resume_torch_dist
- gpt3_mcore_tp1_pp2
- gpt3_mcore_tp1_pp4_resume_torch_dist
- gpt3_mcore_tp1_pp4
- gpt3_mcore_tp4_pp1_resume_torch_dist
- gpt3_mcore_tp4_pp1_resume_torch
- gpt3_weekly_dgx_h100_mcore_tp2_pp2_current_scaling_native_fp8_tp_pp_sp_tp_overlap
- gpt3_weekly_dgx_h100_mcore_tp4_cp2_native_fp8_tp_sp_cp_tp_overlap
- gpt_dynamic_inference_tp1_pp1_583m_cuda_graphs_fp8_logitsmatch
- gpt_dynamic_inference_tp1_pp1_583m_cuda_graphs_logitsmatch_decode_graphs_only
- gpt_dynamic_inference_tp1_pp1_583m_logitsmatch
- gpt_dynamic_inference_tp8_pp1_583m_logitsmatch
- gpt_static_inference_tp1_pp1_16b_multiprompt_tokensmatch
- gpt_static_inference_tp1_pp1_583m_cudagraphs
- gpt_static_inference_tp1_pp1_583m_fp8_cudagraphs
- gpt_static_inference_tp1_pp1_583m_logitsmatch
- hybrid
- hybrid_mr_mcore_te_tp1_pp1_cp1_dgx_a100_1N8G
- hybrid_mr_mcore_te_tp2_pp1_cp1_dgx_a100_1N8G
- hybrid_mr_mcore_te_tp2_pp1_cp4_dgx_a100_1N8G
- hybrid_static_inference_tp1_pp1_2B_cudagraphs
- hybrid_static_inference_tp1_pp1_2B_logitsmatch
- mixtral
- deepseekv3_proxy_flex_tp1pp4emp16etp1cp1_release
- mixtral_8x22b_tp2pp8ep8vpp1_release
- mixtral_8x7b_alltoall_tp2pp4ep4_release_sm
- mixtral_8x7b_alltoall_tp2pp4ep4_release
- mixtral_8x7b_tp1pp4ep8vpp8_release
- moe
- gpt3_mcore_cp2_pp2_ep2_te_4experts2parallel_nondeterministic_dp_last
- gpt3_mcore_cp2_pp2_ep2_te_4experts2parallel_nondeterministic
- gpt3_mcore_te_tp1_pp2_resume_torch_dist_reshard_2x1x4_te_8experts2parallel_dist_optimizer
- gpt3_mcore_te_tp2_pp1_resume_torch_dist_te_8experts2parallel_dist_optimizer
- gpt3_mcore_te_tp2_pp1_resume_torch_dist_te_8experts2parallel_multi_dist_optimizer_instances
- gpt3_mcore_te_tp2_pp1_te_8experts2parallel_ddp_average_in_collective
- gpt3_mcore_te_tp2_pp1_te_8experts2parallel_overlap_grad_reduce_param_gather_groupedGEMM
- gpt3_mcore_te_tp2_pp1_te_8experts_etp1_ep4
- gpt3_mcore_te_tp2_pp1_te_a2a_ovlp_8experts_etp1_ep4
- gpt3_mcore_te_tp2_pp2_ep4_etp1_memory_speed
- gpt3_mcore_te_tp2_pp2_ep4_etp1_resume_torch_dist_attn_cudagraph
- gpt3_mcore_te_tp2_zp_z3_resume_torch_dist_te_8experts2parallel_top2router
- gpt3_mcore_tp2_cp2_pp2_ep2_te_4experts2parallel_dp_last
- gpt3_mcore_tp2_cp2_pp2_ep2_te_4experts2parallel
- gpt3_mcore_tp2_pp2_ep2_etp2_te_4experts2parallel_dp_last
- gpt3_mcore_tp2_pp2_ep2_etp2_te_4experts2parallel
- gpt3_mcore_tp2_pp2_ep2_te_4experts2parallel
- gpt3_moe_mcore_te_ep8_resume_torch_dist_dist_optimizer
- gpt3_moe_mcore_te_tp4_ep2_etp2_pp2_resume_torch_dist_dist_optimizer
- gpt_dynamic_inference_cuda_graphs_pad_tp4_pp1_ep4_16B_logitsmatch
- gpt_dynamic_inference_tp4_pp1_ep4_16B_logitsmatch
- gpt_static_inference_cuda_graphs_pad_tp4_pp1_ep4_16B_logitsmatch
- gpt_static_inference_tp1_pp1_ep1_16B_logitsmatch
- gpt_static_inference_tp4_pp1_ep4_16B_logitsmatch
- multimodal-llava
- multimodal_llava_mcore_te_tp1_pp1
- multimodal_llava_mcore_te_tp4_sp_cp2
- t5
- t5_11b_mcore_tp4_pp1
- t5_mcore_te_tp1_pp1_vp1_resume_torch
- t5_mcore_te_tp2_pp1_vp1_sequence_parallel
- t5_mcore_te_tp2_pp1_vp1
- t5_mcore_te_tp4_pp1_resume_torch_dist
- t5_mcore_te_tp4_pp1
- t5_mcore_tp1_pp1_vp1_resume_torch
- t5_mcore_tp1_pp1_vp1
- t5_mcore_tp2_pp1_vp1
- t5_mcore_tp4_pp1_resume_torch_dist
- t5_mcore_tp4_pp1
- t5_release
- test_utils
- python_scripts
- recipes
- unit_tests
- distributed
- inference
- contexts
- engines
- model_inference_wrappers/gpt
- text_generation_controllers
- transformer
- tools