Pull requests: NVIDIA/TransformerEngine

Pull requests list

[PyTorch] Avoid unnecessary tensor usages when caching for linear op backward (label: bug)
#1676 opened Apr 11, 2025 by timmoon10
6 of 13 tasks
[JAX] Add collective GEMM without compute/communication overlap
#1675 opened Apr 11, 2025 by philipphack
1 of 6 tasks
Allow NVTEShape to own data.
#1674 opened Apr 11, 2025 by kwyss-nvidia
7 of 13 tasks
[JAX] Improving the test_multiprocessing_encoder.py run script
#1673 opened Apr 11, 2025 by phu0ngng
6 of 12 tasks
[PyTorch] More precise test for the CPU offloading.
#1668 opened Apr 10, 2025 by pggPL
7 of 13 tasks
[QA] Encapsulate functions in test_utils.sh
#1667 opened Apr 10, 2025 by linxiddd
11 tasks
[JAX] GroupedQuantizer and GroupedScaledTensor
#1666 opened Apr 10, 2025 by phu0ngng Draft
13 tasks
[JAX] Update helper tests
#1664 opened Apr 9, 2025 by jberchtold-nvidia
8 of 13 tasks
[PyTorch] Draft of new weight offloading
#1663 opened Apr 9, 2025 by pggPL Draft
7 of 13 tasks
add view/reshape to blockwise tensor
#1662 opened Apr 9, 2025 by Autumn1998 Draft
13 tasks
[QA] Add XML log generation for pytest results
#1661 opened Apr 9, 2025 by linxiddd
11 tasks
rtx5090 arch fix support
#1659 opened Apr 9, 2025 by sudhakarsingh27
1 of 13 tasks
[JAX] grouped_gemm() uses variadic arguments
#1658 opened Apr 8, 2025 by huanghua1994
7 of 13 tasks
Split wgrad&dgrad from backward() to support a2a overlap
#1653 opened Apr 8, 2025 by lhb8125
1 of 13 tasks
[MoE] Support new recipes for permute_fusion
#1649 opened Apr 7, 2025 by Autumn1998
13 tasks
[JAX] JAX Current Scaling
#1647 opened Apr 5, 2025 by jberchtold-nvidia
8 of 13 tasks
Add experimental Shardy support.
#1642 opened Apr 3, 2025 by jreiffers
1 of 6 tasks
Enable fp8 primary weights for sub-channel recipe
#1641 opened Apr 3, 2025 by kunlunl
7 of 13 tasks
Add adam bf16 state with original fp32 kernel
#1640 opened Apr 3, 2025 by BestJuly
1 of 13 tasks
Use internal quantizer in Linear module
#1638 opened Apr 3, 2025 by ptrendx
1 of 13 tasks
Symmetric memory all reduce
#1632 opened Apr 1, 2025 by wdykas
1 of 13 tasks
Improved performance of mxfp8 cast kernels (label: performance)
#1628 opened Mar 31, 2025 by Oleg-Goncharov
5 of 13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core
#1614 opened Mar 25, 2025 by pggPL
7 tasks done
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features
#1613 opened Mar 25, 2025 by pggPL
7 tasks done