Pull requests: NVIDIA/TransformerEngine
[PyTorch] Avoid unnecessary tensor usages when caching for linear op backward
Label: bug (Something isn't working)
#1676 opened Apr 11, 2025 by timmoon10 · 6 of 13 tasks
[JAX] Add collective GEMM without compute/communication overlap
#1675 opened Apr 11, 2025 by philipphack · 1 of 6 tasks
[JAX] Improving the test_multiprocessing_encoder.py run script
#1673 opened Apr 11, 2025 by phu0ngng · 6 of 12 tasks
[PyTorch] More precise test for the CPU offloading.
#1668 opened Apr 10, 2025 by pggPL · 7 of 13 tasks
[QA] Add XML log generation for pytest results
#1661 opened Apr 9, 2025 by linxiddd · 11 tasks
[JAX] grouped_gemm() uses variadic arguments
#1658 opened Apr 8, 2025 by huanghua1994 · 7 of 13 tasks
Split wgrad&dgrad from backward() to support a2a overlap
#1653 opened Apr 8, 2025 by lhb8125 · 1 of 13 tasks
[MoE] Support new recipes for permute_fusion
#1649 opened Apr 7, 2025 by Autumn1998 · 13 tasks
[PyTorch] check and try to generate fp8 weight transpose cache before dgrad backward
#1648 opened Apr 7, 2025 by shjwudp · 13 tasks
Enable fp8 primary weights for sub-channel recipe
#1641 opened Apr 3, 2025 by kunlunl · 7 of 13 tasks
Add adam bf16 state with original fp32 kernel
#1640 opened Apr 3, 2025 by BestJuly · 1 of 13 tasks
Improved performance of mxfp8 cast kernels
Label: performance (Performance issues)
#1628 opened Mar 31, 2025 by Oleg-Goncharov · 5 of 13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core
#1614 opened Mar 25, 2025 by pggPL · 7 tasks done
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features
#1613 opened Mar 25, 2025 by pggPL · 7 tasks done