Pull requests: NVIDIA/TransformerEngine
Pull requests list
[C][PyTorch] Remove deprecated device_id arg for multi tensor API
#1994
opened Jul 24, 2025 by
ksivaman
6 of 13 tasks
[PyTorch] Optimize cudagraph static_grad_outputs reuse
#1992
opened Jul 24, 2025 by
buptzyb
5 of 13 tasks
Fixed double buffering issue for asymmetric layers
#1984
opened Jul 23, 2025 by
sanandaraj5597
[JAX] Fixing GemmPrimitive partitioning rules to handle tensor-parallelism correctly for sequence-parallel inputs
2.6.0
#1980
opened Jul 22, 2025 by
denera
8 of 13 tasks
[PyTorch] fix input_quantizer usage in Linear backward for save_original_input
#1978
opened Jul 22, 2025 by
hxbai
8 of 13 tasks
Manually launch wgrad accumulation and reduce in backward_dw() instead of backward()
#1976
opened Jul 21, 2025 by
lhb8125
2 of 13 tasks
[PyTorch] Enable generic QK norm support (+ RMSNorm/LayerNorm)
#1966
opened Jul 18, 2025 by
negvet
7 of 13 tasks
[PyTorch][Mcore] Fix illegal memory access issue while using Mcore async checkpoint with fp8 tensorwise recipe
bug
Something isn't working
#1956
opened Jul 16, 2025 by
zhongbozhu
13 tasks
[PyTorch] Refactor C++ quantizer infrastructure
#1952
opened Jul 15, 2025 by
timmoon10
8 of 13 tasks
[PyTorch][FP8 CS] Remove the unnecessary torch reciprocal op in fp8 current scaling code path
performance
Performance issues
#1950
opened Jul 14, 2025 by
zhongbozhu
13 tasks
[PyTorch] Support delay_wgrad_compute cudagraph
#1948
opened Jul 14, 2025 by
buptzyb
2 of 13 tasks
[Minor] Update 1_getting_started.rst
documentation
Improvements or additions to documentation
#1947
opened Jul 13, 2025 by
dupeljan
3 of 8 tasks
[JAX] Select cuDNN backend for normalization
#1946
opened Jul 11, 2025 by
phu0ngng
13 tasks
[BUILD] Exclude ninja from required packages
#1932
opened Jul 7, 2025 by
phu0ngng
5 of 13 tasks
[PyTorch] Fuse permute+pad and unpermute+unpad ops for FP8 optimization
#1921
opened Jul 3, 2025 by
xiaoxi-wangfj
3 of 12 tasks
Fix import error when flash attention 3 is installed
#1913
opened Jun 30, 2025 by
HollowMan6
7 of 13 tasks
[PyTorch debug] Improve precision debug tools performance
#1909
opened Jun 30, 2025 by
pggPL
9 of 13 tasks
[PyTorch] Support FA3 MLA CP feature
#1907
opened Jun 28, 2025 by
zhujian19891203
7 of 13 tasks
[PyTorch Debug] Support log fp8 tensor stats for blockwise recipe
#1905
opened Jun 27, 2025 by
lengerfulluse
12 tasks
[Common] NVFP4 kernels
enhancement
New feature or request
#1904
opened Jun 27, 2025 by
Oleg-Goncharov
Draft
5 of 13 tasks