Pull requests: NVIDIA/TransformerEngine

Pull requests list

[JAX] Extend tensor inspect utility to dump out tensors in identifiable names
#3086 opened Jun 4, 2026 by tdophung Collaborator
6 of 13 tasks
[JAX] Fix norm workspace on global shapes
#3085 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
[JAX] MoEBlock tutorial
#3084 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
13 tasks
[JAX] Hopper BF16 grouped GEMM v2 support
#3083 opened Jun 4, 2026 by jberchtold-nvidia Collaborator Draft
8 of 13 tasks
add attention docs
#3081 opened Jun 4, 2026 by sudhakarsingh27 Member Draft
13 tasks
[PyTorch] Add joint forward-backward op fusion pass (label: enhancement)
#3080 opened Jun 4, 2026 by timmoon10 Member
8 of 13 tasks
[Common] Pack attention arguments as structs
#3079 opened Jun 3, 2026 by cyanguwa Collaborator Draft
13 tasks
Enable NVFP4 grouped MLP GLU RHT amax path (label: community-contribution)
#3073 opened Jun 1, 2026 by sraman-rgb Contributor
13 tasks
[Pytorch] Add variable-K Cutlass GroupGEMM for fine-grained MoE wgrad (label: community-contribution)
#3069 opened Jun 1, 2026 by cassiewilliam Contributor
6 of 8 tasks
Optimize NVFP4 4over6 candidate error path (label: community-contribution)
#3068 opened Jun 1, 2026 by zianglih Contributor
9 of 13 tasks
[PyTorch] Propagate skip_fp8_weight_update in GroupedLinear during FP8 CUDA graph capture (label: community-contribution)
#3065 opened May 31, 2026 by LeSingh1 Contributor
fix unfused padding causal sdpa (label: community-contribution)
#3063 opened May 31, 2026 by hungryGeek16
[JAX] Grouped quant+GEMM custom partitioning rules
#3058 opened May 28, 2026 by jberchtold-nvidia Collaborator
8 of 13 tasks
[Common/PyTorch] bugfix: Token-linear fused RoPE impl. for THD tensors. (label: community-contribution)
#3057 opened May 28, 2026 by plugyawn
7 of 13 tasks
[JAX] [PyT] [Common] Enable D=256 BWD cuDNN fused attn for Blackwell CC 10.x
#3056 opened May 28, 2026 by KshitijLakhani Collaborator
7 of 13 tasks
[PyTorch] [torch.compile] torch.compile support for Linear
#3053 opened May 28, 2026 by pggPL Collaborator Draft
13 tasks
[PyTorch] Propagate FP8 graph weight update flag in GroupedLinear (label: community-contribution)
#3052 opened May 28, 2026 by allenphilipj
Feat/selective offload on srelu fuser (label: community-contribution)
#3047 opened May 27, 2026 by lhb8125 Contributor
13 tasks
Add NVFP4 per-token quantization recipe (label: community-contribution)
#3045 opened May 26, 2026 by cael-ling Contributor Draft
13 tasks
docs: expand comm gemm overlap guidance (label: community-contribution)
#3043 opened May 26, 2026 by omribz156
5 of 13 tasks
Use cuDNN for row-scaled NVFP4 grouped GEMM (label: community-contribution)
#3042 opened May 26, 2026 by zianglih Contributor Draft
[PyTorch debug] FakeQuant: support Float8BlockScaling and fix MoE / w… (label: community-contribution)
#3040 opened May 25, 2026 by shangxiaokang Draft
13 tasks
[JAX] Expert Parallelism: JAX primitives + VJPs
#3036 opened May 22, 2026 by phu0ngng Collaborator
8 of 13 tasks