v2.2 ROCm

@ipanfilo ipanfilo released this 27 Nov 18:59
867687a

What's Changed

  • Support math_sm_count for GEMM
  • Added drop-in Triton replacements for LayerNorm and RMSNorm
  • Added Triton MXFP8 quantize/dequantize
  • Reduced the memory occupied by the FP8 weight transpose cache
  • Switched to AOTriton 0.10c
  • Switched from CK to AITER
  • JAX 0.7 support
  • FlashAttn 2.8.0.post2 support
  • Add gfx950 as default target
  • Fix building on ROCm 6.2
  • Fix faults with current scaling

Upstream release notes: https://github.com/NVIDIA/TransformerEngine/releases/tag/v2.2

Full Changelog: v2.1_rocm...v2.2_rocm