
v0.2.0

Released by @LopezCastroRoberto on 28 Oct 13:20

🚀 What's new in QuTLASS v0.2:

  • FlashInfer backend support for B200 GPUs
  • Quantization-Aware Training (QAT) via MXFP types:
    • Quartet clipping-mask computation integrated into the quantization routines
    • Prototype backward kernels for MXFP4 (sm_120) and MXFP8 (sm_100)
    • Integrated CUTLASS MXFP8 backward GEMM kernels (TN and NN layouts)
  • Updated Transformers integration for QAT (#41897)
  • Nanochat QAT integration (#1)
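
For context on the MXFP types used for QAT above: per the OCP Microscaling (MX) spec, MXFP4 packs blocks of 32 FP4 (E2M1) elements, each block sharing one power-of-two (E8M0) scale. Below is a minimal NumPy sketch of the fake-quantization step only — a conceptual illustration, not QuTLASS's actual CUDA kernels or API; the scale rule (`floor(log2(amax)) - 2`) follows the MX spec's shared-exponent convention.

```python
import numpy as np

# FP4 (E2M1) representable magnitudes per the OCP Microscaling (MX) spec.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # MX block size: one shared scale per 32 elements

def mxfp4_quantize(x):
    """Fake-quantize a 1-D float array to MXFP4 (conceptual sketch)."""
    xb = x.reshape(-1, BLOCK)
    amax = np.abs(xb).max(axis=1, keepdims=True)
    # Shared E8M0 scale: 2^(floor(log2(amax)) - 2), since E2M1's max
    # magnitude (6.0) has exponent 2. Guard all-zero blocks.
    scale = 2.0 ** (np.floor(np.log2(np.where(amax == 0, 1.0, amax))) - 2)
    scaled = xb / scale
    # Snap each scaled element to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_E2M1).argmin(axis=-1)
    return (np.sign(scaled) * FP4_E2M1[idx] * scale).reshape(x.shape)

x = np.random.randn(64).astype(np.float32)
xq = mxfp4_quantize(x)
```

In QAT, a fake-quantizer like this runs in the forward pass while the backward pass flows gradients through (straight-through estimator); the prototype MXFP4/MXFP8 backward kernels listed above implement that backward path natively on sm_120 and sm_100.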