
v0.2.0

Released by @LopezCastroRoberto on 28 Oct 13:20

🚀 What's new in QuTLASS v0.2:

  • FlashInfer backend support for B200 GPUs
  • Quantization-Aware Training (QAT) via MXFP types:
    • Quartet clipping-mask computation integrated into the quantization routines
    • Prototype backward kernels for MXFP4 (sm_120) and MXFP8 (sm_100)
    • Integrated CUTLASS MXFP8 backward GEMM kernels (TN and NN layouts)
  • Updated Transformers integration for QAT (#41897)
  • Nanochat QAT integration (#1)
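
For context on the MXFP types used for QAT above: per the OCP Microscaling (MX) spec, MXFP4 packs blocks of 32 FP4 (E2M1) elements, each block sharing one power-of-two (E8M0) scale. Below is a minimal NumPy sketch of the fake-quantization step only — a conceptual illustration, not QuTLASS's actual CUDA kernels or API; the scale rule (`floor(log2(amax)) - 2`) follows the MX spec's shared-exponent convention.

```python
import numpy as np

# FP4 (E2M1) representable magnitudes per the OCP Microscaling (MX) spec.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # MX block size: one shared scale per 32 elements

def mxfp4_quantize(x):
    """Fake-quantize a 1-D float array to MXFP4 (conceptual sketch)."""
    xb = x.reshape(-1, BLOCK)
    amax = np.abs(xb).max(axis=1, keepdims=True)
    # Shared E8M0 scale: 2^(floor(log2(amax)) - 2), since E2M1's max
    # magnitude (6.0) has exponent 2. Guard all-zero blocks.
    scale = 2.0 ** (np.floor(np.log2(np.where(amax == 0, 1.0, amax))) - 2)
    scaled = xb / scale
    # Snap each scaled element to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_E2M1).argmin(axis=-1)
    return (np.sign(scaled) * FP4_E2M1[idx] * scale).reshape(x.shape)

x = np.random.randn(64).astype(np.float32)
xq = mxfp4_quantize(x)
```

In QAT, a fake-quantizer like this runs in the forward pass while the backward pass flows gradients through (straight-through estimator); the prototype MXFP4/MXFP8 backward kernels listed above implement that backward path natively on sm_120 and sm_100.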