Releases: JeffreyXiang/FlexGEMM

FlexGEMM v1.0.0

13 Jan 07:22

🎉 Initial Stable Release

FlexGEMM is a cross-platform backend for high-performance sparse convolutions, providing optimized Triton kernels with flexible autotuning for sparse 3D workloads.

This is the first stable public release of FlexGEMM.

Core Features

  • Efficient sparse submanifold convolution (forward & backward)
  • Grid sample with nearest or trilinear interpolation for sparse 3D tensors
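For readers new to the term, the following is a minimal pure-Python sketch of what a submanifold sparse convolution computes: outputs are produced only at voxels that are already active in the input, and only active neighbors contribute. It is a reference illustration, not FlexGEMM's Triton implementation or API. The function name `submanifold_conv3d`, the dict-based data layout, and the single scalar channel are all assumptions made for brevity.

```python
from itertools import product


def submanifold_conv3d(features, weights, kernel_size=3):
    """Reference submanifold convolution on scalar features (illustrative only).

    features: dict mapping an active voxel (x, y, z) to a float value.
    weights:  dict mapping a kernel offset (dx, dy, dz) to a float weight.
    Returns a dict with exactly the same keys as `features`, so the set of
    active voxels does not grow (the defining "submanifold" property).
    """
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    out = {}
    for (x, y, z) in features:
        acc = 0.0
        for (dx, dy, dz) in offsets:
            neighbor = (x + dx, y + dy, z + dz)
            # Inactive (missing) neighbors are skipped rather than treated as zeros
            # that would create new active sites.
            if neighbor in features:
                acc += weights[(dx, dy, dz)] * features[neighbor]
        out[(x, y, z)] = acc
    return out
```

A real backend would batch these neighbor lookups and perform the accumulation as matrix multiplications over many channels on the GPU; the sketch only fixes the semantics.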

Platform & Environment Support

  • OS
    • Linux
    • Windows
  • Python
    • Python ≥ 3.8
  • Dependencies
    • PyTorch ≥ 2.4.0
    • Triton ≥ 3.2.0 (Linux)
    • triton-windows ≥ 3.2.0 (Windows)
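The platform-dependent Triton requirement above can be sketched as a small helper. This is an illustrative snippet, not part of FlexGEMM: the function names `triton_requirement` and `python_supported` are hypothetical, and the error for other platforms only reflects that they are not listed in these notes.

```python
import sys

MIN_PYTHON = (3, 8)  # from the release notes: Python >= 3.8


def triton_requirement(platform=sys.platform):
    """Return the pip requirement string for Triton on the given platform."""
    if platform.startswith("win"):
        # Windows uses the separately packaged triton-windows distribution.
        return "triton-windows>=3.2.0"
    if platform.startswith("linux"):
        return "triton>=3.2.0"
    raise RuntimeError(f"platform {platform!r} is not listed as supported")


def python_supported(version_info=sys.version_info):
    """Check the interpreter version against the documented minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON
```

For example, `triton_requirement("win32")` yields the `triton-windows` requirement, while `triton_requirement("linux")` yields the plain `triton` one.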

Tests & Benchmarks

  • Comprehensive test coverage for:
    • Sparse convolution (forward / backward)
    • Grid sample ops
    • Hash map & neighbor cache
  • Benchmark scripts included for training & inference scenarios
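To give a rough idea of what a coordinate hash map and neighbor cache do in a sparse convolution backend, here is a pure-Python sketch. It is not FlexGEMM's implementation: the 16-bit-per-field key layout, the names `pack_key` and `build_neighbor_cache`, and the use of `-1` for a missing neighbor are all assumptions chosen for illustration.

```python
from itertools import product

BITS = 16                 # assumption: batch, x, y, z each fit in 16 bits
MASK = (1 << BITS) - 1


def pack_key(b, x, y, z):
    """Pack (batch, x, y, z) into one integer that fits in 64 bits."""
    for v in (b, x, y, z):
        if not 0 <= v <= MASK:
            raise ValueError("coordinate out of range for the assumed layout")
    return (b << 48) | (x << 32) | (y << 16) | z


def build_neighbor_cache(coords, kernel_size=3):
    """For each voxel, record the row index of every kernel neighbor (-1 if absent).

    coords: list of (batch, x, y, z) tuples for the active voxels.
    Returns (offsets, cache), where cache[i][k] is the index of the voxel at
    coords[i] shifted by offsets[k], or -1 when that voxel is not active.
    """
    table = {pack_key(*c): i for i, c in enumerate(coords)}
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    cache = []
    for (b, x, y, z) in coords:
        row = []
        for dx, dy, dz in offsets:
            nx, ny, nz = x + dx, y + dy, z + dz
            if min(nx, ny, nz) < 0 or max(nx, ny, nz) > MASK:
                row.append(-1)  # neighbor falls outside the representable grid
            else:
                row.append(table.get(pack_key(b, nx, ny, nz), -1))
        cache.append(row)
    return offsets, cache
```

Because the cache depends only on the coordinates, it can be computed once and reused across layers that share the same sparsity pattern and kernel size, which is the usual motivation for caching it.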

Known Issues & Notes

  • This is the first stable release; while core APIs and functionality are stable, performance tuning and edge-case fixes will continue in upcoming 1.0.x releases.
  • Autotune cache behavior may evolve as more workloads are covered.
  • Users are encouraged to report platform-specific issues, especially on Windows.

Acknowledgements

We thank @PozzettiAndrea, @geekuillaume, and @DQSSSSS for their contributions to cross-platform support, autotuning infrastructure, and uint64 hashing in FlexGEMM.