Releases: JeffreyXiang/FlexGEMM
FlexGEMM v1.0.0
🎉 Initial Stable Release
FlexGEMM is a cross-platform backend for high-performance sparse convolutions, providing optimized Triton kernels with flexible autotuning for sparse 3D workloads.
This is the first stable public release of FlexGEMM.
Core Features
- Efficient sparse submanifold convolution (forward & backward)
- Grid sample with nearest or trilinear interpolation for sparse 3D tensors
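For readers new to the term, the semantics of a submanifold convolution can be sketched in plain Python. This is only an illustrative reference, not FlexGEMM's API or its Triton kernels: the function name `submanifold_conv3d`, the dictionary-based data layout, and the weight format are all assumptions made for this example. The property it demonstrates is that outputs are computed only at voxels that are already active, so the sparsity pattern does not grow from layer to layer.

```python
from itertools import product


def submanifold_conv3d(features, weights, kernel_size=3):
    """Reference submanifold sparse convolution (hypothetical helper, not FlexGEMM API).

    features: {(x, y, z): [c_in floats]} holding active voxels only.
    weights:  {(dx, dy, dz): c_out x c_in matrix as a list of rows}, one per kernel offset.
    Returns a dict with exactly the same keys as `features`.
    """
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    c_out = len(next(iter(weights.values())))

    out = {}
    for (x, y, z) in features:  # outputs exist only at active input sites
        acc = [0.0] * c_out
        for (dx, dy, dz) in offsets:
            # The dict acts as the coordinate hash map for neighbor lookup.
            neighbor = features.get((x + dx, y + dy, z + dz))
            if neighbor is None:
                continue  # inactive voxels contribute nothing
            w = weights[(dx, dy, dz)]
            for o in range(c_out):
                acc[o] += sum(w_oi * f_i for w_oi, f_i in zip(w[o], neighbor))
        out[(x, y, z)] = acc
    return out


# Usage: three active voxels, one channel, every kernel weight set to 1.
features = {(0, 0, 0): [1.0], (1, 0, 0): [2.0], (5, 5, 5): [3.0]}
weights = {off: [[1.0]] for off in product(range(-1, 2), repeat=3)}
print(submanifold_conv3d(features, weights))
# {(0, 0, 0): [3.0], (1, 0, 0): [3.0], (5, 5, 5): [3.0]}
```

The two adjacent voxels see each other and sum to 3.0, while the isolated voxel at (5, 5, 5) only sees itself. A GPU backend would typically replace the Python dict with a device-side hash map and precompute the neighbor lists, but the result at each active site follows the same definition.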
Platform & Environment Support
- OS
  - Linux
  - Windows
- Python
  - Python ≥ 3.8
- Dependencies
  - PyTorch ≥ 2.4.0
  - Triton ≥ 3.2.0 (Linux)
  - triton-windows ≥ 3.2.0 (Windows)
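The requirements above can be checked before installing with a short standard-library script. This is a minimal sketch and not part of FlexGEMM: the helper names (`parse_version`, `meets_minimum`, `triton_distribution`, `check_environment`) are hypothetical, and it assumes PyTorch is installed under the distribution name `torch` and that any non-Windows system uses the `triton` distribution.

```python
import platform
import re
import sys
from importlib import metadata

# Minimum versions taken from the support list above.
MIN_PYTHON = (3, 8)
MIN_TORCH = (2, 4, 0)
MIN_TRITON = (3, 2, 0)


def parse_version(text):
    """Return the leading numeric release components, e.g. '2.4.0+cu121' -> (2, 4, 0).

    Simplification: pre-release and local suffixes (rc1, .post1, +cu121) are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare version tuples after zero-padding them to equal length."""
    width = max(len(installed), len(minimum))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)


def triton_distribution(system=None):
    """Pick the Triton distribution name for the platform (assumption: non-Windows uses 'triton')."""
    system = system or platform.system()
    return "triton-windows" if system == "Windows" else "triton"


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.8" % sys.version_info[:2])
    for dist, minimum in (("torch", MIN_TORCH), (triton_distribution(), MIN_TRITON)):
        try:
            installed = parse_version(metadata.version(dist))
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["environment looks compatible"]:
        print(problem)
```

The check reads installed package metadata instead of importing `torch` or `triton`, so it runs quickly and does not need a working GPU driver.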
Tests & Benchmarks
- Comprehensive test coverage for:
  - Sparse convolution (forward / backward)
  - Grid sample ops
  - Hash map & neighbor cache
- Benchmark scripts included for training & inference scenarios
Known Issues & Notes
- This is the first stable release; while core APIs and functionality are stable, performance tuning and edge-case fixes will continue in upcoming 1.0.x releases.
- Autotune cache behavior may evolve as more workloads are covered.
- Users are encouraged to report platform-specific issues, especially on Windows.
Acknowledgements
We thank @PozzettiAndrea, @geekuillaume, and @DQSSSSS for their contributions to cross-platform support, autotuning infrastructure, and uint64 hashing in FlexGEMM.