-
There are some fused kernels in the repository, such as fused_bias_geglu, which fuses the bias addition, GELU activation, and gating into a single kernel. However, it looks like some of the more advanced fused kernels have not been released to the public. Looking at the …
If I didn't miss anything and they are indeed not released, do you think it would be possible to open-source them?
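For readers following along, here is a minimal sketch of what a bias + GeGLU fusion like fused_bias_geglu computes in one pass. Everything in it (the kernel name, the float32-only `[rows, 2*cols]` layout with the gate in the second half, and the tanh GELU approximation) is my own assumption for illustration, not the repository's actual code:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// tanh approximation of GELU, as commonly used inside fused kernels
__device__ __forceinline__ float gelu_tanh(float x) {
    const float c = 0.7978845608028654f;  // sqrt(2/pi)
    return 0.5f * x * (1.0f + tanhf(c * (x + 0.044715f * x * x * x)));
}

// Hypothetical layout: in [rows, 2*cols] = (activation half | gate half),
// bias [2*cols], out [rows, cols]. One thread block per row.
__global__ void fused_bias_geglu_kernel(const float* in, const float* bias,
                                        float* out, int rows, int cols) {
    int row = blockIdx.x;
    for (int col = threadIdx.x; col < cols; col += blockDim.x) {
        float a = in[row * 2 * cols + col]        + bias[col];         // activation path
        float g = in[row * 2 * cols + cols + col] + bias[cols + col];  // gating path
        out[row * cols + col] = gelu_tanh(a) * g;
    }
}
```

The point of the fusion is memory traffic: done as separate bias-add, GELU, and multiply ops, the `[rows, 2*cols]` intermediate would be read and written several times; fused, it makes a single trip through global memory.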
-
According to the post DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression, DeepSpeed uses deep fusion and inference-customized GeMM kernels to improve inference performance. However, from the source code (e.g. pt_bindings.cpp and ds_transformer_cuda.cpp), the kernels do not seem to be fused to the extent described in DeepSpeed Inference: Multi-GPU inference with customized inference kernels and quantization support (e.g. "Input Layer-Norm plus Query, Key, and Value GeMMs and their bias adds." and "Intermediate FF, Layer-Norm, Bias-add, Residual, and Gaussian Error Linear Unit (GELU)"), and the GeMM is delegated to cuBLAS. Does anyone know where to find the implementations of the deeply fused kernels and the customized GeMM kernels? Thanks~
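One plausible reading, pending an answer from the maintainers: "deep fusion" may refer to fusing the elementwise operations between GeMMs (bias adds, residual adds, GELU, layer norm) into custom kernels, while the GeMM itself stays in cuBLAS. A minimal sketch of that pattern for a bias-add + residual + GELU epilogue is below; the names ff_layer and fused_bias_residual_gelu, the row-major float32 layout, and the tanh GELU approximation are all illustrative assumptions, not DeepSpeed's actual API:

```cuda
#include <cublas_v2.h>
#include <math.h>

// Fused epilogue: bias add + residual add + GELU in one kernel, so the
// GeMM output makes a single round trip through global memory.
__global__ void fused_bias_residual_gelu(const float* gemm_out,
                                         const float* bias,
                                         const float* residual,
                                         float* out, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rows * cols) return;
    float v = gemm_out[i] + bias[i % cols] + residual[i];
    const float c = 0.7978845608028654f;  // sqrt(2/pi), tanh GELU approximation
    out[i] = 0.5f * v * (1.0f + tanhf(c * (v + 0.044715f * v * v * v)));
}

// Hypothetical feed-forward step: out = GELU(x @ w + bias + residual),
// x [m, k], w [k, n] row-major; tmp is a preallocated [m, n] scratch buffer.
void ff_layer(cublasHandle_t h, const float* x, const float* w,
              const float* bias, const float* residual,
              float* tmp, float* out, int m, int n, int k) {
    const float alpha = 1.0f, beta = 0.0f;
    // The GeMM itself is delegated to cuBLAS; row-major C = A*B is
    // expressed as the column-major product C^T = B^T * A^T.
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k,
                &alpha, w, n, x, k, &beta, tmp, n);
    int total = m * n, block = 256;
    fused_bias_residual_gelu<<<(total + block - 1) / block, block>>>(
        tmp, bias, residual, out, m, n);
}
```

Under this reading there is no contradiction: cuBLAS supplies the matrix multiply, and the fusion is in collapsing everything between consecutive GeMMs into single kernels. Whether the unreleased kernels go further than this is exactly the open question.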