【Hackathon 9th No.86】autogen MultiQueryDecoderAttention template_instantiation -part
#4383
Conversation
MultiQueryDecoderAttention template_instantiation -part

Thanks for your contribution!

/re-run all-failed
custom_ops/gpu_ops/append_attn/autogen_template_instantiation.py
Pull Request Overview
This PR implements auto-generation of template instantiation files for multiple attention kernel types and adds parallel compilation support for NVCC. The primary goal is to improve build efficiency and maintainability by automating template instantiation generation that was previously done manually.
Key changes include:
- Auto-generation of `MultiQueryAppendC4Attention`, `MultiQueryAppendAttention`, and `MultiQueryDecoderAttention` template instantiation files
- Addition of the `-t` flag to NVCC compile arguments for parallel compilation
- Refactoring of existing template code into separate implementation files
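The auto-generation approach described above can be sketched as a small Python script that expands a JSON configuration into one explicit-instantiation `.cu` file per parameter combination. This is a minimal sketch only: the config keys, the instantiation signature, and the output layout are assumptions for illustration, not the actual schema of the PR's `template_config.json`.

```python
import itertools
from pathlib import Path

# Hypothetical config -- the PR's template_config.json keys likely differ.
config = {
    "kernel": "MultiQueryDecoderAttention",
    "header": "multi_query_decoder_attention_impl.cuh",
    "params": {"T": ["half", "nv_bfloat16"], "HEAD_DIM": ["64", "128"]},
}

# Hypothetical instantiation signature for illustration only.
template = (
    '#include "{header}"\n\n'
    "template void {kernel}<{args}>(const Params& params, cudaStream_t stream);\n"
)

out_dir = Path("template_instantiation")
out_dir.mkdir(exist_ok=True)

# One .cu file per combination of template arguments, so each file
# compiles independently and the build parallelizes across them.
names = list(config["params"])
for combo in itertools.product(*(config["params"][n] for n in names)):
    args = ", ".join(combo)
    fname = out_dir / "{}_{}.cu".format(config["kernel"].lower(), "_".join(combo))
    fname.write_text(
        template.format(header=config["header"], kernel=config["kernel"], args=args)
    )
```

Splitting instantiations into many small translation units is what makes the per-file NVCC work small enough for parallel compilation to pay off.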
Reviewed Changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| custom_ops/setup_ops.py | Adds parallel compilation support by including -t flag with worker thread count |
| custom_ops/gpu_ops/append_attn/template_config.json | Configuration file defining template generation parameters for attention kernels |
| custom_ops/gpu_ops/append_attn/autogen_template_instantiation.py | Universal template instantiator that generates files based on JSON configuration |
| Multiple template_instantiation/*.cu files | Removal of manually written template instantiation files (now auto-generated) |
| Multiple *_impl.cuh and *_kernel.h files | Refactored implementation files separating kernel definitions from instantiations |
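The `-t` (`--threads`) option mentioned in the table is a real NVCC flag that compiles device code for multiple architectures in parallel. A hedged sketch of how a setup script might append it, assuming a hypothetical helper name (the actual wiring in `custom_ops/setup_ops.py` may differ):

```python
import os

def add_parallel_compile_flag(nvcc_compile_args, thread_num=None):
    """Append NVCC's -t/--threads flag for parallel device compilation.

    Hypothetical helper for illustration; the PR hard-codes thread_num: 4.
    """
    if thread_num is None:
        # Fall back to the host CPU count, capped at 4 like the PR's setting.
        thread_num = min(os.cpu_count() or 1, 4)
    return nvcc_compile_args + ["-t", str(thread_num)]

args = add_parallel_compile_flag(["-O3", "--use_fast_math"], thread_num=4)
```

Note that `-t` parallelizes within a single NVCC invocation (across target architectures); it complements, rather than replaces, parallelism across the many generated `.cu` files.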
```cpp
const int num_chunks = div_up(max_seq_len, chunk_size);
dim3 grids(num_blocks_x_cpu, num_chunks, kv_num_heads);
dim3 blocks(32, num_warps);
if (num_chunks <= 0) {
```
Copilot AI · Oct 14, 2025
The condition num_chunks <= 0 should be num_chunks <= 1 to match the logic used in other similar implementations. This prevents execution when there's only one chunk, which should use the no-split kernel path.
```diff
-if (num_chunks <= 0) {
+if (num_chunks <= 1) {
```
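The boundary the reviewer points at is easy to see from the chunk arithmetic: with `num_chunks = div_up(max_seq_len, chunk_size)`, a sequence that fits in one chunk yields `num_chunks == 1`, so there is nothing to split and the no-split kernel path should be taken. A quick Python sketch of the ceiling-division helper (values are illustrative, not from the PR):

```python
def div_up(a, b):
    # Ceiling division, mirroring the CUDA-side div_up helper.
    return (a + b - 1) // b

chunk_size = 1024  # illustrative chunk size
for max_seq_len in (512, 1024, 1025, 4096):
    num_chunks = div_up(max_seq_len, chunk_size)
    # Reviewer's suggested guard: split-KV only when there is more
    # than one chunk; `num_chunks <= 0` would never skip the split path.
    use_split_kv = num_chunks > 1
    print(max_seq_len, num_chunks, use_split_kv)
```

Since `div_up` returns at least 1 for any positive `max_seq_len`, a `num_chunks <= 0` guard can never fire, which is why the `<= 1` form matches the other implementations.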
```cpp
dim3 grids(num_blocks_x_cpu, num_chunks, kv_num_heads);
dim3 blocks(32, num_warps);
if (num_chunks <= 0) {
```
Copilot AI · Oct 14, 2025
The condition num_chunks <= 0 should be num_chunks <= 1 to match the logic used in other similar implementations. This prevents execution when there's only one chunk, which should use the no-split kernel path.
```diff
-if (num_chunks <= 0) {
+if (num_chunks <= 1) {
```
Please also describe the performance improvement in the PR description.
/re-run all-failed
It seems CI is having some issues today due to the Paddle version; you can resubmit the code to trigger a re-run.

OK.

Done
/re-run all-failed
- Split `MultiQueryAppendC4Attention` template_instantiation into multiple .cu files
- Split `MultiQueryAppendAttention` template_instantiation into multiple .cu files
- Split `MultiQueryDecoderAttention` template_instantiation into multiple .cu files
- Add `-t` to nvcc_compile_args (thread_num: 4)
build_and_install_ops compile time: 03:53:29 -> 02:58:20
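The reported timings correspond to roughly a 24% reduction in build time; a quick check of the arithmetic (the timestamps are from the PR description, the helper name is illustrative):

```python
def to_seconds(hms):
    # Parse an "HH:MM:SS" duration string into total seconds.
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s

before = to_seconds("03:53:29")  # 14009 s
after = to_seconds("02:58:20")   # 10700 s
reduction = 1 - after / before
print(f"{reduction:.1%}")        # prints "23.6%"
```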