
Support for candidate generation and tuning attention#2743

Open
keshavvinayak01 wants to merge 13 commits into main from users/keshavvinayak01/benchmarking-boo-sdpa

Conversation

@keshavvinayak01

No description provided.

Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
@github-actions
Contributor

github-actions bot commented Dec 22, 2025

Coverage report

This PR does not seem to contain any modification to coverable code.

@keshavvinayak01 changed the title from "Draft work." to "[WIP] Support for candidate generation and tuning attention" on Jan 16, 2026
@keshavvinayak01 (Author) left a comment

@bangtianliu It's a WIP, but could you go through the changes and see if they make sense?

All these changes were required to make the boo_tuner work with attention and to generate more and better candidates for benchmarking.

@Groverkss
Contributor

Nice! I like the changes. Will redirect to @bangtianliu for review.

Comment on lines 899 to 907
```python
# For attention ops, use VectorDistribute pipeline instead of TileAndFuse
if dispatch_tuner.get_dispatch_kind() == common.DispatchKind.attention:
    if args.codegen_pipeline != CodegenPipelines.llvmgpu_vector_distribute:
        logging.info(
            f"Attention operation detected. Overriding codegen pipeline "
            f"from {args.codegen_pipeline} to llvmgpu_vector_distribute"
        )
        args.codegen_pipeline = CodegenPipelines.llvmgpu_vector_distribute
```

@bangtianliu (Contributor) commented Jan 16, 2026
This code is unnecessary, since we already have code to handle this case:

```python
if (
    dispatch_kind != common.DispatchKind.attention
    or codegen_pipeline
    != iree_codegen.DispatchLoweringPassPipeline.LLVMGPUVectorDistribute
):
    return []
```

@keshavvinayak01 (Author) replied:

The referenced code acts as a guard that returns no solutions. We should either change that code to overwrite the pipeline to vector-distribute for the attention case, or keep my code.
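The alternative proposed above could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the tuner's actual API: `DispatchKind`, `Pipeline`, and `select_pipeline` are stand-in names, and the real code works with `common.DispatchKind` and `iree_codegen.DispatchLoweringPassPipeline`.

```python
from enum import Enum, auto


# Stand-ins for the tuner's dispatch and pipeline enums (illustrative only).
class DispatchKind(Enum):
    contraction = auto()
    attention = auto()


class Pipeline(Enum):
    llvmgpu_tile_and_fuse = auto()
    llvmgpu_vector_distribute = auto()


def select_pipeline(dispatch_kind: DispatchKind, requested: Pipeline) -> Pipeline:
    """Return the pipeline to use for candidate generation.

    Instead of acting as a guard that returns an empty solution list,
    this overrides the requested pipeline to VectorDistribute whenever
    the dispatch is an attention op.
    """
    if dispatch_kind == DispatchKind.attention:
        return Pipeline.llvmgpu_vector_distribute
    return requested
```

With this shape, non-attention dispatches keep whatever pipeline was requested, and attention dispatches are silently normalized rather than rejected.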

@bangtianliu
Contributor

Left some comments here. I can do another pass once this PR is ready for review.

@bangtianliu requested a review from kuhar on January 16, 2026 16:50
@keshavvinayak01 changed the title from "[WIP] Support for candidate generation and tuning attention" to "[WIP] [Do not Review] Support for candidate generation and tuning attention" on Feb 6, 2026
Extract duplicated bf16/f16 with f32 accumulator compatibility logic from
common.py and dispatch_constraints.py into a shared helper function. Use
isinstance() for type comparison instead of str(). Revert benchmark log
level from info back to debug. Add test for the new helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
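The shared helper described in the commit message above might look roughly like the following. This is a hypothetical sketch: the class names `F16Type`, `BF16Type`, and `F32Type` are plain-Python stand-ins for the MLIR element types the tuner actually gets from IREE's Python bindings, and `is_compatible_accumulator` is an illustrative name, not the real helper.

```python
# Illustrative stand-ins for MLIR element types (the real code would use
# IREE's type bindings, which are not modeled here).
class F16Type: pass
class BF16Type: pass
class F32Type: pass


def is_compatible_accumulator(input_type, acc_type) -> bool:
    """Check input/accumulator element-type compatibility.

    bf16/f16 inputs are allowed to accumulate in f32; otherwise the
    accumulator must match the input type. Uses isinstance() for the
    comparison rather than comparing str() representations.
    """
    if isinstance(input_type, (F16Type, BF16Type)):
        return isinstance(acc_type, (F32Type, type(input_type)))
    return isinstance(acc_type, type(input_type))
```

Factoring this into one helper lets common.py and dispatch_constraints.py share a single source of truth for the accumulator rule.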
@keshavvinayak01 changed the title from "[WIP] [Do not Review] Support for candidate generation and tuning attention" to "[WIP] Support for candidate generation and tuning attention" on Feb 18, 2026
keshavvinayak01 and others added 3 commits February 18, 2026 10:41
Fix denorm_flushing type to list[bool] with default [False] and use local
variable for attention pipeline selection instead of mutating args.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
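The two fixes in the commit message above could be sketched as follows. This is a hedged illustration under stated assumptions: `resolve_tuning_config` and the `"attention"` string key are hypothetical names, standing in for however the tuner actually threads these values through.

```python
import argparse


def resolve_tuning_config(args: argparse.Namespace, dispatch_kind: str) -> dict:
    """Sketch of the two commit-message fixes (names are illustrative).

    1. denorm_flushing is a list of candidate values, defaulting to [False].
    2. The attention pipeline override lives in a local variable; the
       parsed args namespace is never mutated.
    """
    denorm_flushing: list[bool] = args.denorm_flushing or [False]
    codegen_pipeline = args.codegen_pipeline
    if dispatch_kind == "attention":
        codegen_pipeline = "llvmgpu_vector_distribute"
    return {
        "denorm_flushing": denorm_flushing,
        "codegen_pipeline": codegen_pipeline,
    }
```

Keeping the override local means a later dispatch in the same run still sees the user's original `--codegen-pipeline` choice.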
@keshavvinayak01 changed the title from "[WIP] Support for candidate generation and tuning attention" to "Support for candidate generation and tuning attention" on Feb 18, 2026
keshavvinayak01 and others added 3 commits February 18, 2026 11:14
…xed accumulator type matching in MMA compatibility check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
@keshavvinayak01 marked this pull request as ready for review February 18, 2026 12:24

3 participants