Support for candidate generation and tuning attention #2743
keshavvinayak01 wants to merge 13 commits into main from
Conversation
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
keshavvinayak01
left a comment
@bangtianliu This is still a WIP, but could you go through the changes and check whether they make sense?
All of these changes were required to make the boo_tuner work with attention, and to generate more (and better) candidates for benchmarking.
Nice! I like the changes. Will redirect to @bangtianliu for review.
```python
# For attention ops, use the VectorDistribute pipeline instead of TileAndFuse.
if dispatch_tuner.get_dispatch_kind() == common.DispatchKind.attention:
    if args.codegen_pipeline != CodegenPipelines.llvmgpu_vector_distribute:
        logging.info(
            f"Attention operation detected. Overriding codegen pipeline "
            f"from {args.codegen_pipeline} to llvmgpu_vector_distribute"
        )
        args.codegen_pipeline = CodegenPipelines.llvmgpu_vector_distribute
```
This code is unnecessary, since we already have code that handles this case:
amd-shark-ai/amdsharktuner/amdsharktuner/constraint_generator.py, lines 653 to 658 in cab3ead
The referenced code acts as a guard rather than returning solutions. We should either change that code to overwrite the pipeline to vector-distribute for the attention case, or keep my code.
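To make the trade-off concrete, here is a minimal sketch of the two behaviors under discussion. This is illustrative only, not the actual constraint_generator.py code; the function and candidate names are assumptions.

```python
# Guard style: attention with a non-matching pipeline yields no candidates,
# so the user silently gets nothing unless they picked the right pipeline.
def generate_candidates_guard(dispatch_kind: str, pipeline: str) -> list[str]:
    if dispatch_kind == "attention" and pipeline != "llvmgpu_vector_distribute":
        return []
    return [f"candidate-for-{pipeline}"]

# Override style: attention forces vector-distribute, then proceeds normally,
# which is what the code in this PR does (with a log message).
def generate_candidates_override(dispatch_kind: str, pipeline: str) -> list[str]:
    if dispatch_kind == "attention":
        pipeline = "llvmgpu_vector_distribute"
    return [f"candidate-for-{pipeline}"]
```

The guard variant fails closed but opaquely; the override variant always produces candidates and can tell the user what happened.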
Left some comments here. Can do another pass once this PR is ready for review.
Extract duplicated bf16/f16-with-f32-accumulator compatibility logic from common.py and dispatch_constraints.py into a shared helper function. Use isinstance() for type comparison instead of str(). Revert benchmark log level from info back to debug. Add a test for the new helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
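A hedged sketch of what such a shared helper might look like; the names (`ElemType`, `is_mixed_precision_compatible`) are hypothetical and not the actual amdsharktuner API. It illustrates why identity comparison on typed values is safer than comparing `str()` renderings.

```python
from enum import Enum


class ElemType(Enum):
    """Illustrative element types; the real tuner uses IR types instead."""

    F16 = "f16"
    BF16 = "bf16"
    F32 = "f32"


def is_mixed_precision_compatible(
    lhs: ElemType, rhs: ElemType, acc: ElemType
) -> bool:
    """Return True for matching bf16/f16 inputs accumulating into f32.

    Comparing enum members (or using isinstance on typed objects) avoids the
    false matches that can occur when comparing formatted str() names.
    """
    return (
        lhs == rhs
        and lhs in (ElemType.F16, ElemType.BF16)
        and acc == ElemType.F32
    )
```

Both call sites (common.py and dispatch_constraints.py) would then share one definition instead of two drifting copies.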
Fix denorm_flushing type to list[bool] with default [False], and use a local variable for attention pipeline selection instead of mutating args.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
…xed accumulator type matching in MMA compatibility check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
No description provided.