Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GEMM scheduleV2 #1772

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from
Open

Add GEMM scheduleV2 #1772

wants to merge 8 commits into from

Conversation

umangyadav
Copy link
Member

@umangyadav umangyadav commented Mar 11, 2025

This adds 4 stage gemm schedule with GlobalRead, LDSWrite, LDSRead, MFMA stages.
Those 4 stages will get loop pipelined to run in parallel with initiation interval = 1.
4 stage code is taken and modified from #1511

By default this GemmSchedule is not enabled for tuning unless user has explictly asked for ExhaustiveTune.

TODO: Add E2E and unit-tests

const std::unique_ptr<rock::accel::AccelEmitter> &accelEmitterPtr,
Value regsA, Value regsB, Value regsC, StringAttr arch,
GemmFeaturesAttr features,
const RockAccelTuningParamAttrInterface tuningParams) const {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for passing tuningParams by value and not by reference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed that to const ref. THanks

Copy link

codecov bot commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 29.88506% with 122 lines in your changes missing coverage. Please review.

Project coverage is 78.57%. Comparing base (b16004d) to head (73da73e).
Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
...ialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp 23.30% 100 Missing and 2 partials ⚠️
mlir/lib/Dialect/Rock/Tuning/RockTuningImpl.cpp 51.21% 17 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1772      +/-   ##
===========================================
- Coverage    78.84%   78.57%   -0.28%     
===========================================
  Files          102      102              
  Lines        29354    29479     +125     
  Branches      4389     4394       +5     
===========================================
+ Hits         23144    23162      +18     
- Misses        4430     4528      +98     
- Partials      1780     1789       +9     
Flag Coverage Δ
mfma 78.57% <29.88%> (-0.28%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dhernandez0
Copy link
Contributor

dhernandez0 commented Mar 11, 2025

please, make changes so that all CI tasks run full tuning instead of exhaustive tuning as well. Otherwise, we'll 2x our tuning time in CI.

@umangyadav umangyadav marked this pull request as ready for review March 11, 2025 21:13
@umangyadav umangyadav requested a review from causten as a code owner March 11, 2025 21:13
Copy link
Contributor

@dhernandez0 dhernandez0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some missing things:

  • I still see some v2: in the codebase, for example in perfRunner.py, quickTuningGen.py and perfRegressionReport.py
  • missing a test like gridwise_gemm_accel_lowering.mlir that checks the IR looks correct for the new soft pipeline strategy
  • it'd be nice to have a table comparing the results vs develop when doing exhaustive tuning. But that can be done after it's merged. As we've seen some improvements already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of hardcoding this we could use

[[axis]]
name = "perfConfig"
values = ["", "v3:64,128,4,16,16,8,1,2,2,1,1"]
prefix = "--perf_config="

So, whenever we add another scheduling it's just adding an extra string.

Same for all other tests.

range(0, 1)
range(0, 1),
# GEMM Schedule Version
range(1, 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterSweep is slow, 5hours I think, this would make it 10. Are we sure we want to do it?

@@ -301,7 +301,7 @@ def main(args=None):

parser.add_argument(
"--tuning-space",
default="exhaustive",
default="full",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable to do full by default. But let's make sure everyone knows about this when we merge it. Let's send a message to the rocmlir team chat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change the default in rocmlir-tuning-driver.cpp as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants