Add GEMM scheduleV2 #1772

umangyadav · 2025-03-11T12:08:51Z

This adds 4 stage gemm schedule with GlobalRead, LDSWrite, LDSRead, MFMA stages.
Those 4 stages will get loop pipelined to run in parallel with initiation interval = 1.
4 stage code is taken and modified from #1511

By default this GemmSchedule is not enabled for tuning unless user has explictly asked for ExhaustiveTune.

TODO: Add E2E and unit-tests

mirza-halilcevic · 2025-03-11T12:37:07Z

mlir/lib/Dialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp

+      const std::unique_ptr<rock::accel::AccelEmitter> &accelEmitterPtr,
+      Value regsA, Value regsB, Value regsC, StringAttr arch,
+      GemmFeaturesAttr features,
+      const RockAccelTuningParamAttrInterface tuningParams) const {


Is there a reason for passing tuningParams by value and not by reference?

Changed that to const ref. THanks

mlir/lib/Dialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp

mlir/lib/Dialect/Rock/Tuning/RockTuningImpl.cpp

mlir/lib/Dialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp

codecov · 2025-03-11T15:30:20Z

Codecov Report

Attention: Patch coverage is 29.88506% with 122 lines in your changes missing coverage. Please review.

Project coverage is 78.57%. Comparing base (b16004d) to head (73da73e).
Report is 3 commits behind head on develop.

Files with missing lines	Patch %	Lines
...ialect/Rock/Transforms/GridwiseGemmToBlockwise.cpp	23.30%	100 Missing and 2 partials ⚠️
mlir/lib/Dialect/Rock/Tuning/RockTuningImpl.cpp	51.21%	17 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1772      +/-   ##
===========================================
- Coverage    78.84%   78.57%   -0.28%     
===========================================
  Files          102      102              
  Lines        29354    29479     +125     
  Branches      4389     4394       +5     
===========================================
+ Hits         23144    23162      +18     
- Misses        4430     4528      +98     
- Partials      1780     1789       +9

Flag	Coverage Δ
mfma	`78.57% <29.88%> (-0.28%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

dhernandez0 · 2025-03-11T15:40:27Z

please, make changes so that all CI tasks run full tuning instead of exhaustive tuning as well. Otherwise, we'll 2x our tuning time in CI.

dhernandez0

Some missing things:

I still see some v2: in the codebase, for example in perfRunner.py, quickTuningGen.py and perfRegressionReport.py
missing a test like gridwise_gemm_accel_lowering.mlir that checks the IR looks correct for the new soft pipeline strategy
it'd be nice to have a table comparing the results vs develop when doing exhaustive tuning. But that can be done after it's merged. As we've seen some improvements already.

dhernandez0 · 2025-03-12T08:55:01Z

mlir/test/e2e/GemmVariants.toml

instead of hardcoding this we could use

[[axis]]
name = "perfConfig"
values = ["", "v3:64,128,4,16,16,8,1,2,2,1,1"]
prefix = "--perf_config="

So, whenever we add another scheduling it's just adding an extra string.

Same for all other tests.

dhernandez0 · 2025-03-12T08:58:24Z

mlir/utils/performance/parameterSweeps.py

-    range(0, 1)
+    range(0, 1),
+    # GEMM Schedule Version
+    range(1, 3)


parameterSweep is slow, 5hours I think, this would make it 10. Are we sure we want to do it?

dhernandez0 · 2025-03-12T08:59:23Z

mlir/utils/performance/tuningRunner.py

@@ -301,7 +301,7 @@ def main(args=None):

    parser.add_argument(
        "--tuning-space",
-        default="exhaustive",
+        default="full",


seems reasonable to do full by default. But let's make sure everyone knows about this when we merge it. Let's send a message to the rocmlir team chat.

dhernandez0 · 2025-03-12T09:02:00Z

mlir/utils/performance/tuningRunner.py

can you change the default in rocmlir-tuning-driver.cpp as well?

umangyadav added 3 commits March 10, 2025 18:17

add gridwiseGemmToBlockwise

2983378

add tuning over gemm schedules

9eb6a64

add tuning over gemm schedules

73da73e

umangyadav self-assigned this Mar 11, 2025

umangyadav requested review from dhernandez0, stefankoncarevic, djramic, mirza-halilcevic and dorde-antic March 11, 2025 12:09

mirza-halilcevic reviewed Mar 11, 2025

View reviewed changes