-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GEMM scheduleV2 #1772
base: develop
Are you sure you want to change the base?
Add GEMM scheduleV2 #1772
Conversation
const std::unique_ptr<rock::accel::AccelEmitter> &accelEmitterPtr, | ||
Value regsA, Value regsB, Value regsC, StringAttr arch, | ||
GemmFeaturesAttr features, | ||
const RockAccelTuningParamAttrInterface tuningParams) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for passing tuningParams
by value and not by reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed that to const ref. THanks
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1772 +/- ##
===========================================
- Coverage 78.84% 78.57% -0.28%
===========================================
Files 102 102
Lines 29354 29479 +125
Branches 4389 4394 +5
===========================================
+ Hits 23144 23162 +18
- Misses 4430 4528 +98
- Partials 1780 1789 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
please, make changes so that all CI tasks run full tuning instead of exhaustive tuning as well. Otherwise, we'll 2x our tuning time in CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some missing things:
- I still see some v2: in the codebase, for example in perfRunner.py, quickTuningGen.py and perfRegressionReport.py
- missing a test like gridwise_gemm_accel_lowering.mlir that checks the IR looks correct for the new soft pipeline strategy
- it'd be nice to have a table comparing the results vs develop when doing exhaustive tuning. But that can be done after it's merged. As we've seen some improvements already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of hardcoding this we could use
[[axis]]
name = "perfConfig"
values = ["", "v3:64,128,4,16,16,8,1,2,2,1,1"]
prefix = "--perf_config="
So, whenever we add another scheduling it's just adding an extra string.
Same for all other tests.
range(0, 1) | ||
range(0, 1), | ||
# GEMM Schedule Version | ||
range(1, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameterSweep is slow, 5hours I think, this would make it 10. Are we sure we want to do it?
@@ -301,7 +301,7 @@ def main(args=None): | |||
|
|||
parser.add_argument( | |||
"--tuning-space", | |||
default="exhaustive", | |||
default="full", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems reasonable to do full by default. But let's make sure everyone knows about this when we merge it. Let's send a message to the rocmlir team chat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change the default in rocmlir-tuning-driver.cpp as well?
This adds 4 stage gemm schedule with GlobalRead, LDSWrite, LDSRead, MFMA stages.
Those 4 stages will get loop pipelined to run in parallel with initiation interval = 1.
4 stage code is taken and modified from #1511
By default this GemmSchedule is not enabled for tuning unless user has explictly asked for ExhaustiveTune.
TODO: Add E2E and unit-tests