[RFC] Formally supporting some suite of "-O*" type flags #19072
Comments
This has been on the TODO list for a while and would be nice to have. In a multi-stage compiler we will always need a way to control each area (input/frontend level with asserts/etc, global tensor optimization, fusion, host codegen (stream/hal/vm/etc), device codegen), so having per-stage opt flags is the baseline. For most purposes, like what you're working through here, a global flag across all stages only helps with the first step of a binary search ("does literally anything in the compiler change anything anywhere") - useful, but rarer in workflows. The more we can also scope things, the more effective the binary search is ("does anything in codegen change behavior yes/no, does anything in global opt change behavior yes/no", etc.). I'd like to see every top-level transformation pipeline have an opt flag and the compiler frontend propagate them consistently.
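As a rough illustration of what per-phase opt flags could look like, here is a minimal sketch using plain `llvm::cl`. The flag names and the choice of `llvm::cl` registration are assumptions for the sketch, not existing IREE options:

```cpp
// Hypothetical per-phase optimization levels, sketched with llvm::cl.
// None of these flag names exist today; they only illustrate the idea of
// one dial per top-level pipeline that the frontend propagates.
#include "llvm/Support/CommandLine.h"

static llvm::cl::opt<unsigned> clGlobalOptLevel(
    "iree-global-optimization-opt-level",
    llvm::cl::desc("Optimization level (0-3) for global optimization"),
    llvm::cl::init(2));

static llvm::cl::opt<unsigned> clDispatchCreationOptLevel(
    "iree-dispatch-creation-opt-level",
    llvm::cl::desc("Optimization level (0-3) for dispatch region creation"),
    llvm::cl::init(2));

static llvm::cl::opt<unsigned> clCodegenOptLevel(
    "iree-codegen-opt-level",
    llvm::cl::desc("Optimization level (0-3) for device codegen"),
    llvm::cl::init(2));
```

With something like this in place, the binary search becomes: drop one phase at a time to level 0 and check whether the numerics change.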
Yes, this binary search is exactly what I want us to be able to start doing. Today we have two main issues
This sounds perfect to me. We can start by getting some of this structure in place and then figure out what each phase means in isolation (per the people who work on each one).
Currently, AFAICT, top-level compiler options are inaccessible to codegen directly, only to the target backends, so I was assuming we'd make
I wasn't really going to go after naming, but that makes sense. I am not in the mood right now to go rename all of the flags though :P.
This would be great! Though I think everyone always just wants to use
But without being tongue in cheek, let's start with -O1. I think it is more about managing some features that are ready on some backends but not on others for the dispatch region formation stuff. I keep one flag there (
Motivation
In the process of trying to land #18474 I ran into a number of correctness issues related to the LLVM backends, where the only saving grace was having a reference that gave correct numerics to compare against. I often found myself hacking in adjustments to the LLVM optimization level for comparison, or string-parsing dispatch names for finer-grained control over various pipelines.
Moving forward we still have a fairly large suite of models with correctness issues across multiple backends: https://github.com/nod-ai/e2eshark-reports/blob/main/2024-11-07/ci_reports_onnx/rocm/combined-reports/summary.md + https://github.com/nod-ai/e2eshark-reports/blob/main/2024-11-07/ci_reports_onnx/llvm-cpu/combined-reports/summary.md
We need to find a way to be much better about numerics if we want people to use the compiler consistently. Focusing on a few of the models that we care about, and only on the most optimized paths, is not going to get us there.
Currently we have a set of flags scattered throughout the code base that control various optimizations, most of which are leftover from when the underlying feature was under development. To name a few (one possible consolidation is sketched after the list):
--iree-opt-aggressively-propagate-transposes
--iree-opt-outer-dim-concat
--iree-opt-data-tiling
--iree-opt-numeric-precision-reduction
--iree-dispatch-creation-enable-aggressive-fusion
--iree-scheduling-optimize-bindings
--iree-llvmcpu-disable-distribution
--iree-llvmcpu-disable-vector-peeling
--iree-codegen-llvmgpu-use-vector-distribution
--iree-codegen-llvmgpu-use-igemm
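As a sketch of one consolidation direction: keep a toggle like the aggressive-fusion flag above as an explicit override, but derive its default from a per-phase opt level. The helper function and the opt-level parameter here are hypothetical, and this is not the actual in-tree definition of the flag:

```cpp
// Sketch: fold a standalone feature toggle under a per-phase opt level.
// The flag remains available as an explicit override, but when it is not
// passed, its effective value is derived from the phase's opt level.
#include "llvm/Support/CommandLine.h"

static llvm::cl::opt<bool> clAggressiveFusion(
    "iree-dispatch-creation-enable-aggressive-fusion",
    llvm::cl::desc("Explicitly force aggressive fusion on or off"),
    llvm::cl::init(false));

// Explicit flag wins; otherwise enable aggressive fusion only at level 3+.
static bool shouldUseAggressiveFusion(unsigned dispatchCreationOptLevel) {
  if (clAggressiveFusion.getNumOccurrences() > 0)
    return clAggressiveFusion;
  return dispatchCreationOptLevel >= 3;
}
```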
99% of this issue is focused on codegen (as that's where 99%+ of the numerics issues come from), but other phases (e.g. DispatchCreation) also include a large space of optimization choices.
Proposal
What I'm proposing is:

1. Clean up the existing developer flags (perhaps requiring `-test-` or `-experimental-` naming prefixes). I don't know what the opinion of `llvm::cl::hidden` is, but that is another option to avoid exposing every new/developer flag to the user (see the sketch below). This cleanup will help with formalizing the compiler's APIs for controlling optimizations.
2. Add `-O0` codegen pipelines that prioritize correctness and can handle any input + expose flags to enable them (perhaps a shared flag).

I am not an expert in API design; this is just my intuition for the kind of reorganization that I think could help us make progress on the swath of correctness issues we have. Also, there have been a number of improvements to the pass rate of the model suite listed above recently, and I believe almost all of those have been from frontend improvements that are papering over correctness issues that are still there.
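For reference, `llvm::cl::Hidden` is an existing `llvm::cl` modifier that keeps an option out of the default `--help` listing. A minimal sketch, where the flag name is made up only to show the proposed `-experimental-` prefix convention:

```cpp
// Sketch: a developer-only flag hidden from --help via llvm::cl::Hidden.
// The flag name is illustrative of an "-experimental-" naming convention,
// not an existing option.
#include "llvm/Support/CommandLine.h"

static llvm::cl::opt<bool> clExperimentalToggle(
    "iree-experimental-some-developer-toggle",
    llvm::cl::desc("Developer-only toggle, hidden from --help"),
    llvm::cl::Hidden, llvm::cl::init(false));
```

`llvm::cl::ReallyHidden` goes one step further and hides the option from `--help-hidden` output as well.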