[RFC] Formally supporting some suite of "-O*" type flags #19072

Open
qedawkins opened this issue Nov 8, 2024 · 4 comments
Labels: enhancement ➕ New feature or request

@qedawkins (Contributor)

Motivation

In the process of trying to land #18474, I ran into a number of correctness issues related to LLVM backends where the only saving grace was having a reference that gave correct numerics to compare against. I often found myself hacking in adjustments to the LLVM optimization level for comparison, or string-parsing dispatch names for finer-grained control over various pipelines.

Moving forward, we still have a fairly large suite of models with correctness issues across multiple backends: https://github.com/nod-ai/e2eshark-reports/blob/main/2024-11-07/ci_reports_onnx/rocm/combined-reports/summary.md + https://github.com/nod-ai/e2eshark-reports/blob/main/2024-11-07/ci_reports_onnx/llvm-cpu/combined-reports/summary.md

We need to get much better about numerics if we want users to be able to rely on the compiler consistently. Focusing on a few models we care about, and only on the most optimized paths, is not going to get us there.

Currently we have a set of flags scattered throughout the codebase that control various optimizations, most of which are left over from when the underlying feature was under development. To name a few:

  • --iree-opt-aggressively-propagate-transposes
  • --iree-opt-outer-dim-concat
  • --iree-opt-data-tiling
  • --iree-opt-numeric-precision-reduction
  • --iree-dispatch-creation-enable-aggressive-fusion
  • --iree-scheduling-optimize-bindings
  • --iree-llvmcpu-disable-distribution
  • --iree-llvmcpu-disable-vector-peeling
  • --iree-codegen-llvmgpu-use-vector-distribution
  • --iree-codegen-llvmgpu-use-igemm

99% of this issue is focused on codegen (as that's where 99%+ of the numerics issues come from), but other phases (e.g. DispatchCreation) also include a large space of optimization choices.

Proposal

What I'm proposing is:

  • Limit the proliferation of "developer flags." This includes being clearer about naming when a flag enables an in-progress feature that is intended to be turned on by default (i.e. -test- or -experimental- naming prefixes). I don't know what the general opinion of llvm::cl::Hidden is, but that is another option to avoid exposing every new/developer flag to the user. This cleanup will help with formalizing the compiler's APIs for controlling optimizations.
  • Expose [codegen] backend optimization level flags (mainly for LLVM-based backends; I don't think there is any equivalent in SPIR-V) as compiler options (not floating flags)
  • Add support for true -O0 codegen pipelines that prioritize correctness and can handle any input + expose flags to enable them (perhaps a shared flag)
  • Add a mechanism to control codegen "opt" levels per dispatch
  • Do a coarse grouping of compiler flags that we can enable/disable all at once (similar to clang -O___). I don't know if our compiler is suited for global optimization levels, so this may make more sense per compilation phase.
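As a rough illustration of the last bullet, a coarse opt level could fan out into the kinds of individual feature toggles listed above. This is a hypothetical sketch: the struct, function name, and which flag lands at which level are all invented for illustration, not IREE's actual grouping.

```cpp
#include <cassert>

// Hypothetical sketch: a coarse optimization level fanning out into
// individual feature toggles. The level-to-flag assignment below is
// illustrative only, not IREE's actual policy.
struct OptFlags {
  bool dataTiling = false;           // cf. --iree-opt-data-tiling
  bool aggressiveFusion = false;     // cf. --iree-dispatch-creation-enable-aggressive-fusion
  bool propagateTransposes = false;  // cf. --iree-opt-aggressively-propagate-transposes
};

// Map a clang-style -O level onto a group of toggles so they flip together.
OptFlags flagsForLevel(int level) {
  OptFlags f;
  if (level >= 1) f.dataTiling = true;
  if (level >= 2) f.aggressiveFusion = true;
  if (level >= 3) f.propagateTransposes = true;
  return f;
}
```

With a grouping like this, `-O0` would disable every optimization at once, while individual flags could still override the group default for binary-searching a miscompile.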

I am not an expert in API design; this is just my intuition for the kind of reorganization that I think could help us make progress on the swath of correctness issues we have. Also, there have recently been a number of improvements to the pass rate of the model suite listed above, and I believe almost all of those came from frontend improvements that paper over correctness issues that are still there.

@qedawkins qedawkins added the enhancement ➕ New feature or request label Nov 8, 2024
@benvanik (Collaborator) commented Nov 8, 2024

Has been on the TODO list for a while, would be nice to have.

In a multi-stage compiler we will always need a way to control each area (input/frontend level with asserts/etc, global tensor optimization, fusion, host codegen (stream/hal/vm/etc), device codegen) so having per-stage opt flags is the baseline. For most purposes like you're working through here a global flag across all stages will only help the first step of a binary search ("does literally anything in the compiler change anything anywhere") - useful, but rarer in workflows. The more we can also scope things the more effective the binary search is ("does anything in codegen change behavior yes/no, does anything in global opt change behavior yes/no" etc).

I'd like to see any top-level transformation pipeline have an opt flag and the compiler frontend propagate them consistently: --iree-stream-opt= --iree-hal-opt= --iree-flow-opt= --iree-codegen-llvmgpu-opt= etc (named after the pipeline names, like --iree-hal-transformation-pipeline -> -opt). Then we can have an --iree-opt= that sets those all. If those compiler driver flags just set the transform options for each pipeline we can still create reproducers and such that include the opt level. So concretely something like IREE::Stream::TransformOptions would have an Option<OptimizationLevel> option, the default options could be set based on --iree-stream-opt=, and the compiler driver would pass in its --iree-opt flag if that was set. A shared OptimizationLevel only really works at those high-level pipelines (as you may want finer control down below) but that's ok - if a user wants to still pass --iree-llvmcpu-disable-distribution they can (though naming-wise we should never include disable/enable in a flag name - a boolean flag true denotes enabled and false denotes disabled, and --foo-disable=true is weird).
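The cascade described here (per-pipeline defaults that a global --iree-opt= can set) might look roughly like the following. All names — `OptimizationLevel`, `StreamTransformOptions`, `resolveOptLevel` — are invented for illustration and are not IREE's actual TransformOptions plumbing.

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch of the flag cascade: each pipeline's options struct
// carries its own level (defaulted from, e.g., --iree-stream-opt=), and the
// global --iree-opt=, when set, overrides it in the compiler driver.
enum class OptimizationLevel { O0, O1, O2, O3 };

struct StreamTransformOptions {
  OptimizationLevel optLevel = OptimizationLevel::O2;  // --iree-stream-opt=
};

// The driver resolves the effective level: a global --iree-opt= value, if
// present, wins; otherwise the pipeline's own default applies.
OptimizationLevel resolveOptLevel(const StreamTransformOptions &opts,
                                  std::optional<OptimizationLevel> globalOpt) {
  return globalOpt.value_or(opts.optLevel);
}
```

Because the driver flags only set per-pipeline options, a reproducer that serializes each pipeline's options still captures the effective opt level, as the comment suggests.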

@qedawkins (Contributor, Author)

> For most purposes like you're working through here a global flag across all stages will only help the first step of a binary search ("does literally anything in the compiler change anything anywhere") - useful, but rarer in workflows. The more we can also scope things the more effective the binary search is ("does anything in codegen change behavior yes/no, does anything in global opt change behavior yes/no" etc).

Yes, this binary search is exactly what I want us to be able to start doing. Today we have two main issues:

  1. Our compiler is essentially -O3 by default always
  2. Even if you wanted to binary search through the flags that are there, there's no organization that makes it possible for anyone other than a developer to do so (and honestly I don't think any IREE developer could accurately navigate all of the flags we have)

> --iree-stream-opt= --iree-hal-opt= --iree-flow-opt= --iree-codegen-llvmgpu-opt= etc (named after the pipeline names, like --iree-hal-transformation-pipeline -> -opt). Then we can have an --iree-opt= that sets those all.

This sounds perfect to me. We can start by getting some of this structure in place and then figure out what each phase means in isolation (per the people who work on each one).

> --iree-codegen-llvmgpu-opt=

Currently AFAICT top-level compiler options are inaccessible to codegen directly, only to the target backends, so I was assuming we'd make LLVM[G|C]PUPopulateTranslationPassPipeline take the OptimizationLevel, and then we get --iree-llvmcpu-codegen-opt=/--iree-rocm-codegen-opt=/--iree-cuda-codegen-opt=. That might clash with LLVM opt levels, though, and I'd want separate control over what our codegen does and what LLVM does. Maybe those could be --iree-*-llvm-opt.
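A minimal sketch of keeping those two knobs separate — IREE's codegen level versus the level handed to the LLVM backend — could look like this. The struct, field names, and the "LLVM level defaults to the codegen level unless overridden" policy are all hypothetical, not existing IREE behavior.

```cpp
#include <cassert>
#include <optional>

// Hypothetical sketch: a backend options struct distinguishing IREE's own
// codegen opt level (e.g. --iree-llvmcpu-codegen-opt=) from the level
// forwarded to LLVM's pass pipeline (e.g. --iree-llvmcpu-llvm-opt=).
struct CodegenOptions {
  int codegenOptLevel = 2;          // controls IREE codegen pass choices
  std::optional<int> llvmOptLevel;  // set only when explicitly overridden
};

// Illustrative policy: LLVM's -O level tracks the codegen level unless the
// user asked for separate control.
int resolveLLVMOptLevel(const CodegenOptions &opts) {
  return opts.llvmOptLevel.value_or(opts.codegenOptLevel);
}
```

This keeps the common case (one knob) simple while still allowing, say, `-O0` IREE codegen with full LLVM optimization to isolate which side of the boundary a miscompile comes from.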

> (though naming-wise we should never include disable/enable in a flag name - a boolean flag true denotes enabled and false denotes disabled, and --foo-disable=true is weird).

I wasn't really going to go after naming, but makes sense. I am not in the mood right now to go rename all of the flags though :P.

@MaheshRavishankar (Contributor)

This would be great! Though I think everyone always just wants to use -O1000.

@MaheshRavishankar (Contributor)

But without being tongue-in-cheek, let's start with -O1. For the dispatch region formation work, I think it is more about managing features that are ready on some backends but not others. I keep one flag there (--iree-dispatch-creation-enable-aggressive-fusion=true). Do we need any of the ROCm flags? There are developer flags that allow going down a "conservative" compilation path, but flags that are just developer flags don't qualify to be part of the optimization flags. Basically, anything you add to a list of "optimization-level-controlled flags" is a long-term feature-support commitment. If a flag doesn't fall into that category, it is purely a developer flag.
