-
Notifications
You must be signed in to change notification settings - Fork 14.6k
Introduce -fexperimental-loop-fuse to clang and flang #142686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-flang-driver @llvm/pr-subscribers-clang Author: Sebastian Pop (sebpop) ChangesThis adds the flag -floop-fuse to the clang and flang drivers. Full diff: https://github.com/llvm/llvm-project/pull/142686.diff 15 Files Affected:
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index aad4e107cbeb3..de40564bc280f 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -326,6 +326,7 @@ CODEGENOPT(TimeTrace , 1, 0) ///< Set when -ftime-trace is enabled.
VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500) ///< Minimum time granularity (in microseconds),
///< traced by time profiler
CODEGENOPT(InterchangeLoops , 1, 0) ///< Run loop-interchange.
+CODEGENOPT(FuseLoops , 1, 0) ///< Run loop-fuse.
CODEGENOPT(UnrollLoops , 1, 0) ///< Control whether loops are unrolled.
CODEGENOPT(RerollLoops , 1, 0) ///< Control whether loops are rerolled.
CODEGENOPT(NoUseJumpTables , 1, 0) ///< Set when -fno-jump-tables is enabled.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ca31c253ed8f..e5e2c0035e03f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4189,6 +4189,10 @@ def floop_interchange : Flag<["-"], "floop-interchange">, Group<f_Group>,
HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_loop_interchange: Flag<["-"], "fno-loop-interchange">, Group<f_Group>,
HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def floop_fuse : Flag<["-"], "floop-fuse">, Group<f_Group>,
+ HelpText<"Enable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def fno_loop_fuse: Flag<["-"], "fno-loop-fuse">, Group<f_Group>,
+ HelpText<"Disable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index cd5fc48c4a22b..539e413e8c86e 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -898,6 +898,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
PipelineTuningOptions PTO;
PTO.LoopUnrolling = CodeGenOpts.UnrollLoops;
PTO.LoopInterchange = CodeGenOpts.InterchangeLoops;
+ PTO.LoopFuse = CodeGenOpts.FuseLoops;
// For historical reasons, loop interleaving is set to mirror setting for loop
// unrolling.
PTO.LoopInterleaving = CodeGenOpts.UnrollLoops;
@@ -1339,6 +1340,7 @@ runThinLTOBackend(CompilerInstance &CI, ModuleSummaryIndex *CombinedIndex,
Conf.SampleProfile = std::move(SampleProfile);
Conf.PTO.LoopUnrolling = CGOpts.UnrollLoops;
Conf.PTO.LoopInterchange = CGOpts.InterchangeLoops;
+ Conf.PTO.LoopFuse = CGOpts.FuseLoops;
// For historical reasons, loop interleaving is set to mirror setting for loop
// unrolling.
Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops;
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 13842b8cc2870..f3713e12cb5ef 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7030,6 +7030,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
options::OPT_fno_unroll_loops);
Args.AddLastArg(CmdArgs, options::OPT_floop_interchange,
options::OPT_fno_loop_interchange);
+ Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
Args.AddLastArg(CmdArgs, options::OPT_fstrict_flex_arrays_EQ);
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 937ee09cac7cc..4217b48a56b05 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -3157,7 +3157,7 @@ void tools::handleVectorizeSLPArgs(const ArgList &Args,
void tools::handleInterchangeLoopsArgs(const ArgList &Args,
ArgStringList &CmdArgs) {
- // FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to
+ // FIXME: Instead of relying on shouldEnableVectorizerAtOLevel, we may want to
// implement a separate function to infer loop interchange from opt level.
// For now, enable loop-interchange at the same opt levels as loop-vectorize.
bool EnableInterchange = shouldEnableVectorizerAtOLevel(Args, false);
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index dcc46469df3e9..f34a58cab3ff3 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -151,6 +151,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
!stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
CmdArgs.push_back("-fstack-arrays");
+ Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
+
handleInterchangeLoopsArgs(Args, CmdArgs);
handleVectorizeLoopsArgs(Args, CmdArgs);
handleVectorizeSLPArgs(Args, CmdArgs);
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 2c02719121c73..5506e11f51491 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -1648,6 +1648,11 @@ void CompilerInvocationBase::GenerateCodeGenArgs(const CodeGenOptions &Opts,
else
GenerateArg(Consumer, OPT_fno_loop_interchange);
+ if (Opts.FuseLoops)
+ GenerateArg(Consumer, OPT_floop_fuse);
+ else
+ GenerateArg(Consumer, OPT_fno_loop_fuse);
+
if (!Opts.BinutilsVersion.empty())
GenerateArg(Consumer, OPT_fbinutils_version_EQ, Opts.BinutilsVersion);
@@ -1963,6 +1968,8 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
(Opts.OptimizationLevel > 1));
Opts.InterchangeLoops =
Args.hasFlag(OPT_floop_interchange, OPT_fno_loop_interchange, false);
+ Opts.FuseLoops =
+ Args.hasFlag(OPT_floop_fuse, OPT_fno_loop_fuse, false);
Opts.BinutilsVersion =
std::string(Args.getLastArgValue(OPT_fbinutils_version_EQ));
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index ee7ded265769b..204d5e338190f 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -52,6 +52,13 @@
// CHECK-INTERCHANGE-LOOPS: "-floop-interchange"
// CHECK-NO-INTERCHANGE-LOOPS: "-fno-loop-interchange"
+// RUN: %clang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -floop-fuse -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// CHECK-FUSE-LOOPS: "-floop-fuse"
+// CHECK-NO-FUSE-LOOPS: "-fno-loop-fuse"
+
// RUN: %clang -### -S -fprofile-sample-accurate %s 2>&1 | FileCheck -check-prefix=CHECK-PROFILE-SAMPLE-ACCURATE %s
// CHECK-PROFILE-SAMPLE-ACCURATE: "-fprofile-sample-accurate"
diff --git a/flang/docs/ReleaseNotes.md b/flang/docs/ReleaseNotes.md
index 36be369595ffd..33795b7d1a55a 100644
--- a/flang/docs/ReleaseNotes.md
+++ b/flang/docs/ReleaseNotes.md
@@ -34,6 +34,7 @@ page](https://llvm.org/releases/).
* -floop-interchange is now recognized by flang.
* -floop-interchange is enabled by default at -O2 and above.
+* -floop-fuse is now recognized by flang.
## Windows Support
diff --git a/flang/include/flang/Frontend/CodeGenOptions.def b/flang/include/flang/Frontend/CodeGenOptions.def
index a697872836569..85bec620b54ba 100644
--- a/flang/include/flang/Frontend/CodeGenOptions.def
+++ b/flang/include/flang/Frontend/CodeGenOptions.def
@@ -36,6 +36,7 @@ CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass)
CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization.
CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization.
CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange.
+CODEGENOPT(FuseLoops, 1, 0) ///< Enable loop fuse.
CODEGENOPT(LoopVersioning, 1, 0) ///< Enable loop versioning.
CODEGENOPT(UnrollLoops, 1, 0) ///< Enable loop unrolling
CODEGENOPT(AliasAnalysis, 1, 0) ///< Enable alias analysis pass
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 90a002929eff0..c2b63819e9f77 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -273,6 +273,9 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
if (args.getLastArg(clang::driver::options::OPT_floop_interchange))
opts.InterchangeLoops = 1;
+ if (args.getLastArg(clang::driver::options::OPT_floop_fuse))
+ opts.FuseLoops = 1;
+
if (args.getLastArg(clang::driver::options::OPT_vectorize_loops))
opts.VectorizeLoop = 1;
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 012d0fdfe645f..dc56cbe5c85bc 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -925,6 +925,7 @@ void CodeGenAction::runOptimizationPipeline(llvm::raw_pwrite_stream &os) {
si.getTimePasses().setOutStream(ci.getTimingStreamLLVM());
pto.LoopUnrolling = opts.UnrollLoops;
pto.LoopInterchange = opts.InterchangeLoops;
+ pto.LoopFuse = opts.FuseLoops;
pto.LoopInterleaving = opts.UnrollLoops;
pto.LoopVectorization = opts.VectorizeLoop;
pto.SLPVectorization = opts.VectorizeSLP;
diff --git a/flang/test/Driver/loop-fuse.f90 b/flang/test/Driver/loop-fuse.f90
new file mode 100644
index 0000000000000..240d00fdb62d7
--- /dev/null
+++ b/flang/test/Driver/loop-fuse.f90
@@ -0,0 +1,17 @@
+! RUN: %flang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE %s
+! RUN: %flang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O0 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O1 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O2 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O3 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Os %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Oz %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! CHECK-LOOP-FUSE: "-floop-fuse"
+! CHECK-NO-LOOP-FUSE-NOT: "-floop-fuse"
+! RUN: %flang_fc1 -emit-llvm -O2 -floop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE-PASS %s
+! RUN: %flang_fc1 -emit-llvm -O2 -fno-loop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE-PASS %s
+! CHECK-LOOP-FUSE-PASS: loop-fusion
+! CHECK-NO-LOOP-FUSE-PASS-NOT: loop-fusion
+
+program test
+end program
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 51ccaa53447d7..287a57c7d4a7d 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -64,6 +64,9 @@ class PipelineTuningOptions {
/// false.
bool LoopInterchange;
+ /// Tuning option to enable/disable loop fuse. Its default value is false.
+ bool LoopFuse;
+
/// Tuning option to forget all SCEV loops in LoopUnroll. Its default value
/// is that of the flag: `-forget-scev-loop-unroll`.
bool ForgetAllSCEVInLoopUnroll;
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index f3654600c5abb..aaf43a9b535e9 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -104,6 +104,7 @@
#include "llvm/Transforms/Scalar/LoopDeletion.h"
#include "llvm/Transforms/Scalar/LoopDistribute.h"
#include "llvm/Transforms/Scalar/LoopFlatten.h"
+#include "llvm/Transforms/Scalar/LoopFuse.h"
#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
#include "llvm/Transforms/Scalar/LoopInstSimplify.h"
#include "llvm/Transforms/Scalar/LoopInterchange.h"
@@ -205,6 +206,10 @@ static cl::opt<bool>
EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden,
cl::desc("Enable the LoopInterchange Pass"));
+static cl::opt<bool> EnableLoopFuse("enable-loopfuse", cl::init(false),
+ cl::Hidden,
+ cl::desc("Enable the LoopFuse Pass"));
+
static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",
cl::init(false), cl::Hidden,
cl::desc("Enable Unroll And Jam Pass"));
@@ -314,6 +319,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
SLPVectorization = false;
LoopUnrolling = true;
LoopInterchange = EnableLoopInterchange;
+ LoopFuse = EnableLoopFuse;
ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
LicmMssaOptCap = SetLicmMssaOptCap;
LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
@@ -518,6 +524,9 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
+ if (PTO.LoopFuse)
+ FPM.addPass(LoopFusePass());
+
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
/*UseMemorySSA=*/true,
/*UseBlockFrequencyInfo=*/true));
@@ -709,6 +718,9 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
+ if (PTO.LoopFuse)
+ FPM.addPass(LoopFusePass());
+
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
/*UseMemorySSA=*/true,
/*UseBlockFrequencyInfo=*/true));
@@ -2112,7 +2124,6 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
LPM.addPass(LoopFlattenPass());
LPM.addPass(IndVarSimplifyPass());
LPM.addPass(LoopDeletionPass());
- // FIXME: Add loop interchange.
// Unroll small loops and perform peeling.
LPM.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
|
@llvm/pr-subscribers-clang-driver Author: Sebastian Pop (sebpop) ChangesThis adds the flag -floop-fuse to the clang and flang drivers. Full diff: https://github.com/llvm/llvm-project/pull/142686.diff 15 Files Affected:
diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index aad4e107cbeb3..de40564bc280f 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -326,6 +326,7 @@ CODEGENOPT(TimeTrace , 1, 0) ///< Set when -ftime-trace is enabled.
VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500) ///< Minimum time granularity (in microseconds),
///< traced by time profiler
CODEGENOPT(InterchangeLoops , 1, 0) ///< Run loop-interchange.
+CODEGENOPT(FuseLoops , 1, 0) ///< Run loop-fuse.
CODEGENOPT(UnrollLoops , 1, 0) ///< Control whether loops are unrolled.
CODEGENOPT(RerollLoops , 1, 0) ///< Control whether loops are rerolled.
CODEGENOPT(NoUseJumpTables , 1, 0) ///< Set when -fno-jump-tables is enabled.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 5ca31c253ed8f..e5e2c0035e03f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -4189,6 +4189,10 @@ def floop_interchange : Flag<["-"], "floop-interchange">, Group<f_Group>,
HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_loop_interchange: Flag<["-"], "fno-loop-interchange">, Group<f_Group>,
HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def floop_fuse : Flag<["-"], "floop-fuse">, Group<f_Group>,
+ HelpText<"Enable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
+def fno_loop_fuse: Flag<["-"], "fno-loop-fuse">, Group<f_Group>,
+ HelpText<"Disable the loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index cd5fc48c4a22b..539e413e8c86e 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -898,6 +898,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
PipelineTuningOptions PTO;
PTO.LoopUnrolling = CodeGenOpts.UnrollLoops;
PTO.LoopInterchange = CodeGenOpts.InterchangeLoops;
+ PTO.LoopFuse = CodeGenOpts.FuseLoops;
// For historical reasons, loop interleaving is set to mirror setting for loop
// unrolling.
PTO.LoopInterleaving = CodeGenOpts.UnrollLoops;
@@ -1339,6 +1340,7 @@ runThinLTOBackend(CompilerInstance &CI, ModuleSummaryIndex *CombinedIndex,
Conf.SampleProfile = std::move(SampleProfile);
Conf.PTO.LoopUnrolling = CGOpts.UnrollLoops;
Conf.PTO.LoopInterchange = CGOpts.InterchangeLoops;
+ Conf.PTO.LoopFuse = CGOpts.FuseLoops;
// For historical reasons, loop interleaving is set to mirror setting for loop
// unrolling.
Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops;
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 13842b8cc2870..f3713e12cb5ef 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7030,6 +7030,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
options::OPT_fno_unroll_loops);
Args.AddLastArg(CmdArgs, options::OPT_floop_interchange,
options::OPT_fno_loop_interchange);
+ Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
Args.AddLastArg(CmdArgs, options::OPT_fstrict_flex_arrays_EQ);
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 937ee09cac7cc..4217b48a56b05 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -3157,7 +3157,7 @@ void tools::handleVectorizeSLPArgs(const ArgList &Args,
void tools::handleInterchangeLoopsArgs(const ArgList &Args,
ArgStringList &CmdArgs) {
- // FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to
+ // FIXME: Instead of relying on shouldEnableVectorizerAtOLevel, we may want to
// implement a separate function to infer loop interchange from opt level.
// For now, enable loop-interchange at the same opt levels as loop-vectorize.
bool EnableInterchange = shouldEnableVectorizerAtOLevel(Args, false);
diff --git a/clang/lib/Driver/ToolChains/Flang.cpp b/clang/lib/Driver/ToolChains/Flang.cpp
index dcc46469df3e9..f34a58cab3ff3 100644
--- a/clang/lib/Driver/ToolChains/Flang.cpp
+++ b/clang/lib/Driver/ToolChains/Flang.cpp
@@ -151,6 +151,8 @@ void Flang::addCodegenOptions(const ArgList &Args,
!stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
CmdArgs.push_back("-fstack-arrays");
+ Args.AddLastArg(CmdArgs, options::OPT_floop_fuse, options::OPT_fno_loop_fuse);
+
handleInterchangeLoopsArgs(Args, CmdArgs);
handleVectorizeLoopsArgs(Args, CmdArgs);
handleVectorizeSLPArgs(Args, CmdArgs);
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 2c02719121c73..5506e11f51491 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -1648,6 +1648,11 @@ void CompilerInvocationBase::GenerateCodeGenArgs(const CodeGenOptions &Opts,
else
GenerateArg(Consumer, OPT_fno_loop_interchange);
+ if (Opts.FuseLoops)
+ GenerateArg(Consumer, OPT_floop_fuse);
+ else
+ GenerateArg(Consumer, OPT_fno_loop_fuse);
+
if (!Opts.BinutilsVersion.empty())
GenerateArg(Consumer, OPT_fbinutils_version_EQ, Opts.BinutilsVersion);
@@ -1963,6 +1968,8 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
(Opts.OptimizationLevel > 1));
Opts.InterchangeLoops =
Args.hasFlag(OPT_floop_interchange, OPT_fno_loop_interchange, false);
+ Opts.FuseLoops =
+ Args.hasFlag(OPT_floop_fuse, OPT_fno_loop_fuse, false);
Opts.BinutilsVersion =
std::string(Args.getLastArgValue(OPT_fbinutils_version_EQ));
diff --git a/clang/test/Driver/clang_f_opts.c b/clang/test/Driver/clang_f_opts.c
index ee7ded265769b..204d5e338190f 100644
--- a/clang/test/Driver/clang_f_opts.c
+++ b/clang/test/Driver/clang_f_opts.c
@@ -52,6 +52,13 @@
// CHECK-INTERCHANGE-LOOPS: "-floop-interchange"
// CHECK-NO-INTERCHANGE-LOOPS: "-fno-loop-interchange"
+// RUN: %clang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// RUN: %clang -### -S -fno-loop-fuse -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
+// RUN: %clang -### -S -floop-fuse -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
+// CHECK-FUSE-LOOPS: "-floop-fuse"
+// CHECK-NO-FUSE-LOOPS: "-fno-loop-fuse"
+
// RUN: %clang -### -S -fprofile-sample-accurate %s 2>&1 | FileCheck -check-prefix=CHECK-PROFILE-SAMPLE-ACCURATE %s
// CHECK-PROFILE-SAMPLE-ACCURATE: "-fprofile-sample-accurate"
diff --git a/flang/docs/ReleaseNotes.md b/flang/docs/ReleaseNotes.md
index 36be369595ffd..33795b7d1a55a 100644
--- a/flang/docs/ReleaseNotes.md
+++ b/flang/docs/ReleaseNotes.md
@@ -34,6 +34,7 @@ page](https://llvm.org/releases/).
* -floop-interchange is now recognized by flang.
* -floop-interchange is enabled by default at -O2 and above.
+* -floop-fuse is now recognized by flang.
## Windows Support
diff --git a/flang/include/flang/Frontend/CodeGenOptions.def b/flang/include/flang/Frontend/CodeGenOptions.def
index a697872836569..85bec620b54ba 100644
--- a/flang/include/flang/Frontend/CodeGenOptions.def
+++ b/flang/include/flang/Frontend/CodeGenOptions.def
@@ -36,6 +36,7 @@ CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass)
CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization.
CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization.
CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange.
+CODEGENOPT(FuseLoops, 1, 0) ///< Enable loop fuse.
CODEGENOPT(LoopVersioning, 1, 0) ///< Enable loop versioning.
CODEGENOPT(UnrollLoops, 1, 0) ///< Enable loop unrolling
CODEGENOPT(AliasAnalysis, 1, 0) ///< Enable alias analysis pass
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp
index 90a002929eff0..c2b63819e9f77 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -273,6 +273,9 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
if (args.getLastArg(clang::driver::options::OPT_floop_interchange))
opts.InterchangeLoops = 1;
+ if (args.getLastArg(clang::driver::options::OPT_floop_fuse))
+ opts.FuseLoops = 1;
+
if (args.getLastArg(clang::driver::options::OPT_vectorize_loops))
opts.VectorizeLoop = 1;
diff --git a/flang/lib/Frontend/FrontendActions.cpp b/flang/lib/Frontend/FrontendActions.cpp
index 012d0fdfe645f..dc56cbe5c85bc 100644
--- a/flang/lib/Frontend/FrontendActions.cpp
+++ b/flang/lib/Frontend/FrontendActions.cpp
@@ -925,6 +925,7 @@ void CodeGenAction::runOptimizationPipeline(llvm::raw_pwrite_stream &os) {
si.getTimePasses().setOutStream(ci.getTimingStreamLLVM());
pto.LoopUnrolling = opts.UnrollLoops;
pto.LoopInterchange = opts.InterchangeLoops;
+ pto.LoopFuse = opts.FuseLoops;
pto.LoopInterleaving = opts.UnrollLoops;
pto.LoopVectorization = opts.VectorizeLoop;
pto.SLPVectorization = opts.VectorizeSLP;
diff --git a/flang/test/Driver/loop-fuse.f90 b/flang/test/Driver/loop-fuse.f90
new file mode 100644
index 0000000000000..240d00fdb62d7
--- /dev/null
+++ b/flang/test/Driver/loop-fuse.f90
@@ -0,0 +1,17 @@
+! RUN: %flang -### -S -floop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE %s
+! RUN: %flang -### -S -fno-loop-fuse %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O0 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O1 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O2 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -O3 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Os %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! RUN: %flang -### -S -Oz %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
+! CHECK-LOOP-FUSE: "-floop-fuse"
+! CHECK-NO-LOOP-FUSE-NOT: "-floop-fuse"
+! RUN: %flang_fc1 -emit-llvm -O2 -floop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE-PASS %s
+! RUN: %flang_fc1 -emit-llvm -O2 -fno-loop-fuse -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE-PASS %s
+! CHECK-LOOP-FUSE-PASS: loop-fusion
+! CHECK-NO-LOOP-FUSE-PASS-NOT: loop-fusion
+
+program test
+end program
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 51ccaa53447d7..287a57c7d4a7d 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -64,6 +64,9 @@ class PipelineTuningOptions {
/// false.
bool LoopInterchange;
+ /// Tuning option to enable/disable loop fuse. Its default value is false.
+ bool LoopFuse;
+
/// Tuning option to forget all SCEV loops in LoopUnroll. Its default value
/// is that of the flag: `-forget-scev-loop-unroll`.
bool ForgetAllSCEVInLoopUnroll;
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index f3654600c5abb..aaf43a9b535e9 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -104,6 +104,7 @@
#include "llvm/Transforms/Scalar/LoopDeletion.h"
#include "llvm/Transforms/Scalar/LoopDistribute.h"
#include "llvm/Transforms/Scalar/LoopFlatten.h"
+#include "llvm/Transforms/Scalar/LoopFuse.h"
#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
#include "llvm/Transforms/Scalar/LoopInstSimplify.h"
#include "llvm/Transforms/Scalar/LoopInterchange.h"
@@ -205,6 +206,10 @@ static cl::opt<bool>
EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden,
cl::desc("Enable the LoopInterchange Pass"));
+static cl::opt<bool> EnableLoopFuse("enable-loopfuse", cl::init(false),
+ cl::Hidden,
+ cl::desc("Enable the LoopFuse Pass"));
+
static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",
cl::init(false), cl::Hidden,
cl::desc("Enable Unroll And Jam Pass"));
@@ -314,6 +319,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
SLPVectorization = false;
LoopUnrolling = true;
LoopInterchange = EnableLoopInterchange;
+ LoopFuse = EnableLoopFuse;
ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
LicmMssaOptCap = SetLicmMssaOptCap;
LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
@@ -518,6 +524,9 @@ PassBuilder::buildO1FunctionSimplificationPipeline(OptimizationLevel Level,
invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
+ if (PTO.LoopFuse)
+ FPM.addPass(LoopFusePass());
+
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
/*UseMemorySSA=*/true,
/*UseBlockFrequencyInfo=*/true));
@@ -709,6 +718,9 @@ PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,
invokeLoopOptimizerEndEPCallbacks(LPM2, Level);
+ if (PTO.LoopFuse)
+ FPM.addPass(LoopFusePass());
+
FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM1),
/*UseMemorySSA=*/true,
/*UseBlockFrequencyInfo=*/true));
@@ -2112,7 +2124,6 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
LPM.addPass(LoopFlattenPass());
LPM.addPass(IndVarSimplifyPass());
LPM.addPass(LoopDeletionPass());
- // FIXME: Add loop interchange.
// Unroll small loops and perform peeling.
LPM.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
If the goal here is gcc compatibility, I'd suggest not hooking up the flag to anything; the existing LoopFusePass isn't used by anything and hasn't been touched in years, so it's very likely to have issues. |
The goal is to make it easier to turn on loop fusion and to collect and fix bug reports against loop fusion. Once we are happy with the stability of the pass the goal is to enable by default in flang at -O2 and above. GCC does not have an equivalent for -floop-fuse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flang changes LGTM.
nit: I am no clang expert but I wonder if you could use MarshallingInfoFlag in Options.td to avoid having to set CGOpts.FuseLoops manually for clang (this is not supported for flang unfortunately). If people more familiar with clang are happy with it how it is then that is fine by me.
Please wait for somebody else to review this from a clang/ and llvm/ perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should expose clang driver options for passes that are known to have significant issues.
If you want to add a cl::opt
flag to allow scheduling this in the pipeline and accessible for early testing via -mllvm
, that would be a different matter.
If we really want a clang driver option, then it should be named -fexperimental-fuse-loops
.
I think this is reasonable even for flang. |
flang/docs/ReleaseNotes.md
Outdated
@@ -34,6 +34,7 @@ page](https://llvm.org/releases/). | |||
|
|||
* -floop-interchange is now recognized by flang. | |||
* -floop-interchange is enabled by default at -O2 and above. | |||
* -fexperimental-fuse-loops is now recognized by flang. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make a similar entry in clang/docs/ReleaseNotes.md
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that this is useful for testing purposes.
If folks are happy with the 'experimental' prefix, let's do that, LGTM.
So, why wouldn't adding a |
@@ -2112,7 +2124,6 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level, | |||
LPM.addPass(LoopFlattenPass()); | |||
LPM.addPass(IndVarSimplifyPass()); | |||
LPM.addPass(LoopDeletionPass()); | |||
// FIXME: Add loop interchange. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this patch resolves this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (PTO.LoopFuse) | ||
FPM.addPass(LoopFusePass()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an appropriate place? I think this is an optimization pass rather than simplification one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved it to the function simplification pipeline. LoopFusion is a function pass. I am open to change the place if this doesn't feel alright.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoopFusion is a function pass
I don't think this implies that the pass should be part of the function simplification pipeline. Its position should be determined by what the pass actually does. The appropriate placement likely depends on what you're trying to achieve with it.
// FIXME: instead of relying on shouldEnableVectorizerAtOLevel, we may want to | ||
// FIXME: Instead of relying on shouldEnableVectorizerAtOLevel, we may want to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unrelated change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I am taking over this change. Trying to address review comments. |
You probably made a mistake with the rebase... |
c6867ba
to
df80f2d
Compare
Thanks. I think I fixed it. Can you please have a look again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be ok with using loop-fusion
instead of loop-fuse
? The former is a well-recognized name for the transformation. Since the other names of loop optimizations such as interchange stick to prevalent usage, it would be good if we could do the same for fusion. This isn't required, but may be more consistent. What do you think?
I have requested changes not for the renaming, but because clang-format
needs to be run in a couple of places.
def fexperimental_loop_fuse : Flag<["-"], "fexperimental-loop-fuse">, Group<f_Group>, | ||
HelpText<"Enable the experimental loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>; | ||
def fno_experimental_loop_fuse: Flag<["-"], "fno-experimental-loop-fuse">, Group<f_Group>, | ||
HelpText<"Disable the experimental loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT
def fexperimental_loop_fuse : Flag<["-"], "fexperimental-loop-fuse">, Group<f_Group>, | |
HelpText<"Enable the experimental loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>; | |
def fno_experimental_loop_fuse: Flag<["-"], "fno-experimental-loop-fuse">, Group<f_Group>, | |
HelpText<"Disable the experimental loop fuse pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>; | |
def fexperimental_loop_fuse : Flag<["-"], "fexperimental-loop-fusion">, Group<f_Group>, | |
HelpText<"Enable the experimental loop fusion pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>; | |
def fno_experimental_loop_fuse: Flag<["-"], "fno-experimental-loop-fusion">, Group<f_Group>, | |
HelpText<"Disable the experimental loop fusion pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>; |
clang/lib/CodeGen/BackendUtil.cpp
Outdated
@@ -897,6 +897,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline( | |||
PipelineTuningOptions PTO; | |||
PTO.LoopUnrolling = CodeGenOpts.UnrollLoops; | |||
PTO.LoopInterchange = CodeGenOpts.InterchangeLoops; | |||
PTO.LoopFuse = CodeGenOpts.FuseLoops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT
PTO.LoopFuse = CodeGenOpts.FuseLoops; | |
PTO.LoopFusion = CodeGenOpts.FuseLoops; |
clang/lib/CodeGen/BackendUtil.cpp
Outdated
@@ -1332,6 +1333,7 @@ runThinLTOBackend(CompilerInstance &CI, ModuleSummaryIndex *CombinedIndex, | |||
Conf.SampleProfile = std::move(SampleProfile); | |||
Conf.PTO.LoopUnrolling = CGOpts.UnrollLoops; | |||
Conf.PTO.LoopInterchange = CGOpts.InterchangeLoops; | |||
Conf.PTO.LoopFuse = CGOpts.FuseLoops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT
Conf.PTO.LoopFuse = CGOpts.FuseLoops; | |
Conf.PTO.LoopFusion = CGOpts.FuseLoops; |
Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fuse, | ||
options::OPT_fno_experimental_loop_fuse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT
Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fuse, | |
options::OPT_fno_experimental_loop_fuse); | |
Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fusion, | |
options::OPT_fno_experimental_loop_fusion); |
@@ -151,6 +151,8 @@ void Flang::addCodegenOptions(const ArgList &Args, | |||
!stackArrays->getOption().matches(options::OPT_fno_stack_arrays)) | |||
CmdArgs.push_back("-fstack-arrays"); | |||
|
|||
Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fuse, options::OPT_fno_experimental_loop_fuse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not nit: Probably need to run clang-format
on this line.
NIT
Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fuse, options::OPT_fno_experimental_loop_fuse); | |
Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fusion, options::OPT_fno_experimental_loop_fusion); |
@@ -65,6 +65,9 @@ class PipelineTuningOptions { | |||
/// false. | |||
bool LoopInterchange; | |||
|
|||
/// Tuning option to enable/disable loop fuse. Its default value is false. | |||
bool LoopFuse; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
bool LoopFuse; | |
bool LoopFusion; |
@@ -204,6 +205,10 @@ static cl::opt<bool> | |||
EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden, | |||
cl::desc("Enable the LoopInterchange Pass")); | |||
|
|||
static cl::opt<bool> EnableLoopFuse("enable-loopfuse", cl::init(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
static cl::opt<bool> EnableLoopFuse("enable-loopfuse", cl::init(false), | |
static cl::opt<bool> EnableLoopFuse("enable-loopfusion", cl::init(false), |
@@ -43,6 +43,7 @@ CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass) | |||
CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization. | |||
CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization. | |||
CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange. | |||
CODEGENOPT(FuseLoops, 1, 0) ///< Enable loop fuse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
CODEGENOPT(FuseLoops, 1, 0) ///< Enable loop fuse. | |
CODEGENOPT(FuseLoops, 1, 0) ///< Enable loop fusion. |
flang/docs/ReleaseNotes.md
Outdated
@@ -32,6 +32,9 @@ page](https://llvm.org/releases/). | |||
|
|||
## New Compiler Flags | |||
|
|||
* -floop-interchange is now recognized by flang. | |||
* -fexperimental-loop-fuse is now recognized by flang. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* -fexperimental-loop-fuse is now recognized by flang. | |
* -fexperimental-loop-fusion is now recognized by flang. |
@@ -322,6 +322,7 @@ CODEGENOPT(TimeTrace , 1, 0, Benign) ///< Set when -ftime-trace is enabl | |||
VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500, Benign) ///< Minimum time granularity (in microseconds), | |||
///< traced by time profiler | |||
CODEGENOPT(InterchangeLoops , 1, 0, Benign) ///< Run loop-interchange. | |||
CODEGENOPT(FuseLoops , 1, 0, Benign) ///< Run loop-fuse. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CODEGENOPT(FuseLoops , 1, 0, Benign) ///< Run loop-fuse. | |
CODEGENOPT(FuseLoops , 1, 0, Benign) ///< Run loop-fusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, using merge instead of rebase might be a better. That way, you might avoid resolving the same conflicts repeatedly.
flang/docs/ReleaseNotes.md
Outdated
@@ -32,6 +32,9 @@ page](https://llvm.org/releases/). | |||
|
|||
## New Compiler Flags | |||
|
|||
* -floop-interchange is now recognized by flang. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a mistake while resolving the conflict.
if (PTO.LoopFuse) | ||
FPM.addPass(LoopFusePass()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoopFusion is a function pass
I don't think this implies that the pass should be part of the function simplification pipeline. Its position should be determined by what the pass actually does. The appropriate placement likely depends on what you're trying to achieve with it.
cc3e985
to
5601486
Compare
@tarunprabhu Sure, I can take that suggestion. I have done the changes. Can you please resolve the conservation? @kasuga-fj I fixed the rebase issue in
What do you think is the right place in pass pipeline? |
I don't know, but just a gut feeling, somewhere in FWIW: recently I moved the loop interchange pass from the simplification pipeline into the optimization pipeline in #145502. I did so more due to the drawbacks of having it in the simplification pipeline than any particular advantages of placing it in the optimization pipeline. I also confirmed that the number of interchanged loops barely changed after the move. However, I found that it triggered potential phase ordering issues after merging it (in my case, some other passes like GVN/LICM can transform the IR into a form that makes interchange difficult). I believe this should be handled inside LoopInterchange itself, but it's worth noting that (I think) it's not an uncommon situation. (That said, if you don’t have any specific motivating examples at the moment, I suppose any placement may be fine). |
@@ -204,6 +205,10 @@ static cl::opt<bool> | |||
EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden, | |||
cl::desc("Enable the LoopInterchange Pass")); | |||
|
|||
static cl::opt<bool> EnableLoopFusion("enable-loopfusion", cl::init(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't quite understand why you want to add both a clang flag and an opt flag. But if that's the intention, EnableLoopFusino
should be defined in tools/opt/NewPMDriver.cpp
, not here. The same applies to EnableLoopInterchange
. IIUC, declaring this here allows specifying like clang -mllvm -enable-loopfusion
, but it will be overwritten after PipelineTuningOptions
's ctor is invoked.
On the other hand, options defined in tools/opt/NewPMDriver.cpp
aren't recognized by the clang driver, IIUC.
But again, I'm not sure why clang/flang flags are really needed. Why isn't the opt flag sufficient? IMHO, at the very least, you should clarify the reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said, I am taking over this patch from Sebastian. Your comment makes me think.
The primary purpose of this patch is to user-facing option to allow turning fusion on or off. Not everyone is aware of the options provided by developer tools like opt.
I see the precedence in the code. In flang/include/flang/Frontend/CodegenOptions.def
I see options for loop-versioning, unrolling etc.
I see clang and flang differ here. clang doesn't seem to have such options except for floop-interchange
.
floop-interchange
was introduced in #125830
Having fexperimental-loop-fusion
is more of syntactic sugar and convenience is based on the same principles of #125830.
I can remove the change from PassBuilderPipelines.cpp
i.e. EnableFusion, as it can be redundant.
@@ -204,6 +205,10 @@ static cl::opt<bool> | |||
EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden, | |||
cl::desc("Enable the LoopInterchange Pass")); | |||
|
|||
static cl::opt<bool> EnableLoopFusion("enable-loopfusion", cl::init(false), | |||
cl::Hidden, | |||
cl::desc("Enable the LoopFusion Pass")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
cl::desc("Enable the LoopFusion Pass")); | |
cl::desc("Enable the LoopFuse Pass")); |
The pass name is LoopFuse
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Taken.
Yeah, I don't have enough data to support the right place in the pipeline. I will add |
This patch adds the flag -fexperimental-loop-fuse to the clang and flang drivers. This is primarily useful for experiments as we envision to enable the pass one day.
The options are based on the same principles and reason on which we have
floop-interchange
.