Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Seq] Add a new pass to sink clock_gate ops to the users #7474

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

prithayan
Copy link
Contributor

This is a new pass that sinks seq.clock_gate into its users and attempts to remove as many seq.clock_gate ops as possible.

Any operation that can potentially encode the enable directly, will replace its gated clock with the un-gated clock and encode the clock gate enable as the operation's enable signal.
Currently only the CompRegOp operation is handled, but similar transformation needs to be attempted on other supported operations also.

The first phase of the pass is a global transformation that continues to push the clock gates into the referenced modules, until none of the instance ops are driven by a gated clock.
Next, best effort is made to remove the gated clock and encode the enable directly into the operation that is being driven by the clock.

Copy link
Contributor

@fabianschuiki fabianschuiki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! This is a very useful transform 😍. Does the code also work if a module is instantiated multiple times? And if one of the instances has a clock gate in front and the other does not?

Comment on lines 100 to 180
/// Combine two consecutive clock_gate ops, into one.
void collapseConsecutiveClockGates(ClockGateOp driverGate,
ClockGateOp userGate) {
mlir::OpBuilder builder(context);
builder.setInsertionPoint(userGate);
Value enable = getEnable(driverGate);
enable = builder.create<comb::AndOp>(userGate->getLoc(), enable,
userGate.getEnable());
userGate.getInputMutable().assign(driverGate.getInput());
userGate.getEnableMutable().assign(enable);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very neat! Might also make sense as a canonicalization on ClockGateOp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, will add another PR to add this pattern.

Comment on lines 112 to 195
/// If the clock_gate op has `test_enable`, combine it with the `enable` and
/// return the final enable.
Value getEnable(ClockGateOp clkGate) {
Value enable = clkGate.getEnable();
// If there is testEnable, or it.
if (auto testEnable = clkGate.getTestEnable()) {
OpBuilder builder(clkGate.getContext());
builder.setInsertionPointAfter(clkGate);
enable =
builder.create<comb::OrOp>(clkGate->getLoc(), enable, testEnable);
}
return enable;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we'd need to keep the regular enable and the test_enable separate… probably not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked David, and seems like its okay to combine them. Not sure of other use cases, if that's required then we need two additional ports for every gated clock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, good point, forgot about that. I was just wondering of how a user might expect the pass to behave in CIRCT -- someone could expect that the enable inputs would be preserved as separate signals. But on the other hand this is also very easy to adjust once the pattern of the pass is established, and this comes up as a concrete need. Combining them feels like a good start!

lib/Dialect/Seq/Transforms/SinkClockGates.cpp Outdated Show resolved Hide resolved
Comment on lines +191 to +275
auto clkGate = dyn_cast_or_null<ClockGateOp>(in.getDefiningOp());
if (!clkGate)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a bit fragile since it could break analysis when there is wire between instance and clock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you pointed out, this would require a fixed-point analysis to handle all the possible dataflow operations that can propagate clocks. I think such operations of passing a clock by a wire are very rare, and it might be fine to ignore them for this transformation. This transformation does not guarantee that all clock gates will be removed. I want to do some more experiments with this transformation enabled, and then based on the need, we can utilize a fixed-point analysis.

@uenoku
Copy link
Member

uenoku commented Aug 14, 2024

And if one of the instances has a clock gate in front and the other does not?

Yeah I think this transformation requires fixed-point analysis whose lattice is a pair of clock and enable.

@prithayan
Copy link
Contributor Author

Thanks a lot @fabianschuiki and @uenoku for the feedback.
The major issue in the previous iteration was multiple instances.
I have handled that in this version.
The other concern regarding the precision of this analysis, this is intended as a fast/simple and best-effort analysis, so its okay to miss a few clock_gates. We donot intend to eliminate all the clock gates.
Additionally clock_gates to some public-modules/extern-modules cannot be removed.
As Hideto suggested, I am also working on a fixed-point analysis that we might utilize if the need for it arises.

Please take another look at it.

Copy link
Contributor

@dtzSiFive dtzSiFive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! This is neat! 😁 🎉

Left some minor feedback.

include/circt/Dialect/HW/HWStructure.td Outdated Show resolved Hide resolved
include/circt/Dialect/Seq/SeqPasses.td Show resolved Hide resolved
lib/Dialect/Seq/Transforms/SinkClockGates.cpp Outdated Show resolved Hide resolved
lib/Dialect/Seq/Transforms/SinkClockGates.cpp Outdated Show resolved Hide resolved
lib/Dialect/Seq/Transforms/SinkClockGates.cpp Outdated Show resolved Hide resolved
@prithayan prithayan marked this pull request as ready for review August 15, 2024 21:41
Comment on lines 235 to 237
void appendPorts(
ArrayRef<std::tuple<StringAttr, Type, Location>> insertInputs,
ArrayRef<std::pair<StringAttr, Value>> insertOutputs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's difference from modifyPorts which we are going to deprecate?

Copy link
Contributor Author

@prithayan prithayan Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks for bringing it up, I will add comments.
Few reasons,

  1. Stop using the deprecated methods.
  2. Implement a simpler API, that only appends the input and then output ports, the most-common/only use case. Avoids the generic methods, that can insert/delete at random port indices.
  3. Avoid any performance foot-guns, the deprecated methods can have performance issues, implementing the fresh API that we discuss and properly inspect to ensure there are no obvious performance issues.
  4. At least rhodium can stop using the deprecated API, and switch to this, if its approved.
    I will commit this separately, so if it makes sense to discuss this more broadly, I will create a new PR with this change.
    What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delayed replay. TBH I don't find much difference between deprecated functions. The intention for the deprecation was "it doesn't make sense to have an API which takes input and output ports separately" (

/// This is marked deprecated as it's only used from HandshakeToHW and
/// PortConverter and is likely broken and not currently tested. Users of this
/// are still written dealing with input and output ports separately, which is
/// an old and broken style.
[[deprecated]] static void
) so I'm not sure it's a right direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions Hideto, I updated the interface, we need the output Values to update the OutputOp also added the input values, that can be set as the block arguments are added. This avoids the need to know the argument index of the input args.

Comment on lines +148 to +167
SmallVector<StringAttr> getInstanceArgNames(HWInstanceLike inst) {
SmallVector<StringAttr> argNames;
if (auto args = inst->getAttrOfType<ArrayAttr>("argNames")) {
auto attr = args.getAsRange<StringAttr>();
argNames.insert(argNames.begin(), attr.begin(), attr.end());
}
return argNames;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add argNames getter to HWInstanceLike. I guess all operations that implement HWInstanceLike have non-optional argNames

Comment on lines +324 to +352
if (auto compReg = dyn_cast<CompRegOp>(gatedOp)) {
Value clk = compReg.getClk();
auto gate = dyn_cast_or_null<ClockGateOp>(clk.getDefiningOp());
if (!gate)
continue;
clk = gate.getInput();
Value enable = getEnable(gate);
OpBuilder builder(context);
builder.setInsertionPoint(compReg);
auto compEn = builder.create<CompRegClockEnabledOp>(
compReg->getLoc(), compReg.getInput(), clk, enable,
compReg.getNameAttr(), compReg.getReset(), compReg.getResetValue(),
compReg.getPowerOnValue(), compReg.getInnerSymAttr());
compReg->replaceAllUsesWith(compEn);
opsToErase.insert(compReg);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should go to canonicalization if it's always beneficial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought about that, but as you pointed out, I don't think this is "beneficial" in all cases. This should only happen if the user explicitly enables this transformation. So, the transformation of "sink clock-gates" implies trying to combine the clock gate's enable into the operation that is driven by the gated clock. This might not be desirable in general, and if we add it to the canonicalization we can't disable it in general.
But maybe this can be added to some op interface. A new op interface that any op can opt-in, and then we can invoke it from here.
My plan is to add the enable to FirRegOp next, similar transformation needs to happen to memory ops also, but its semantics are slightly non-trivial.

Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about decoupling canonicalization from this pass? I feel it looks more cleaner if SinkClockGates only sinks clock gates and not perform comp reg simplification.

auto oldEnablePort = clockEnablePortIndices[index];
auto oldEnableVal = inst->getOperand(oldEnablePort);
auto *def = oldEnableVal.getDefiningOp();
assert(llvm::isa_and_nonnull<ConstantOp>(def));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it guaranteed to be constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a constant, because this is another instance of a module that is already visited. When the first instance was visited, we added the enable port to all the module instances. The enable port for all the other instances were set to a constant true, except the one that is being visited, which is set to the clock gate op's enable. This ensures that we donot add multiple enable ports for the same clock gate port when visiting multiple instances of a module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants