
[LLVMGPUVectorDistribute] Support vector.mask + vector.multi_reduce #19880

Open
wants to merge 3 commits into main

Conversation


@manupak manupak commented Feb 3, 2025

This commit enables vector layout propagation
into and out of vector.mask and its body.

Moreover, it enables the distribution of vector.multi_reduce
that is wrapped in a vector.mask.
This is done as follows:

  • The distributed mask is applied to the thread-local reduce.
  • The distributed operand is selected between the
    reduction identity and the provided operand using
    the distributed mask.
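The steps above can be sketched in simplified IR. Note the shapes, the `<add>` kind, and the value names below are illustrative assumptions, not taken from this PR:

```mlir
// Pre-distribution: a masked reduction.
%r = vector.mask %mask {
  vector.multi_reduction <add>, %src, %acc [0] : vector<128xf16> to f16
} : vector<128xi1> -> f16

// Post-distribution (per thread, sketch): masked-off lanes are replaced
// by the combining identity (0.0 for <add>) so they cannot affect the
// thread-local reduction.
%identity = arith.constant dense<0.000000e+00> : vector<4xf16>
%selected = arith.select %dist_mask, %dist_src, %identity
    : vector<4xi1>, vector<4xf16>
%local = vector.multi_reduction <add>, %selected, %local_acc [0]
    : vector<4xf16> to f16
```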

Depends on: #19830 (hence keeping this as a draft until that is merged)

@manupak manupak requested a review from Groverkss February 3, 2025 14:07
@manupak manupak marked this pull request as draft February 3, 2025 14:08

manupak commented Feb 3, 2025

@Groverkss is it fair to assume that region nesting will be honored when distributing?
(I know this is how ops are collected for distribution for now -- but I want to confirm whether that's a coincidence or by design)
i.e. innermost regions will be distributed prior to outer ones


manupak commented Feb 3, 2025

> @Groverkss is it fair to assume that region nesting will be honored when distributing? (I know this is how ops are collected for distribution for now -- but I want to confirm whether that's a coincidence or by design) i.e. innermost regions will be distributed prior to outer ones

Alright, I introduced MaskedOpDistributionPattern, which rewrites the vector.mask op wrapper away post-distribution.

This commit enables vector layout propagation
into and out of vector.mask and its body.

Moreover, it enables the distribution of vector.multi_reduce
that is wrapped in a vector.mask.
This is done as follows:
* The distributed mask is applied to the thread-local reduce
* The distributed operand is selected between the
  reduction identity and the provided operand using
  the distributed mask.

Signed-off-by: Manupa Karunaratne <[email protected]>
a hook to provide vector.mask { op } rewrites.

This removes the rewrite ordering constraint that
would otherwise exist, where the body op has to be
distributed prior to the mask op.

Now, using this hook, developers can write
masked op distribution patterns where the pre-distribution
mask op is removed as part of the rewrite.

Signed-off-by: Manupa Karunaratne <[email protected]>
@manupak manupak force-pushed the distribute-masked-reductions branch from a8dcc8b to 920fb53 Compare February 20, 2025 12:12
@manupak manupak marked this pull request as ready for review February 20, 2025 12:15

manupak commented Feb 20, 2025

PTAL @qedawkins if you have some time...


@qedawkins qedawkins left a comment


One main question about why we need both the local mask and the select, otherwise LGTM

std::function<void(DistributionLayout *, mlir::ChangeResult)> update) {
mask.getBody()->walk(
[&](Operation *traversed) { visitOperation(traversed); });
// Propogate from body to results

Suggested change
// Propogate from body to results
// Propagate from body to results.

Contributor Author


done

}
mask = getDistributed(rewriter, maskOp.getMask(), maskLayout);
Value passThruSrc = getCombiningIdentityValue(
loc, rewriter, multiReduceOp.getKind(), disSrc.getType());
Contributor


vector.mask can carry its own pass_thru, which I'm guessing goes here.

Contributor Author


skipped as discussed down below

loc, disSrc, localInit, distributedReductionMask,
multiReduceOp.getKind());
if (mask) {
localReduction =
vector::maskOperation(rewriter, localReduction.getDefiningOp(), mask)
Contributor


Why do we need the arith.select and the vector.mask?

Contributor Author


removed post-distribution masking for now as discussed.

// CHECK: %[[MASK_ITL_PCK:.+]] = vector.transpose %[[MASK_PCK]], [0, 3, 1, 4, 2, 5] : vector<2x2x2x1x1x8xi1> to vector<2x1x2x1x2x8xi1>

// CHECK: %[[SELECT:.+]] = arith.select %[[MASK_ITL_PCK]], {{.*}}, %[[RED_IDENTITY]] : vector<2x1x2x1x2x8xi1>, vector<2x1x2x1x2x8xf16>
// CHECK: vector.mask %[[MASK_ITL_PCK]] { vector.multi_reduction <add>, %[[SELECT]], {{.*}} [0, 2, 4] : vector<2x1x2x1x2x8xf16> to vector<1x1x8xf16> } : vector<2x1x2x1x2x8xi1> -> vector<1x1x8xf16>
Contributor


Can you add a test with pass_thru on the vector.mask?

Contributor Author


skipped as discussed down below


manupak commented Feb 21, 2025

> One main question about why we need both the local mask and the select, otherwise LGTM

So the masking is for thread-local reductions.
The distribution can and will happen across reduction dimensions. Therefore, in the corner case where there are no reductions to perform thread-locally, I thought it needed to select the reduction identity.

Re-thinking, maybe the init might cover that already -- I can give removing the select a go.


manupak commented Feb 21, 2025

wait .. no, if we want to support pass_thru then the select is needed.
So I'll add that with a test then?

@qedawkins

yeah, I think we need to keep the select (although it's also fine to just not support pass_thru right now) and we can try dropping the mask.


manupak commented Feb 21, 2025

I can add pass_thru, but why drop the mask?
It doesn't hurt to retain the mask post-distribution, no?
(well, except I need this: llvm/llvm-project#126722 to be integrated into IREE)

@qedawkins

Isn't the mask redundant if we have the select?


manupak commented Feb 21, 2025

For e.g., if the thread-local reduction dimension is long (longer than a machine vector), wouldn't that be used to cut down the instructions issued?
(though I don't know whether that's how it's lowered -- so happy to skip masking the distributed op if you think the cons outweigh the pros)

@qedawkins

> If the thread-local reduction dimension is long (longer than a machine vector), wouldn't that be used to cut down the instructions issued?

It didn't look like the existing lowerings were doing that to me, but I might not have looked close enough. If it does work out like that, keeping the mask makes sense. I've mostly been asking because the mask was surprising to me, I can approve and leave it as a future exercise to determine whether it's useful.


manupak commented Feb 21, 2025

I spent some time reading the upstream code and traces now.
As per the current upstream implementations, it seems the lowering does a select at a much finer granularity.
At the same time, I didn't see a single mention of the mask op's pass_thru in the upstream lowering, so it is likely not implemented either.

Thus, I'll leave a comment here and remove the post-distribution mask, just not to trip on anything.
(Sorry for carrying on my overthinking trip here... :) )

@qedawkins

ah ok, well ignore the pass_thru then. Sounds good to me! We can always add it back later if it's better.

@manupak manupak requested a review from qedawkins February 21, 2025 16:34