[ET-VK][ez] Make squeeze insertion requirements more strict #9917
base: gh/SS-JIA/207/base
Conversation
Stack from ghstack (oldest at bottom):

## Context

Refactor the `SqueezeUnsqueezeInputs` pass to be clearer about its intention. For Llama models, inputs to the 4-bit linear operator will often have the shape `[1, seq_len, dim]`; under the current implementation of the pass, the input would be squeezed to `[seq_len, dim]` even though the squeeze is not necessary. The original intention of this pass was to squeeze inputs with shape `[batch_size, 1, dim]` to `[batch_size, dim]` before calling the 4-bit linear operator.

## Changes

To avoid inserting unnecessary squeeze/unsqueeze ops, be more specific about when they should be added.

I would like to consider refactoring this pass in the future, since the logic is currently a bit unintuitive. Squeeze/unsqueeze is also inserted for gelu and relu, but this is to create a chain of unsqueeze/squeeze that will be eliminated by a later pass (see #8601 / D69673068). Eventually it will be good to rewrite the pass to make shape management explicit and self-contained within the pass, rather than inserting ops that are expected to be removed later on.

Differential Revision: [D72480178](https://our.internmc.facebook.com/intern/diff/D72480178/)
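As an illustration of the stricter rule, here is a minimal eager-mode sketch. The actual change is a graph pass in the ExecuTorch Vulkan backend; `should_squeeze` and `linear_with_squeeze` are hypothetical names used only for this example.

```python
import torch


def should_squeeze(x: torch.Tensor) -> bool:
    """Squeeze only inputs shaped [batch_size, 1, dim].

    The previous behavior squeezed any rank-3 input with a size-1 dim,
    so a [1, seq_len, dim] Llama input was reshaped to [seq_len, dim]
    even though the squeeze is not necessary.
    """
    return x.dim() == 3 and x.size(1) == 1


def linear_with_squeeze(x: torch.Tensor, linear: torch.nn.Module) -> torch.Tensor:
    if should_squeeze(x):
        # [batch_size, 1, dim] -> [batch_size, dim], run the op,
        # then restore the original rank.
        return linear(x.squeeze(1)).unsqueeze(1)
    # [1, seq_len, dim] and other shapes pass through untouched.
    return linear(x)


lin = torch.nn.Linear(64, 32)
assert linear_with_squeeze(torch.randn(8, 1, 64), lin).shape == (8, 1, 32)
assert linear_with_squeeze(torch.randn(1, 128, 64), lin).shape == (1, 128, 32)
```

The point of the stricter predicate is that only the `[batch_size, 1, dim]` case gets the squeeze/unsqueeze pair; every other shape reaches the linear operator unchanged.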
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9917

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit ae87928 with merge base 6adff9c.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D72480178