[ET-VK][ez] Make squeeze insertion requirements more strict #9917

Open
wants to merge 4 commits into
base: gh/SS-JIA/207/base

SS-JIA
Contributor

@SS-JIA SS-JIA commented Apr 4, 2025

Stack from ghstack (oldest at bottom):

Context

Refactor the SqueezeUnsqueezeInputs pass to make its intent clearer.

For Llama models, inputs to the 4-bit linear operator often have the shape [1, seq_len, dim]; under the current implementation of the pass, such an input is squeezed to [seq_len, dim] even though the squeeze is unnecessary.

The original intention of this pass was to squeeze inputs with shape [batch_size, 1, dim] to [batch_size, dim] before calling the 4-bit linear operator.

Changes

To avoid inserting unnecessary squeeze/unsqueezes, be more specific about when squeeze/unsqueeze should be added.
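
To make the distinction between the two shapes concrete, here is a minimal sketch of what a stricter check could look like. It is not the code in this PR: the function names are hypothetical, and the condition (rank 3 with a size-1 middle dimension) is an assumption inferred from the shapes described above.

```python
from typing import List, Sequence


def should_squeeze_linear_input(shape: Sequence[int]) -> bool:
    """Hypothetical predicate: True only for [batch_size, 1, dim] inputs.

    Assumption: the squeeze is only useful when the middle dimension is 1.
    """
    return len(shape) == 3 and shape[1] == 1


def squeezed_shape(shape: Sequence[int]) -> List[int]:
    """Shape the 4-bit linear op would see after the optional squeeze."""
    if should_squeeze_linear_input(shape):
        # [batch_size, 1, dim] -> [batch_size, dim]
        return [shape[0], shape[2]]
    # Anything else, including [1, seq_len, dim], is left untouched.
    return list(shape)
```

Under this assumed condition, a Llama-style input such as [1, 128, 4096] passes through unchanged, while [8, 1, 4096] is squeezed to [8, 4096].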

I would like to consider refactoring this pass in the future, since the logic is currently a bit unintuitive. Squeeze/unsqueeze is also inserted for gelu and relu, but only to create a chain of unsqueeze/squeeze ops that will be eliminated by a later pass (see #8601 / D69673068). Eventually, I think it would be good to rewrite the pass so that shape management is explicit and self-contained within the pass, rather than relying on inserted ops that are expected to be removed later on.
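
As a rough illustration of why wrapping gelu and relu produces a removable chain, the toy model below treats a graph as a flat list of ops and cancels an unsqueeze that is immediately followed by a squeeze on the same dim. This is only a sketch: the `Op` tuple and `remove_redundant_pairs` are hypothetical and do not reflect the actual pass referenced in #8601.

```python
from typing import List, Optional, Tuple

# (op name, dim); dim is None for ops that are not squeeze/unsqueeze.
Op = Tuple[str, Optional[int]]


def remove_redundant_pairs(ops: List[Op]) -> List[Op]:
    """Drop each unsqueeze(dim) that is directly followed by squeeze(dim)."""
    out: List[Op] = []
    for name, dim in ops:
        prev = out[-1] if out else None
        if prev is not None and prev == ("unsqueeze", dim) and name == "squeeze":
            out.pop()  # the pair is a no-op, so remove both halves
        else:
            out.append((name, dim))
    return out


# Each op wrapped individually as squeeze -> op -> unsqueeze.
wrapped: List[Op] = [
    ("squeeze", 1), ("gelu", None), ("unsqueeze", 1),
    ("squeeze", 1), ("linear", None), ("unsqueeze", 1),
]
simplified = remove_redundant_pairs(wrapped)
```

In this toy model the inner unsqueeze/squeeze pair between gelu and linear cancels, leaving a single squeeze at the start of the chain and a single unsqueeze at the end.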

Differential Revision: D72480178


pytorch-bot bot commented Apr 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9917

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ae87928 with merge base 6adff9c (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 4, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D72480178


@SS-JIA SS-JIA added the release notes: vulkan Changes to the Vulkan backend delegate label Apr 7, 2025
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported release notes: vulkan Changes to the Vulkan backend delegate