vkuzo
Contributor

@vkuzo vkuzo commented Aug 18, 2025

Summary:

Refactors `NVFP4Tensor` to use `act_quant_kwargs`, following the design
of the recently added `Float8Tensor`.

Note that I chose not to use `_choose_quant_func_and_quantize_tensor`, as
we do not support any activation types other than nvfp4. This can be
relaxed in the future if needed.

This is still not the final API; we might need to make more tweaks before
we bring it out of prototype.
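
To illustrate the general pattern described above, here is a minimal, dependency-free sketch in which a quantized weight carries an optional `act_quant_kwargs` object and the linear op quantizes the activation only when that object is present. Every name in it (`ToyActQuantKwargs`, `ToyQuantizedWeight`, `toy_quantize`, `toy_linear`) is hypothetical and chosen for illustration; it is not the code in this PR, and the placeholder quantizer is not nvfp4.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyActQuantKwargs:
    """Hypothetical options describing how to quantize the activation."""
    block_size: int = 16


@dataclass
class ToyQuantizedWeight:
    """Hypothetical stand-in for a quantized weight tensor subclass."""
    values: List[float]
    # None means weight-only: the activation is left in high precision.
    act_quant_kwargs: Optional[ToyActQuantKwargs] = None


def toy_quantize(x: List[float], block_size: int) -> List[float]:
    """Placeholder quantizer (NOT nvfp4): rounds to one decimal place."""
    if len(x) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    return [round(v, 1) for v in x]


def toy_linear(activation: List[float], weight: ToyQuantizedWeight) -> float:
    """Dot product that quantizes the activation only if the weight asks for it."""
    kwargs = weight.act_quant_kwargs
    if kwargs is not None:
        # Single supported activation type, so no function dispatch is needed.
        activation = toy_quantize(activation, kwargs.block_size)
    return sum(a * w for a, w in zip(activation, weight.values))


activation = [0.26] * 16
weight_only = ToyQuantizedWeight(values=[1.0] * 16)
dynamic = ToyQuantizedWeight(values=[1.0] * 16, act_quant_kwargs=ToyActQuantKwargs())

out_weight_only = toy_linear(activation, weight_only)  # activation untouched
out_dynamic = toy_linear(activation, dynamic)          # activation rounded first
```

Because only one activation format is supported, the branch in `toy_linear` calls the quantizer directly. A generic dispatch helper would only become useful once more than one activation type exists.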

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

vkuzo added 6 commits August 18, 2025 08:15
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@vkuzo
Contributor Author

vkuzo commented Aug 18, 2025

Stack from ghstack (oldest at bottom):

pytorch-bot bot commented Aug 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2790

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 4 Pending, 1 Unrelated Failure

As of commit ff0ee90 with merge base c120bb7:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Aug 18, 2025
Summary:

Refactors `NVFP4Tensor` to use `act_quant_kwargs`, following the design
of the recently added `Float8Tensor`.

Note that I chose not to use `_choose_quant_func_and_quantize_tensor`, as
we do not support any activation types other than nvfp4. This can be
relaxed in the future if needed.

This is still not the final API; we might need to make more tweaks before
we bring it out of prototype.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: f2496ce
ghstack-comment-id: 3197771544
Pull-Request: #2790
@meta-cla meta-cla bot added the CLA Signed label Aug 18, 2025
@vkuzo vkuzo added the topic: not user facing label Aug 18, 2025
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Aug 18, 2025
ghstack-source-id: 02810d1
ghstack-comment-id: 3197771544
Pull-Request: #2790
@vkuzo vkuzo requested review from jerryzh168 and drisspg August 18, 2025 17:23
@@ -141,9 +155,11 @@ def to_nvfp4(
block_size: Block size for quantization (must be 16)
per_tensor_scale: Optional pre-computed absolute maximum for calibration.
If provided, uses per-tensor scaling. If None, uses block-wise scaling only.
mm_config: Matrix multiplication configuration
per_tensor_scale: Optional pre-computed absolute maximum for calibration for activation
Contributor

Is this supposed to be `act_per_tensor_scale`? Also, is it expected that the docstring is the same as the previous item?
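
For context on the two scaling modes the docstring above mentions (block-wise only versus an additional per-tensor scale), the following is a rough sketch of two-level scaling over blocks of 16 values. It is an illustration under stated assumptions, not the `to_nvfp4` implementation: `toy_quantize_blocks` and `round_to_e2m1` are hypothetical names, `per_tensor_scale` is treated here as an already-computed scale factor, scales are kept as plain Python floats rather than rounded to an 8-bit format, and ties are broken toward the smaller magnitude instead of round-to-nearest-even.

```python
from typing import List, Optional, Tuple

F4_E2M1_MAX = 6.0  # largest magnitude representable in FP4 E2M1
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # non-negative E2M1 values
BLOCK_SIZE = 16


def round_to_e2m1(v: float) -> float:
    """Snap a value to the nearest E2M1 magnitude, keeping its sign."""
    mag = min(abs(v), F4_E2M1_MAX)
    nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return -nearest if v < 0 else nearest


def toy_quantize_blocks(
    x: List[float], per_tensor_scale: Optional[float] = None
) -> Tuple[List[float], List[float]]:
    """Return (stored block scales, quantized values) for blocks of 16."""
    if len(x) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of 16")
    scales: List[float] = []
    codes: List[float] = []
    for start in range(0, len(x), BLOCK_SIZE):
        block = x[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        block_scale = amax / F4_E2M1_MAX  # maps the block's amax onto 6.0
        if per_tensor_scale is not None:
            # Two-level scaling: store the block scale relative to the tensor scale.
            stored = block_scale / per_tensor_scale
            effective = stored * per_tensor_scale
        else:
            # Block-wise scaling only.
            stored = block_scale
            effective = block_scale
        scales.append(stored)
        codes.extend(
            round_to_e2m1(v / effective) if effective > 0 else 0.0 for v in block
        )
    return scales, codes


block = [6.0] + [0.0] * 15
scales_blockwise, codes_blockwise = toy_quantize_blocks(block)
scales_two_level, codes_two_level = toy_quantize_blocks(block, per_tensor_scale=0.5)
```

In this simplified form the quantized values come out the same in both modes, since the stored scales are not rounded. The per-tensor factor would only change results once block scales are stored in a low-precision format with limited range, which this sketch deliberately leaves out.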

[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Aug 18, 2025
ghstack-source-id: 61cbdf1
ghstack-comment-id: 3197771544
Pull-Request: #2790
vkuzo added 2 commits August 18, 2025 15:53
[ghstack-poisoned]
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Aug 18, 2025
ghstack-source-id: d56b62a
ghstack-comment-id: 3197771544
Pull-Request: #2790
[ghstack-poisoned]
@vkuzo vkuzo changed the base branch from gh/vkuzo/109/head to main August 18, 2025 22:53
@vkuzo vkuzo merged commit 5c0d6a3 into main Aug 18, 2025
37 of 48 checks passed
liangel-02 pushed a commit that referenced this pull request Aug 25, 2025
* Update

[ghstack-poisoned]

* Update

[ghstack-poisoned]

* Update

[ghstack-poisoned]

* Update

[ghstack-poisoned]

* Update

[ghstack-poisoned]

* Update

[ghstack-poisoned]

* Update

[ghstack-poisoned]