nvfp4 tensor: refactor weight-only vs dynamic quant #2790
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2790. As of commit ff0ee90 with merge base c120bb7: 4 pending jobs, 1 unrelated failure (flaky, also present on trunk).
Summary: Refactors `NVFP4Tensor` to use `act_quant_kwargs`, following the design of the recently added `Float8Tensor`. Note that we chose not to use `_choose_quant_func_and_quantize_tensor`, since we do not support any activation types other than nvfp4; this can be relaxed in the future if needed. This is still not the final API; we might need to make more tweaks before bringing it out of prototype.

ghstack-source-id: f2496ce
ghstack-comment-id: 3197771544
Pull-Request: #2790
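To illustrate the design described above, here is a minimal, self-contained sketch of how an `act_quant_kwargs` field can distinguish weight-only quantization from dynamic activation quantization at linear time. All names (`NVFP4TensorSketch`, `QuantizeTensorToNVFP4Kwargs`, `fake_quant`) are illustrative assumptions, not the actual torchao API, and the quantization itself is faked with simple rounding:

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class QuantizeTensorToNVFP4Kwargs:
    # Illustrative settings carried on the weight tensor and used to
    # quantize the activation at linear time (names are assumptions).
    block_size: int = 16
    per_tensor_scale: Optional[torch.Tensor] = None


def fake_quant(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for real NVFP4 quantization: round to one decimal place.
    return torch.round(x * 10) / 10


class NVFP4TensorSketch:
    def __init__(self, hp_weight: torch.Tensor, act_quant_kwargs=None):
        self.qdata = fake_quant(hp_weight)  # "quantized" weight payload
        # None -> weight-only quant; set -> dynamic activation quant
        self.act_quant_kwargs = act_quant_kwargs


def linear(activation: torch.Tensor, weight: NVFP4TensorSketch) -> torch.Tensor:
    if weight.act_quant_kwargs is None:
        # Weight-only: the activation stays in high precision.
        return activation @ weight.qdata.t()
    # Dynamic quant: quantize the activation using the stored kwargs.
    act_q = fake_quant(activation)
    return act_q @ weight.qdata.t()


w = torch.eye(4)
x = torch.full((2, 4), 0.123)
wo = NVFP4TensorSketch(w)                                 # weight-only
dq = NVFP4TensorSketch(w, QuantizeTensorToNVFP4Kwargs())  # dynamic quant
print(torch.allclose(linear(x, wo), x))        # activation passed through
print(torch.allclose(linear(x, dq), fake_quant(x)))
```

The point of the pattern is that a single weight subclass serves both modes: the caller configures `act_quant_kwargs` once at quantization time, and the linear override branches on its presence.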
@@ -141,9 +155,11 @@ def to_nvfp4(
    block_size: Block size for quantization (must be 16)
    per_tensor_scale: Optional pre-computed absolute maximum for calibration.
        If provided, uses per-tensor scaling. If None, uses block-wise scaling only.
    mm_config: Matrix multiplication configuration
    per_tensor_scale: Optional pre-computed absolute maximum for calibration for activation
Is this supposed to be `act_per_tensor_scale`? Also, is it expected that the docstring is the same as the previous item?
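The review comment above points at a duplicated parameter name in the diff. A hedged sketch of what the corrected docstring might look like, assuming the second parameter is indeed meant to be `act_per_tensor_scale` for the activation path (the signature and wording here are assumptions, not the merged code):

```python
from typing import Optional

import torch


def to_nvfp4(
    data_hp: torch.Tensor,
    block_size: int = 16,
    per_tensor_scale: Optional[torch.Tensor] = None,
    act_per_tensor_scale: Optional[torch.Tensor] = None,
):
    """Illustrative stub showing a disambiguated docstring.

    Args:
        data_hp: High-precision input tensor to quantize.
        block_size: Block size for quantization (must be 16).
        per_tensor_scale: Optional pre-computed absolute maximum for
            weight calibration. If provided, uses per-tensor scaling;
            if None, uses block-wise scaling only.
        act_per_tensor_scale: Optional pre-computed absolute maximum
            for activation calibration, used only when dynamic
            activation quantization is enabled.
    """
    # Stub only; the real implementation lives in torchao.
    raise NotImplementedError("illustrative stub")
```

Giving the activation scale its own name and description avoids the shadowed-parameter confusion the reviewer flagged.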