@GandalfTea
Contributor

Summary

Log a warning when a quantization config is found in apply_quantization_from_config but nn.quantize fails.

Changes

  • Modified apply_quantization_from_config to return Tuple[bool, bool]: the first value is true if g_bits and g_group are nonzero, and the second is the result of nn.quantize.
  • Modified ShardRuntime to log a warning when bits/group are present but nn.quantize failed.
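
As a rough illustration of the two changes above, here is a minimal sketch, not the code from this PR. The config keys ("quantization", "bits", "group_size"), the load_shard helper, and the quantize_fn parameter (a stand-in for nn.quantize so the example runs without the real library) are all assumptions made for the example.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("shard_runtime_sketch")


def apply_quantization_from_config(
    model: Any,
    config: Dict[str, Any],
    quantize_fn: Callable[..., Any],
) -> Tuple[bool, bool]:
    """Return (has_config, applied).

    has_config: bits and group size are both nonzero in the config.
    applied:    the quantize call completed without raising.
    """
    # Assumed config layout; the real keys may differ.
    q = config.get("quantization") or {}
    bits = int(q.get("bits") or 0)
    group_size = int(q.get("group_size") or 0)

    has_config = bits != 0 and group_size != 0
    if not has_config:
        return False, False

    try:
        # quantize_fn stands in for nn.quantize in this sketch.
        quantize_fn(model, group_size=group_size, bits=bits)
    except Exception:
        return True, False
    return True, True


def load_shard(model: Any, config: Dict[str, Any], quantize_fn: Callable[..., Any]) -> bool:
    """Hypothetical caller mirroring the ShardRuntime change: warn on failure."""
    has_config, applied = apply_quantization_from_config(model, config, quantize_fn)
    if has_config and not applied:
        logger.warning("Quantization config found, but quantization failed.")
    return applied
```

Returning both flags lets the caller tell "no quantization requested" apart from "quantization requested but failed", which a single bool cannot express.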

Testing

Tested with a failing version of the qwen3_moe model implementation file.

@GandalfTea marked this pull request as draft on November 27, 2025 09:47
@GandalfTea marked this pull request as ready for review on November 27, 2025 10:04
@andthattoo self-requested a review on November 28, 2025 08:21
Member

@andthattoo left a comment

Wouldn't it be better to raise an exception, since we don't/can't run a model once it's not correctly quantized?
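
For illustration only, the fail-fast alternative raised in this comment could look roughly like the sketch below. QuantizationError and require_quantization are hypothetical names that do not come from the PR; the two flags are assumed to be the (config present, quantization applied) pair described in the PR summary.

```python
class QuantizationError(RuntimeError):
    """Hypothetical error: a quantization config exists but was not applied."""


def require_quantization(has_config: bool, applied: bool, model_name: str = "model") -> None:
    """Raise instead of warning when a requested quantization did not succeed."""
    if has_config and not applied:
        raise QuantizationError(
            f"Quantization config found for {model_name}, but quantization failed; "
            "refusing to run an incorrectly quantized model."
        )
```

With this approach the load path stops at the first failure instead of continuing with a model that cannot run correctly.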
