mdvoretc-intel

Details:

  • The change allows parameters to be recognized alongside constants as valid weight inputs for transformations producing FullyConnectedCompressed nodes

Description of the issue:

At present, the FC_COMPRESSED_WEIGHT_PATTERN macro contains a pattern for the dequantization of a constant integer weight. This pattern is used to recognize and fold cases where fused weight dequantization can be applied, replacing them with FullyConnectedCompressed nodes. Because it expects a constant weight input, the pattern fails to recognize quantized LoRA weights, which are provided as parameters:
(figure: fc_compressed_param_before)
With this patch, such weights are recognized, so the transformations can proceed and produce nodes that leverage oneDNN fused QGEMM for execution:
(figure: fc_compressed_param_after)
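
For illustration, here is a minimal sketch of how a weight pattern can accept either a Constant or a Parameter producer, using the OpenVINO pattern-matching API. This is not the actual FC_COMPRESSED_WEIGHT_PATTERN macro from the plugin sources; the function name `make_compressed_weight_pattern` and the truncated dequantization subgraph are hypothetical and shown only to convey the idea.

```cpp
#include <memory>

#include "openvino/op/constant.hpp"
#include "openvino/op/convert.hpp"
#include "openvino/op/parameter.hpp"
#include "openvino/pass/pattern/op/or.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"

namespace pattern = ov::pass::pattern;

std::shared_ptr<ov::Node> make_compressed_weight_pattern() {
    // Previously only a Constant was accepted as the compressed weight...
    auto weight_const = pattern::wrap_type<ov::op::v0::Constant>();
    // ...now a Parameter (e.g. quantized LoRA weights fed at runtime) can match as well.
    auto weight_param = pattern::wrap_type<ov::op::v0::Parameter>();
    auto weight = std::make_shared<pattern::op::Or>(
        ov::OutputVector{weight_const, weight_param});

    // The dequantization subgraph starts by converting the integer weight to a
    // floating-point type; the subsequent zero-point Subtract and scale Multiply
    // stages are omitted from this sketch.
    auto weight_convert = pattern::wrap_type<ov::op::v0::Convert>({weight});
    return weight_convert;
}
```

Using a `pattern::op::Or` node for the weight producer keeps the rest of the dequantization pattern unchanged, so constant weights continue to match exactly as before.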

Tickets:

This change enables quantized LoRA weights, passed as parameters during execution, to be recognized by the transformations that produce FullyConnectedCompressed nodes for QGEMM execution.
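
As a hedged usage sketch (not taken from this PR), this is roughly how quantized LoRA weights could be supplied through a model Parameter at inference time; the model file name, input name `lora_weight_u4`, and shape are hypothetical.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Hypothetical model whose LoRA weights are exposed as Parameters.
    auto model = core.read_model("model_with_lora_params.xml");
    auto compiled = core.compile_model(model, "GPU");
    auto request = compiled.create_infer_request();

    // Quantized (e.g. u4) LoRA weight supplied as a regular input tensor; with
    // this change the GPU plugin can still fold its dequantization into a
    // FullyConnectedCompressed node and execute it via oneDNN fused QGEMM.
    ov::Tensor lora_weight(ov::element::u4, ov::Shape{4096, 64});
    // ... fill lora_weight from the adapter checkpoint ...
    request.set_tensor("lora_weight_u4", lora_weight);

    request.infer();
    return 0;
}
```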