Conversation

@enlupi enlupi (Contributor) commented Aug 14, 2025

Description

This draft PR is intended to start a discussion on how best to integrate the PQuant🌶️ library with hls4ml.
The PR currently supports layers pruned and quantized with fixed quantizers; support for high-granularity quantization is still in development.

⚠️⚠️ In order for the PQuant layers to be parsed correctly, the user is currently expected to run some custom functions that add the quantization parameters as layer attributes (see the test file for examples). In the future, these functions should be added to PQuant🌶️. ⚠️⚠️
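For illustration, a minimal sketch of what such a pre-conversion step could look like. The helper name and the way the parameters are looked up are assumptions, not the actual PQuant🌶️ or test-file API:

def attach_quantization_parameters(model, quantization_config):
    # Hypothetical helper: store each layer's quantization settings as a plain
    # attribute so the hls4ml handlers can find them via
    # hasattr(layer, 'quantization_parameters'). Names are illustrative only.
    for layer in model.layers:
        params = quantization_config.get(layer.name)  # assumed lookup by layer name
        if params is not None:
            layer.quantization_parameters = params
    return model

# Usage sketch (see test/pytest/test_pquant.py for the real sequence):
# model = attach_quantization_parameters(model, quantization_config)
# hls_model = hls4ml.converters.convert_from_keras_model(model, hls_config=config)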

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

The unit test test/pytest/test_pquant.py was added to check that pruned/quantized layers are parsed correctly and that the precision is set correctly.
⚠️⚠️ THE TEST IS CURRENTLY BROKEN! The file needs to be split into two, to test the PyTorch and Keras frontends separately. ⚠️⚠️

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@calad0i calad0i (Contributor) left a comment

Maybe also check out #1338 so that the bit-exact pass also applies to pquant models. In short, if the quantizers are converted to FixedPointQuantizer objects, or to linear layers with the quantization precision set properly and trusted=True, full model precision propagation (this file) can be used with few changes, and all precision inference should work (rules will need to be added for the multiplier relu function, but likely nothing else).
If you don't want to use that pass, maybe consider deciding whether users should use config_from_keras_model or not. If not, maybe leave it untouched and do the enforcement in a dedicated pass.
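If the dedicated-pass route is taken, here is a rough sketch of what the enforcement could look like; the class and the translation step are assumptions, and only the OptimizerPass match/transform interface follows the existing hls4ml passes:

from hls4ml.model.optimizer import OptimizerPass

class EnforcePQuantPrecision(OptimizerPass):
    # Hypothetical pass: turn the embedded PQuant quantization parameters into
    # hls4ml precision settings instead of relying on config_from_*_model.
    def match(self, node):
        return node.get_attr('quantization_parameters') is not None

    def transform(self, model, node):
        qparams = node.get_attr('quantization_parameters')
        # Translate qparams into FixedPrecisionType objects and set them on the
        # node's weight/result types here; the details depend on the PQuant format.
        ...
        return False  # the graph structure is not modified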

@@ -33,6 +33,10 @@ def handle(
    'n_out': n_out,
    'n_in': n_in,
}

if hasattr(layer, 'quantization_parameters'):
    config['quantization_parameters'] = layer.quantization_parameters
Contributor

This is not part of the standard Keras 3 properties; it should be handled in your pquant handler (maybe a base class for all pquant layers?).
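A minimal sketch of that suggestion, assuming a hypothetical shared base/mixin for the PQuant handlers (the name is illustrative):

class PQuantHandlerMixin:
    # Hypothetical shared base for the PQuant handlers: keeps the attribute check
    # in one place instead of repeating it in every handler.
    def add_quantization_parameters(self, layer, config):
        # Only layers prepared by the PQuant helper carry this attribute;
        # plain Keras 3 layers do not.
        if hasattr(layer, 'quantization_parameters'):
            config['quantization_parameters'] = layer.quantization_parameters
        return config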

config['shift'] = 0.5
# Quartus seems to have trouble if the width is 1.
config['slope_prec'] = FixedPrecisionType(width=2, integer=0, signed=False)
config['shift_prec'] = FixedPrecisionType(width=2, integer=0, signed=False)
Contributor

width should be 1 here



@register
class PQuantPoolingHandler(KerasV3LayerHandler):
Contributor

This looks mostly the same as the original pooling handler; I think some code from there could be reused?
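One possible shape for that reuse, assuming the stock pooling handler can be subclassed; the PoolingHandler name and the handle signature are assumptions:

@register
class PQuantPoolingHandler(PoolingHandler):  # hypothetical: inherit the regular pooling parsing
    def handle(self, layer, *args, **kwargs):
        config = super().handle(layer, *args, **kwargs)
        # Add only the PQuant-specific part on top of the regular pooling config.
        if hasattr(layer, 'quantization_parameters'):
            config['quantization_parameters'] = layer.quantization_parameters
        return config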

@@ -141,4 +141,7 @@ def handle(
elif isinstance(layer, BaseConv):
    config['weight_data'] = kernel

if hasattr(layer, 'quantization_parameters'):
    config['quantization_parameters'] = layer.quantization_parameters
Contributor

Same as above: this should be in the pquant base class.

@@ -44,6 +44,10 @@ def parse_conv1d_layer(operation, layer_name, input_names, input_shapes, node, c

output_shape = [input_shapes[0][0], layer['n_filt'], layer['out_width']] # Channel first as default

# Quantization parameter for PQuant integration
if hasattr(class_object, "quantization_parameters"):
    layer['quantization_parameters'] = class_object.quantization_parameters
Contributor

As the handler is functional you can't really subclass it, so this is fine. Though, all the quantization parameters saved here can be derived from the values themselves, so this is not strictly necessary...
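To illustrate the redundancy: for fixed-point quantized values, the precision can be recovered from the values alone, roughly as below (a standalone sketch, not the actual hls4ml precision-inference pass; sign handling is omitted):

import numpy as np

def infer_fixed_precision(values, max_frac_bits=32):
    # Recover (integer_bits, fractional_bits) that represent `values` exactly.
    v = np.asarray(values, dtype=np.float64)
    # Smallest number of fractional bits for which every value is a multiple of 2**-frac.
    frac = 0
    while frac < max_frac_bits and not np.all(v * 2**frac == np.round(v * 2**frac)):
        frac += 1
    top = float(np.max(np.abs(v))) if v.size else 0.0
    # Bits needed for the integer part of the largest magnitude (sign bit not counted).
    integer = int(np.ceil(np.log2(np.floor(top) + 1))) if top >= 1 else 0
    return integer, frac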

@@ -352,7 +352,10 @@ def parse_pytorch_model(config, verbose=True):
if '.' not in node.target:
    obj = getattr(model, node.name)
else:
    obj = getattr(children[node.target.split('.')[0]], node.name)
if '_' not in node.name:
Contributor

Please add a comment explaining why this is needed

if (datareg > 0) {

    if (mul[0] >= 0)
        res[ii] = datareg << mul[0];
Contributor

<< -3 is legal syntax. Why differentiate +/- cases?

# TODO In the next version of this function, these should not be exposed to user to tweak
layer_config['Precision'][pname] = str(precision)
# PQuant quantization
if 'quantization_parameters' in layer:
Contributor

I am a bit confused here. There are three ways to handle quantization parameters:

  • QKeras: the parser says almost nothing about the precision used; the quantization configuration is passed through the config generated by config_from_model.
  • (old) HGQ1: an optimizer pass enforces the quantization config embedded in the model; by default config_from_model does nothing and is not expected to be used.
  • (current) HGQ1/2: a pass infers the layer precision config from the weight/activation quantizers without them being passed explicitly; config_from_model still does nothing and is not expected to be used.

Since you are embedding the config into the parsed layer dictionary, I am assuming you are going with way 2, but then why add config_from_model precision here? Are users expected to use config_from_model when converting pquant models? If the two conflict after user modification, which takes priority?
