add: detect Q/DQ with int16/uint16 initializers for GPU Scale Transform Pass #768

Open

ankitm3k wants to merge 1 commit into ovep-develop from ankit/gpu_qdq_changes

Conversation

@ankitm3k ankitm3k commented Aug 4, 2025

Description

This PR enables the GPU Scale Transform Pass by detecting UINT16 and INT16 initializer types on the Q/DQ nodes in the graph. This removes the dependency on the enable_qdq_optimizer provider option pass used in the legacy OVEP code.
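
For context, a rough sketch of the detection described above; the traversal is an assumption based on the snippets reviewed below, and IsQDQGraphWithUint16OrInt16 is the helper name mentioned later in the review.

// Sketch only: scan the graph for QuantizeLinear/DequantizeLinear nodes whose
// input defs (scale/zero_point initializers) use a 16-bit quantized element type.
#include "core/graph/graph_viewer.h"

bool IsQDQGraphWithUint16OrInt16(const onnxruntime::GraphViewer& graph_viewer) {
  for (const auto& node : graph_viewer.Nodes()) {
    if (node.OpType() != "QuantizeLinear" && node.OpType() != "DequantizeLinear") {
      continue;
    }
    for (const auto* node_arg : node.InputDefs()) {
      if (!node_arg) continue;
      const auto* type_proto = node_arg->TypeAsProto();
      if (type_proto && type_proto->has_tensor_type()) {
        const auto elem_type = type_proto->tensor_type().elem_type();
        if (elem_type == ONNX_NAMESPACE::TensorProto_DataType_UINT16 ||
            elem_type == ONNX_NAMESPACE::TensorProto_DataType_INT16) {
          return true;
        }
      }
    }
  }
  return false;
}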

@ankitm3k ankitm3k requested review from sfatimar and vthaniel August 4, 2025 11:37
@ankitm3k ankitm3k changed the title add: detect Q/DQ with int16/uint16 initializers add: detect Q/DQ with int16/uint16 initializers for GPU Scale Transform Pass Aug 4, 2025
@ankitm3k ankitm3k (Author) commented Aug 4, 2025

@mklimenk Please test, review & merge

@mklimenk mklimenk left a comment

Please add tests for the IsQDQGraphWithUint16OrInt16() function to make sure that we cover all the cases.
Also, please remove the excessive comments; there's no need to be that explicit.
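
For illustration, a hedged sketch of such a test; BuildQDQGraph() is a hypothetical helper that would construct a small Q/DQ model whose zero_point initializer has the given element type, and the signature of IsQDQGraphWithUint16OrInt16() is assumed here.

// Illustrative gtest case only; BuildQDQGraph() is hypothetical.
#include "gtest/gtest.h"

TEST(GpuScaleTransformPass, DetectsUint16AndInt16QDQ) {
  EXPECT_TRUE(IsQDQGraphWithUint16OrInt16(
      BuildQDQGraph(ONNX_NAMESPACE::TensorProto_DataType_UINT16)));
  EXPECT_TRUE(IsQDQGraphWithUint16OrInt16(
      BuildQDQGraph(ONNX_NAMESPACE::TensorProto_DataType_INT16)));
  // 8-bit Q/DQ graphs should not trigger the GPU scale transform pass.
  EXPECT_FALSE(IsQDQGraphWithUint16OrInt16(
      BuildQDQGraph(ONNX_NAMESPACE::TensorProto_DataType_UINT8)));
}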

Comment on lines 396 to 405
auto is_16bit_tensor = [](const onnxruntime::NodeArg* node_arg) -> bool {
if (!node_arg) return false;
const auto* type_proto = node_arg->TypeAsProto();
if (type_proto && type_proto->has_tensor_type()) {
auto elem_type = type_proto->tensor_type().elem_type();
return (elem_type == ONNX_NAMESPACE::TensorProto_DataType_UINT16 ||
elem_type == ONNX_NAMESPACE::TensorProto_DataType_INT16);
}
return false;
};

Please move it to a separate function; there's no need for long multi-line lambdas.

ankitm3k (Author):
fixed
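
Presumably the fix lifts the lambda into a free helper along these lines (a sketch; the function name is an assumption):

// Returns true when the NodeArg's tensor element type is UINT16 or INT16.
static bool Is16BitTensor(const onnxruntime::NodeArg* node_arg) {
  if (!node_arg) return false;
  const auto* type_proto = node_arg->TypeAsProto();
  if (type_proto && type_proto->has_tensor_type()) {
    const auto elem_type = type_proto->tensor_type().elem_type();
    return elem_type == ONNX_NAMESPACE::TensorProto_DataType_UINT16 ||
           elem_type == ONNX_NAMESPACE::TensorProto_DataType_INT16;
  }
  return false;
}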

Comment on lines 427 to 428
// QuantizeLinear: [float_input, scale, zero_point] -> [quantized_output]
// The quantized output tensor determines the quantization type

Please remove identical comments

ankitm3k (Author):
fixed


// Zero point (index 2) must match quantized tensor type per ONNX spec
// It's optional - absent for INT32 and some float8 types
if (input_defs.size() >= 3 && is_16bit_tensor(input_defs[2])) {

Should it be output_defs[2]? It looks like the same check as in the previous condition.

ankitm3k (Author):
Yes, it's the zero_point dtype (input index 2) that is checked here for the INT16/UINT16 type, so input_defs[2] is intended.
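
For reference, a small sketch of why either check works for QuantizeLinear, reusing the Is16BitTensor helper above (the function name here is illustrative):

// QuantizeLinear inputs are [x, y_scale, y_zero_point (optional)]; the output is [y].
// The ONNX spec requires y_zero_point to have the same type as y, so checking the
// zero_point input (index 2) is equivalent to checking the quantized output type.
static bool QuantizeLinearUses16Bit(const onnxruntime::Node& node) {
  const auto& output_defs = node.OutputDefs();
  if (!output_defs.empty() && Is16BitTensor(output_defs[0])) return true;
  const auto& input_defs = node.InputDefs();
  return input_defs.size() >= 3 && Is16BitTensor(input_defs[2]);
}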

@ankitm3k ankitm3k force-pushed the ankit/gpu_qdq_changes branch from fbf966a to 6ceb8e7 on August 4, 2025 at 15:50
@ankitm3k ankitm3k requested a review from javier-intel August 4, 2025 16:05