merge_pr_49083
webnn: Support block-wise quantization for DirectML backend
Block-wise quantization divides input tensors into smaller blocks that
are quantized independently, resulting in faster optimization and
higher-precision quantization [1]. It is used by popular language
models, such as the int4-quantized Phi-3 mini model [2]. A related WG
issue [3] has been opened for discussion.
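
For context, a minimal sketch of what block-wise quantization computes
(illustrative only, not code from this CL; the int8 range, min/max
scaling, and all names are assumptions):

  // Block-wise int8 quantization: each block of `block_size` elements
  // gets its own scale and zero point, computed from that block's
  // min/max. Illustrative sketch only.
  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  std::vector<int8_t> QuantizeBlockwise(const std::vector<float>& input,
                                        size_t block_size,
                                        std::vector<float>& scales,
                                        std::vector<int8_t>& zero_points) {
    const size_t num_blocks = (input.size() + block_size - 1) / block_size;
    std::vector<int8_t> output(input.size());
    scales.resize(num_blocks);
    zero_points.resize(num_blocks);
    for (size_t b = 0; b < num_blocks; ++b) {
      const size_t begin = b * block_size;
      const size_t end = std::min(begin + block_size, input.size());
      float lo = input[begin];
      float hi = input[begin];
      for (size_t i = begin; i < end; ++i) {
        lo = std::min(lo, input[i]);
        hi = std::max(hi, input[i]);
      }
      // Map this block's [lo, hi] onto the int8 range [-128, 127].
      const float scale = (hi - lo) / 255.0f;
      scales[b] = scale > 0.0f ? scale : 1.0f;
      zero_points[b] = static_cast<int8_t>(
          std::clamp(std::lround(-128.0f - lo / scales[b]), -128L, 127L));
      for (size_t i = begin; i < end; ++i) {
        output[i] = static_cast<int8_t>(std::clamp(
            std::lround(input[i] / scales[b]) + zero_points[b], -128L, 127L));
      }
    }
    return output;
  }

Because each block gets its own scale and zero point, outliers in one
block no longer stretch the quantization range of the whole tensor,
which is what preserves precision at small block sizes.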
First, this CL validates the scale and zero point tensors for
block-wise quantization. It also implements block-wise quantization in
the DirectML backend using DML_OPERATOR_QUANTIZE and
DML_OPERATOR_DEQUANTIZE, which are available in FL >= 6.3.
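
As a rough illustration of the shape constraints such validation
enforces (a sketch under the assumption that the per-dimension block
size is implied by the ratio of the input extent to the scale extent,
and that the zero point shape must match the scale shape; names are
hypothetical):

  // Each input dimension must be an integer multiple of the matching
  // scale dimension; block_size[i] = input_shape[i] / scale_shape[i].
  #include <cstddef>
  #include <vector>

  bool ValidateBlockwiseShapes(const std::vector<size_t>& input_shape,
                               const std::vector<size_t>& scale_shape,
                               const std::vector<size_t>& zero_point_shape) {
    if (scale_shape.size() != input_shape.size() ||
        zero_point_shape != scale_shape) {
      return false;
    }
    for (size_t i = 0; i < input_shape.size(); ++i) {
      if (scale_shape[i] == 0 || input_shape[i] % scale_shape[i] != 0) {
        return false;
      }
    }
    return true;
  }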
More validation and conformance tests are added to verify the
implementation.
Bug: 40206287
Change-Id: I977b0be57deebd7afcae216edc3ddc3818b8c09f
Cq-Include-Trybots: luci.chromium.try:mac14.arm64-blink-rel, mac14-blink-rel, mac15.arm64-blink-rel, mac15-blink-rel, linux-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5964816
Reviewed-by: Rafael Cintron [email protected]
Reviewed-by: ningxin hu [email protected]
Commit-Queue: ningxin hu [email protected]
Cr-Commit-Position: refs/heads/main@{#1380767}