[GPU] Fix oneDNN FP16 convolution format selection for channel expansion operations #33131

andrew-k-park · 2025-12-05T07:08:56Z

Details:

- When an FP16 dynamic convolution has few input channels (≤ 4) and many output channels (e.g., 1024), the current format selection logic chooses bfyx → fsv16. That combination makes oneDNN fall back to its reference kernel instead of an optimized JIT kernel, causing significant performance degradation.
- Override the output format to planar (bfyx) when the input channel count is small (≤ 16) and the output channel count is large (≥ 32).
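The override rule can be sketched as a small predicate. This is a minimal illustration, not the OpenVINO GPU plugin's implementation: `Format`, `override_output_format`, and the threshold constants are hypothetical names, and only the two thresholds (≤ 16 input channels, ≥ 32 output channels) come from this PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for the plugin's layout enum (illustration only).
enum class Format { bfyx, b_fs_yx_fsv16 };

// Thresholds taken from the PR description; the constant names are hypothetical.
constexpr int64_t kMaxSmallInputChannels = 16;   // "small" input: <= 16 channels
constexpr int64_t kMinLargeOutputChannels = 32;  // "large" output: >= 32 channels

// Applies the channel-expansion override to an already selected output format:
// few input channels and many output channels -> planar bfyx output,
// otherwise the previously selected format is returned unchanged.
constexpr Format override_output_format(int64_t input_channels,
                                        int64_t output_channels,
                                        Format selected) {
    return (input_channels <= kMaxSmallInputChannels &&
            output_channels >= kMinLargeOutputChannels)
               ? Format::bfyx
               : selected;
}
```

Expressing the fix as an override on the existing selection keeps its effect limited to the channel-expansion case; every other shape keeps whatever format the original logic picked.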

Current behavior:

- Input: 3 channels → converted to bfyx
- Output: 1024 channels → remains fsv16 (the existing logic only switches the output format when it has ≤ 4 channels)
- Result: the bfyx → fsv16 combination uses the reference kernel (slow)
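The pre-fix selection for the 3 → 1024 example can be modeled as below. The functions and the `Format`/`FormatPair` types are hypothetical. The output rule (fsv16 unless the output has ≤ 4 channels) is taken from the description, while the input rule (planar bfyx for ≤ 4 input channels) is an assumption based on the "(≤ 4)" wording in Details.

```cpp
#include <cstdint>

// Hypothetical layout enum and input/output pair (illustration only).
enum class Format { bfyx, b_fs_yx_fsv16 };
struct FormatPair { Format input; Format output; };

// Hypothetical model of the pre-fix selection for an FP16 dynamic convolution.
constexpr FormatPair select_formats_before_fix(int64_t input_channels,
                                               int64_t output_channels) {
    return FormatPair{
        // Assumption: inputs with <= 4 channels are converted to planar bfyx.
        input_channels <= 4 ? Format::bfyx : Format::b_fs_yx_fsv16,
        // From the PR: the output is switched away from fsv16 only when it has <= 4 channels.
        output_channels <= 4 ? Format::bfyx : Format::b_fs_yx_fsv16};
}

// The planar-input / blocked-output combination that, per the PR,
// makes oneDNN fall back to its reference kernel.
constexpr bool is_bfyx_to_fsv16(FormatPair p) {
    return p.input == Format::bfyx && p.output == Format::b_fs_yx_fsv16;
}
```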

Root Cause

The fsv16 blocked format is optimized for reading many channels but introduces overhead when used as the output layout in channel-expansion scenarios (small input → large output). oneDNN selects its reference kernel because:

1. Inefficient write pattern: fsv16 output requires interleaved, non-contiguous writes every 16 elements.
2. No optimized implementation: oneDNN does not provide a JIT-optimized kernel that produces fsv16 output from small input channel counts.
3. Scatter write overhead: writing 1024 channels in fsv16 format requires complex block-strided access.
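The write-pattern difference can be made concrete with the element offset formulas of the two layouts. This sketch assumes fsv16 here means `b_fs_yx_fsv16`, the usual 16-channel blocked layout (comparable to oneDNN's `nChw16c`) with the feature dimension padded to a multiple of 16. It ignores any extra padding or offsets the plugin may apply, and the function names are hypothetical.

```cpp
#include <cstdint>

// Linear element offset in planar bfyx: x varies fastest, then y, then feature, then batch.
constexpr int64_t offset_bfyx(int64_t b, int64_t f, int64_t y, int64_t x,
                              int64_t F, int64_t Y, int64_t X) {
    return ((b * F + f) * Y + y) * X + x;
}

// Linear element offset in b_fs_yx_fsv16: features are split into blocks of 16
// ((F + 15) / 16 blocks after padding); within a block, the 16 features of one
// (y, x) position are stored contiguously.
constexpr int64_t offset_fsv16(int64_t b, int64_t f, int64_t y, int64_t x,
                               int64_t F, int64_t Y, int64_t X) {
    return (((b * ((F + 15) / 16) + f / 16) * Y + y) * X + x) * 16 + f % 16;
}
```

Under these formulas, consecutive x positions of one output channel are 1 element apart in bfyx but 16 elements apart in fsv16, while the 16 channels at one position are adjacent. A kernel that writes output channel by channel therefore produces the strided, interleaved pattern described in the first reason.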

Tickets:

CVS-177671

andrew-k-park · 2025-12-10T04:15:45Z

no perf regression


andrew-k-park requested review from a team as code owners December 5, 2025 07:08

github-actions bot added the category: GPU OpenVINO GPU plugin label Dec 5, 2025

andrew-k-park force-pushed the fix_fp16_conv_format_selection branch 2 times, most recently from dfa6239 to 1c2d830 December 9, 2025 12:18

e-ddykim approved these changes Dec 10, 2025


e-ddykim added this pull request to the merge queue Dec 10, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025

andrew-k-park added this pull request to the merge queue Dec 10, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025

andrew-k-park force-pushed the fix_fp16_conv_format_selection branch from 1c2d830 to 47d734e December 10, 2025 06:45

andrew-k-park enabled auto-merge December 10, 2025 06:45

p-durandin added this to the 2026.0 milestone Dec 10, 2025

andrew-k-park added this pull request to the merge queue Dec 10, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025

andrew-k-park added this pull request to the merge queue Dec 10, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 10, 2025

andrew-k-park force-pushed the fix_fp16_conv_format_selection branch from 47d734e to 3380c20 December 10, 2025 12:21

andrew-k-park enabled auto-merge December 10, 2025 12:22

andrew-k-park added this pull request to the merge queue Dec 11, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025

andrew-k-park force-pushed the fix_fp16_conv_format_selection branch from 3380c20 to 4d9fdf5 December 11, 2025 01:36

andrew-k-park enabled auto-merge December 11, 2025 01:36

Commit 4d9fdf5: Optimize OneDNN dynamic convolution layout selection for small-to-large channel expansion

Signed-off-by: Andrew Park <[email protected]>

andrew-k-park added this pull request to the merge queue Dec 11, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 11, 2025

andrew-k-park added this pull request to the merge queue Dec 11, 2025


3 participants
