Commit b36c134
[GPU] Fix oneDNN FP16 convolution format selection for channel expansion operations (#33131)
### Details:
- When an FP16 dynamic convolution has few input channels (≤4) and many
output channels (e.g., 1024), the current format selection logic chooses
`bfyx → fsv16`, which triggers the oneDNN reference kernel instead of an
optimized JIT kernel, resulting in significant performance degradation.
- Override the output format to planar (bfyx) when the input channel count
is small (≤ 16) and the output channel count is large (≥ 32).
**Current behavior:**
- Input: 3 channels → Converted to `bfyx`
- Output: 1024 channels → Remains `fsv16` (only changed when output ≤ 4)
- Result: `bfyx → fsv16` combination uses **reference kernel** (slow)
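
The override described in the details can be sketched as a small predicate. This is only an illustration of the channel thresholds quoted above (input ≤ 16, output ≥ 32), not the code changed by this commit: the names `OutputFormat`, `is_channel_expansion`, and `select_output_format` are hypothetical, and the sketch ignores the other conditions mentioned (FP16, dynamic shapes, and the existing rule for output ≤ 4).

```cpp
#include <cstdint>

// Hypothetical enum for illustration; not an OpenVINO type.
enum class OutputFormat { planar_bfyx, blocked_fsv16 };

// Thresholds taken from the commit description.
constexpr std::int64_t kMaxSmallInputChannels = 16;
constexpr std::int64_t kMinLargeOutputChannels = 32;

// True for "channel expansion": few input channels, many output channels.
constexpr bool is_channel_expansion(std::int64_t in_channels,
                                    std::int64_t out_channels) {
    return in_channels <= kMaxSmallInputChannels &&
           out_channels >= kMinLargeOutputChannels;
}

// Assumption: every case that is not channel expansion keeps the blocked
// format. The real selection logic has more branches than this.
constexpr OutputFormat select_output_format(std::int64_t in_channels,
                                            std::int64_t out_channels) {
    return is_channel_expansion(in_channels, out_channels)
               ? OutputFormat::planar_bfyx
               : OutputFormat::blocked_fsv16;
}

// The case from the description: 3 input channels, 1024 output channels.
static_assert(select_output_format(3, 1024) == OutputFormat::planar_bfyx,
              "channel expansion should select planar output");
```

Under these assumptions, the 3 → 1024 case resolves to `bfyx → bfyx` rather than `bfyx → fsv16`.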
#### Root Cause
The fsv16 blocked format is optimized for reading many channels but
introduces overhead when used for writing outputs in channel-expansion
scenarios (small input → large output). oneDNN's reference kernel is
selected because:
1. **Inefficient write pattern**: fsv16 output requires interleaved
writes every 16 elements (non-contiguous)
2. **No optimized implementation**: oneDNN doesn't provide a JIT-optimized
kernel for generating fsv16 output from small input channel counts
3. **Scatter write overhead**: Writing 1024 channels in fsv16 format
requires complex block-strided access
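
To make the access pattern concrete, the following sketch compares element offsets in a planar layout and in a 16-channel blocked layout. It assumes that "fsv16" here means a `b_fs_yx_fsv16`-style layout (features split into blocks of 16, with the position inside the block as the innermost dimension) and that there is no padding; `Shape`, `offset_bfyx`, and `offset_fsv16` are hypothetical helpers, not OpenVINO or oneDNN APIs.

```cpp
#include <cstddef>

// Hypothetical helper type: logical tensor dimensions (batch, feature, y, x).
struct Shape {
    std::size_t b, f, y, x;
};

constexpr std::size_t kBlock = 16;  // feature block size assumed for fsv16

// Number of 16-wide feature blocks, rounding up.
constexpr std::size_t feature_blocks(std::size_t f) {
    return (f + kBlock - 1) / kBlock;
}

// Planar layout: x is innermost, so neighbouring pixels of one channel
// are adjacent in memory.
constexpr std::size_t offset_bfyx(Shape s, std::size_t b, std::size_t f,
                                  std::size_t y, std::size_t x) {
    return ((b * s.f + f) * s.y + y) * s.x + x;
}

// Blocked layout (assumed): [b][f / 16][y][x][f % 16], so neighbouring
// pixels of one channel are 16 elements apart.
constexpr std::size_t offset_fsv16(Shape s, std::size_t b, std::size_t f,
                                   std::size_t y, std::size_t x) {
    return (((b * feature_blocks(s.f) + f / kBlock) * s.y + y) * s.x + x) *
               kBlock +
           f % kBlock;
}
```

For a `Shape{1, 1024, 4, 4}` tensor, moving from `x = 0` to `x = 1` in channel 0 advances the offset by 1 in `bfyx` but by 16 in the blocked layout, and channel 16 starts a new block at offset 256.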
### Tickets:
- [CVS-177671](https://jira.devtools.intel.com/browse/CVS-177671)
Signed-off-by: Andrew Park <[email protected]>
1 parent 0e34cb4 commit b36c134
1 file changed: +21 -3 lines changed