frugally-deep 0.16.0 appears to break kernel/model files #3588

MCJack123 · 2025-03-08T13:17:00Z

I recently updated my AI workflow to ROCm 6.3.2 on Arch Linux and found that some PyTorch operations were crashing with "MIOpen Error: tensor_shape_variable needs to be an array". With a bit of debugging, I narrowed it down to fdeep::internal::create_tensor_shape_variable_offset receiving an incorrect parameter. I looked through the frugally-deep source and the model file it was loading, and noticed that fdeep was looking for batch_shape, while the model file used batch_input_shape.

This change in 0.16.0 appears to be causing this specific issue: Dobiasd/frugally-deep@a60717c#diff-a674970aa0b9e26d68cc8783ce1aa3f82425780a062969020febb6fda1371701L500-R507 The change modified the expected key from batch_input_shape to batch_shape. However, after fixing that, I found that inbound_nodes now has a significantly different structure as well, also shown in the above commit. I'm not well versed in the inner workings of this stuff, but I'm guessing there's a new file format with TensorFlow 2.16.1 that breaks the old files, and fdeep's update changes it to use that format instead.

The files src/kernels/gfx9[08|0a|42].tn.model will need to be updated to this new format to support frugally-deep 0.16.0 when built with MIOPEN_ENABLE_AI_KERNEL_TUNING (which is default). I'd update it myself in a PR if I knew the format, and was confident it fixed the issue without causing problems, but that is not the case.
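One quick way to see which of the two key names a given model file uses is to count them. The following is only a minimal sketch: it assumes the .tn.model files are plain JSON, and the helper names and the sample fragment are hypothetical, not part of MIOpen or frugally-deep. It does not look at the inbound_nodes change mentioned above.

```python
import json

OLD_KEY = "batch_input_shape"  # key the existing model files use, per the report above
NEW_KEY = "batch_shape"        # key frugally-deep 0.16.0 looks for, per the report above


def count_shape_keys(node):
    """Count occurrences of the old and new shape keys anywhere in parsed JSON."""
    counts = {OLD_KEY: 0, NEW_KEY: 0}
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            for key, value in current.items():
                if key in counts:
                    counts[key] += 1
                stack.append(value)
        elif isinstance(current, list):
            stack.extend(current)
    return counts


def count_shape_keys_in_file(path):
    """Load a model file (assumed to be JSON) and count the shape keys in it."""
    with open(path, "r", encoding="utf-8") as handle:
        return count_shape_keys(json.load(handle))


# Hypothetical old-style fragment, for illustration only.
sample = {"layers": [{"class_name": "InputLayer",
                      "config": {"batch_input_shape": [None, 8]}}]}
print(count_shape_keys(sample))
```

A file that reports only batch_input_shape hits would be in the older layout described above. Renaming the key alone is not expected to be enough, since inbound_nodes changed as well.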

sakura-nyaa · 2025-03-10T01:35:33Z

I'm also getting this error:
"MIOpen Error: tensor_shape_variable needs to be an array"

I get the error whenever using torch.nn.functional.conv2d.
Here's an output with MIOpen logging turned on when running a minimal program to trigger the error:

MIOpen(HIP): Info [get_device_name] Raw device name: gfx1102
MIOpen(HIP): Info [Handle] stream: 0, device_id: 0
MIOpen(HIP): Info [get_device_name] Raw device name: gfx1102
MIOpen(HIP): Info [SetStream] stream: 0, device_id: 0
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
MIOpen(HIP):    tensorDesc = 0x7ffc6f739908
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenSetTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, const int *, const int *){
MIOpen(HIP):    tensorDesc = {}, {}, packed,
MIOpen(HIP):    dataType = 1
MIOpen(HIP):    nbDims = 4
MIOpen(HIP):    dim.values = { 32 4 32 32 }
MIOpen(HIP):    stride.values = { 4096 1024 32 1 }
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
MIOpen(HIP):    tensorDesc = 0x56a9c37153c0
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenSetTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, const int *, const int *){
MIOpen(HIP):    tensorDesc = {}, {}, packed,
MIOpen(HIP):    dataType = 1
MIOpen(HIP):    nbDims = 4
MIOpen(HIP):    dim.values = { 32 4 3 3 }
MIOpen(HIP):    stride.values = { 36 9 3 1 }
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenCreateTensorDescriptor(miopenTensorDescriptor_t *){
MIOpen(HIP):    tensorDesc = 0x7e3015452807
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenSetTensorDescriptor(miopenTensorDescriptor_t, miopenDataType_t, int, const int *, const int *){
MIOpen(HIP):    tensorDesc = {}, {}, packed,
MIOpen(HIP):    dataType = 1
MIOpen(HIP):    nbDims = 4
MIOpen(HIP):    dim.values = { 32 32 30 30 }
MIOpen(HIP):    stride.values = { 28800 900 30 1 }
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenCreateConvolutionDescriptor(miopenConvolutionDescriptor_t *){
MIOpen(HIP):    convDesc = 0x100
MIOpen(HIP): }
MIOpen(HIP): Info [] MIOPEN_FIND_MODE = DYNAMIC_HYBRID(5)
MIOpen(HIP): miopenStatus_t miopenInitConvolutionNdDescriptor(miopenConvolutionDescriptor_t, int, const int *, const int *, const int *, miopenConvolutionMode_t){
MIOpen(HIP):    convDesc = conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
MIOpen(HIP):    spatialDim = 2
MIOpen(HIP):    pads = { 0 0 }
MIOpen(HIP):    strides = { 1 1 }
MIOpen(HIP):    dilations = { 1 1 }
MIOpen(HIP):    c_mode = 0
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenSetConvolutionGroupCount(miopenConvolutionDescriptor_t, int){
MIOpen(HIP):    convDesc = conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
MIOpen(HIP):    groupCount = 1
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenSetConvolutionAttribute(miopenConvolutionDescriptor_t, const miopenConvolutionAttrib_t, const int){
MIOpen(HIP):    convDesc = conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
MIOpen(HIP):    attr = 1
MIOpen(HIP):    value = 0
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenConvolutionForwardGetWorkSpaceSize(miopenHandle_t, const miopenTensorDescriptor_t, const miopenTensorDescriptor_t, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, size_t *){
MIOpen(HIP):    handle = stream: 0, device_id: 0
MIOpen(HIP):    wDesc = {32, 4, 3, 3}, {36, 9, 3, 1}, packed,
MIOpen(HIP):    xDesc = {32, 4, 32, 32}, {4096, 1024, 32, 1}, packed,
MIOpen(HIP):    convDesc = conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
MIOpen(HIP):    yDesc = {32, 32, 30, 30}, {28800, 900, 30, 1}, packed,
MIOpen(HIP): }
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 6.3.42134, MIOpen version 3.3.0.d22d5a13f-dirty
MIOpen(HIP): Info2 [GetWorkSpaceSize]
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info [IsNetworkedFilesystem] Filesystem type at '"/home/neil//.config/miopen/"' is: 0xef53 'EXT2/3/4_SUPER_MAGIC'
MIOpen(HIP): Info2 [GetLibPath] Lib Path: "/opt/rocm/lib/libMIOpen.so.1.0"
MIOpen(HIP): Info2 [GetInstalledPathFile] inexact find database search
MIOpen(HIP): Info2 [GetInstalledPathFile] Iterating over find db directory "/opt/rocm/share/miopen/db"
MIOpen(HIP): Info [Measure] ReadonlyRamDb::Prefetch time: 5e-05 ms
MIOpen(HIP): Info [Prefetch] File is unreadable: "/home/neil//.config/miopen/gfx1102_16.HIP.3_3_0_d22d5a13f-dirty.ufdb.txt"
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 0.00856 ms
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 4-32-32-3x3-32-30-30-32-0x0-1x1-1x1-0-NCHW-FP32-F in cache for file "/home/neil//.config/miopen/gfx1102_16.HIP.3_3_0_d22d5a13f-dirty.ufdb.txt"
MIOpen(HIP): Info2 [FindRecord] Looking for key 4-32-32-3x3-32-30-30-32-0x0-1x1-1x1-0-NCHW-FP32-F in file ""
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.02485 ms
MIOpen Error: tensor_shape_variable needs to be an array
MIOpen(HIP): miopenStatus_t miopenFindConvolutionForwardAlgorithm(miopenHandle_t, const miopenTensorDescriptor_t, const void *, const miopenTensorDescriptor_t, const void *, const miopenConvolutionDescriptor_t, const miopenTensorDescriptor_t, void *, const int, int *, miopenConvAlgoPerf_t *, void *, size_t, bool){
MIOpen(HIP):    handle = stream: 0, device_id: 0
MIOpen(HIP):    xDesc = {32, 4, 32, 32}, {4096, 1024, 32, 1}, packed,
MIOpen(HIP):    x = 0x7e2e66801200
MIOpen(HIP):    wDesc = {32, 4, 3, 3}, {36, 9, 3, 1}, packed,
MIOpen(HIP):    w = 0x7e2e66800000
MIOpen(HIP):    convDesc = conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
MIOpen(HIP):    yDesc = {32, 32, 30, 30}, {28800, 900, 30, 1}, packed,
MIOpen(HIP):    y = 0x7e2d5d800000
MIOpen(HIP):    requestAlgoCount = 1
MIOpen(HIP):    returnedAlgoCount = 32764
MIOpen(HIP):    perfResults =
MIOpen(HIP):    workSpace = nullptr
MIOpen(HIP):    workSpaceSize = 0
MIOpen(HIP):    exhaustiveSearch = 0
MIOpen(HIP): }
MIOpen(HIP): Info [FindConvFwdAlgorithm] requestAlgoCount = 1, workspace = 0
MIOpen(HIP): Info [GetSolutions]
MIOpen(HIP): Info2 [FindRecordUnsafe] Looking for key 4-32-32-3x3-32-30-30-32-0x0-1x1-1x1-0-NCHW-FP32-F in cache for file "/home/neil//.config/miopen/gfx1102_16.HIP.3_3_0_d22d5a13f-dirty.ufdb.txt"
MIOpen(HIP): Info2 [FindRecord] Looking for key 4-32-32-3x3-32-30-30-32-0x0-1x1-1x1-0-NCHW-FP32-F in file ""
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.025221 ms
MIOpen Error: tensor_shape_variable needs to be an array
MIOpen(HIP): miopenStatus_t miopenDestroyConvolutionDescriptor(miopenConvolutionDescriptor_t){
MIOpen(HIP):    convDesc = conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenDestroyTensorDescriptor(miopenTensorDescriptor_t){
MIOpen(HIP):    tensorDesc = {32, 4, 3, 3}, {36, 9, 3, 1}, packed,
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenDestroyTensorDescriptor(miopenTensorDescriptor_t){
MIOpen(HIP):    tensorDesc = {32, 32, 30, 30}, {28800, 900, 30, 1}, packed,
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenDestroyTensorDescriptor(miopenTensorDescriptor_t){
MIOpen(HIP):    tensorDesc = {32, 4, 32, 32}, {4096, 1024, 32, 1}, packed,
MIOpen(HIP): }
Traceback (most recent call last):
  File "/home/neil/trigger_error.py", line 7, in <module>
    result = F.conv2d(input, weight)
RuntimeError: miopenStatusUnknownError

Here's the minimal program:

import torch
from torch.nn import functional as F

weight = torch.randn(32, 4, 3, 3).cuda()
input = torch.randn(32, 4, 32, 32).cuda()

result = F.conv2d(input, weight)
print(f"{result.shape=} {result.dtype=} {result.device=}")
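As a sanity check, the tensor descriptors in the log are consistent with this program, which suggests the failure is not caused by malformed shapes. The sketch below uses the standard convolution output-size formula and packed (contiguous) strides. The helper names are made up for illustration and are not MIOpen or PyTorch APIs.

```python
def conv_output_dim(size, kernel, pad=0, stride=1, dilation=1):
    """Standard output size for one spatial dimension of a convolution."""
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def packed_strides(dims):
    """Strides of a densely packed tensor with the last dimension contiguous."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


x_dims = [32, 4, 32, 32]  # input: N, C, H, W
w_dims = [32, 4, 3, 3]    # weight: K, C, R, S

# Pads {0 0}, strides {1 1}, dilations {1 1}, as in the log above.
y_dims = [x_dims[0], w_dims[0],
          conv_output_dim(x_dims[2], w_dims[2]),
          conv_output_dim(x_dims[3], w_dims[3])]

print(y_dims, packed_strides(y_dims))
# [32, 32, 30, 30] [28800, 900, 30, 1]
```

These values match yDesc in the log, and packed_strides applied to x_dims and w_dims gives the logged xDesc and wDesc strides.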

I'll try working through the OP's findings and see if I can get it working. If I do, I'll report back.

IMbackK · 2025-03-11T11:54:18Z

I can reproduce this issue.

MIOpenDriver might be a more convenient way to reproduce this issue, as shown in #3597

IMbackK · 2025-03-11T16:35:26Z

I can confirm this issue stems from frugally-deep 0.16+; building MIOpen against frugally-deep 0.15.20 avoids it.

ppanchad-amd added the Under Investigation label Mar 10, 2025

IMbackK mentioned this issue Mar 11, 2025

Either MIOpenDriver documentation is outdated or MIOpenDriver is broken #3597

Closed
