frugally-deep 0.16.0 appears to break kernel/model files #3588
Comments
I'm also getting this error; it occurs whenever I use `torch.nn.functional.conv2d`.
Here's the minimal program:
I'll try working through the OP's findings and see if I can get it working. If I do, I'll report back.
I can reproduce this issue. MIOpenDriver might be a more convenient way to reproduce it, as shown in #3597.
I can confirm this issue stems from frugally-deep 0.16+; building MIOpen against frugally-deep 0.15.20 avoids it.
I recently updated my AI workflow to ROCm 6.3.2 on Arch Linux, and found that some PyTorch operations were crashing with "MIOpen Error: tensor_shape_variable needs to be an array". With a bit of debugging, I was able to narrow it down to `fdeep::internal::create_tensor_shape_variable_offset` getting an incorrect parameter. I looked around the source of frugally-deep and the model it was loading a bit, and noticed that fdeep was looking for `batch_shape`, while the model file used `batch_input_shape`.

This change in 0.16.0 appears to be causing this specific issue: Dobiasd/frugally-deep@a60717c#diff-a674970aa0b9e26d68cc8783ce1aa3f82425780a062969020febb6fda1371701L500-R507

The change modified the expected key from `batch_input_shape` to `batch_shape`. However, after fixing that, I found that `inbound_nodes` now has a significantly different structure as well, also shown in the above commit. I'm not well versed in the inner workings of this stuff, but I'm guessing there's a new file format with TensorFlow 2.16.1 that breaks the old files, and fdeep's update changes it to use that format instead.

The files `src/kernels/gfx9[08|0a|42].tn.model` will need to be updated to this new format to support frugally-deep 0.16.0 when built with `MIOPEN_ENABLE_AI_KERNEL_TUNING` (which is default). I'd update it myself in a PR if I knew the format, and was confident it fixed the issue without causing problems, but that is not the case.
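
For anyone who wants to check which key a given model file uses, here is a rough Python sketch. It assumes the `.tn.model` files are plain JSON, and it only looks for the two key names mentioned above. The function names and the "old"/"new" labels are made up for illustration and are not part of frugally-deep or MIOpen. It does not check the `inbound_nodes` structure, so a "new" result does not prove the file will load under 0.16.0.

```python
import json
from typing import Any, Iterator


def find_keys(node: Any, wanted: set[str], path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (path, key) for every dict key in `wanted`, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in wanted:
                yield f"{path}.{key}", key
            # Keep descending: the keys of interest may be nested deeper.
            yield from find_keys(value, wanted, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_keys(item, wanted, f"{path}[{index}]")


def classify_model(doc: Any) -> str:
    """Label a parsed model by which shape key it contains (labels are hypothetical)."""
    found = {key for _, key in find_keys(doc, {"batch_input_shape", "batch_shape"})}
    if found == {"batch_input_shape"}:
        return "old"      # key that frugally-deep 0.15.x looked for, per this issue
    if found == {"batch_shape"}:
        return "new"      # key that frugally-deep 0.16.0 looks for, per this issue
    if not found:
        return "unknown"  # neither key present
    return "mixed"        # both keys present


# Tiny made-up fragment, not taken from a real model file.
sample = json.loads('{"layers": [{"config": {"batch_input_shape": [null, 8]}}]}')
print(classify_model(sample))  # old
```

To run it against a real file, load it with `json.load(open(path))` and pass the result to `classify_model`; `find_keys` also reports where in the document each key was found.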