
[Tracker] All the issues related to the e2e shark test suite #812

pdhirajkumarprasad opened this issue Aug 27, 2024 · 4 comments

pdhirajkumarprasad commented Aug 27, 2024

Full ONNX FE tracker: #564

ONNX Model Zoo model tracker: #886

HF model tracker: #899

Running a model

In the alt_e2e test suite:

Set the environment variable CACHE_DIR to specify where model artifacts are downloaded.
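For example, a minimal setup might look like the following (the cache path is an arbitrary example, not a required location):

```shell
# Download model artifacts into a persistent cache instead of the default
# location. ~/model-cache is a hypothetical example path.
export CACHE_DIR=~/model-cache
```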

If you are debugging compilation failures with local builds of torch-mlir or IREE, please make sure the locally built tools are the ones actually invoked by the commands (see the commands log for a test). For example, running which iree-compile should point to the local build directory.
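As a sketch, assuming the locally built tools live under hypothetical build directories (adjust the paths to your own checkout):

```shell
# Put the locally built tools ahead of any installed copies on PATH.
# ~/iree-build/tools and ~/torch-mlir/build/bin are example paths only.
export PATH="$HOME/iree-build/tools:$HOME/torch-mlir/build/bin:$PATH"

# Verify the local builds are the ones that will be picked up.
which iree-compile     # should print a path under the local IREE build
which torch-mlir-opt   # should print a path under the local torch-mlir build
```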

By default, the test runner doesn't use torch-mlir directly. If you'd like to use a local build of torch-mlir, make sure torch-mlir-opt is on your path and use the run.py flag --torchtolinalg to enable running the frontend passes through torch-mlir-opt.
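For example, assuming torch-mlir-opt is already on your PATH:

```shell
# Run a single test and route the frontend passes through torch-mlir-opt.
python ./run.py -v -t ModelName --torchtolinalg
```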

Get the failing model's name and run:

```shell
python ./run.py -v -t ModelName
```

After running the test, the test-run/ModelName/detail/ directory should contain detailed error logs for stage failures. To rerun a stage locally, you can copy and paste the corresponding script from the test-run/ModelName/commands/ directory.
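For instance, a rerun might look like this (compile.sh is a hypothetical script name; use whichever scripts actually appear in the commands directory for your test):

```shell
# Inspect the detailed per-stage error logs.
ls test-run/ModelName/detail/

# Rerun one stage exactly as the harness invoked it.
bash test-run/ModelName/commands/compile.sh   # hypothetical script name
```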

For onnx/models/

CPU Compilation Failures

Last updated based on this run report: https://github.com/nod-ai/e2eshark-reports/blob/main/2025-02-12/ci_reports_onnx/llvm-cpu/combined-reports/summary.md

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|
| 1 | CPU | onnx.IF | | 1 | KeypointRCNN_vaiq_int8 | | |
| 2 | CPU | onnx.Multinomial ui64 -> f32 | | 1 | migraphx_agentmodel__AgentModel | @zjgarvey | iree-org/iree#19556 |
| 3 | CPU | onnx.LSTM | | 1 | sequencer2d_l | @zjgarvey | add to basic opt |
| 4 | CPU | onnx.Split shape missing | | 1 | migraphx_bert__bertsquad-12 | @zjgarvey | regressed? |
| 5 | CPU | 'linalg.generic' op write affecting operations on global resources are restricted to workgroup distributed contexts | | 1 | resnest50d_1s4x24d_vaiq | | |
| 6 | CPU | error: 'hal.tensor.barrier' op failed to verify that all of {sources, results} have same type | 820 | 1 | migraphx_onnx-model-zoo__gpt2-10 | @renxida @AmosLewis | will message xida |
| 7 | CPU | 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect | | 1 | retinanet_resnet50_fpn_vaiq_int8 | @vivekkhandelwal1 | |
| 8 | CPU | assertInVersionRange: Assertion version >= version_range.first && version <= version_range.second failed: Warning: invalid version | | 1 | migraphx_sdxl__unet__model | | |
| 9 | CPU | Assertion g.get() != nullptr failed: Warning: onnx version converter is unable to parse input model | | 1 | migraphx_sd__unet__model | | |
| 10 | CPU | Protobuf serialization failed | | 1 | maxvit_xlarge_tf_512.in21k_ft_in1k | | |

import and setup failures

setup failures:

- maxvit_xlarge_tf_512.in21k_ft_in1k

import failures:

- migraphx_sd__unet__model
- migraphx_sdxl__unet__model

After triage, add to table and assign:

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|

iree-compile

IREE project tracker: https://github.com/orgs/iree-org/projects/8/views/3

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|
| 1 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 106 | | | |
| 2 | GPU | 'arith.extui' op operand type 'i64' and result type 'i32' are cast incompatible | 19179 | 10 | | @pashu123 | |

iree runtime

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|

numerics

| # | device | issue type | issue no | # models impacted | list of models | assignee |
|---|--------|------------|----------|-------------------|----------------|----------|
| 1 | CPU | numeric | need_to_analyze | 101 | modelList | |
| 2 | | [numeric] Numeric error for Conv operator with quantize/dequantize | 19416 | 50+ | | |

IREE EP only issues

iree-compile fails with `ElementsAttr does not provide iteration facilities for type 'mlir::Attribute'` on int8 models at the QuantizeLinear op.

low priority

- #828: Turbine Camp
- #797: Ops not in model

@zjgarvey (Collaborator) commented:

Can you update the model list links?

@jinchen62 (Contributor) commented:

Could you also attach the issue links you referred to, so we know whether we cover all model paths? Also, it seems this doesn't include #801, right?

@pdhirajkumarprasad (Author) commented:

@zjgarvey the model list contains the updated links only.

@jinchen62 Yes, so far the report is based on the ONNX models of the e2e shark test suite.

jinchen62 (Contributor) commented Aug 29, 2024

@pdhirajkumarprasad I think it would be helpful to attach more details of the error messages.

I feel like the onnx.Transpose failure in onnx-to-torch is the shape inference issue I was dealing with. I fixed it by setting the opset version to 21 with a locally built torch-mlir in the SHARK test suite (llvm/torch-mlir#3593). @zjgarvey, I realized this doesn't seem to work for the CI job, right? Any ideas?
