
[Tracker] All the issues related to the e2e shark test suite #812

pdhirajkumarprasad opened this issue Aug 27, 2024 · 4 comments

pdhirajkumarprasad commented Aug 27, 2024

Full ONNX FE tracker: #564

ONNX Model Zoo model tracker: #886

HF model tracker: #899

Running a model

In the alt_e2e test suite:

Set the environment variable CACHE_DIR to specify where model artifacts are downloaded.
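For example, a minimal setup might look like the following (the cache path is an arbitrary example, not a required location):

```shell
# Download model artifacts into a persistent cache instead of the default
# location. ~/model-cache is a hypothetical example path.
export CACHE_DIR=~/model-cache
```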

If you are debugging compilation failures with local builds of torch-mlir or IREE, please make sure the locally built tools are the ones actually invoked by the commands (see the commands log for a test). For example, running which iree-compile should point to the local build directory.
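As a sketch, assuming the locally built tools live under hypothetical build directories (adjust the paths to your own checkout):

```shell
# Put the locally built tools ahead of any installed copies on PATH.
# ~/iree-build/tools and ~/torch-mlir/build/bin are example paths only.
export PATH="$HOME/iree-build/tools:$HOME/torch-mlir/build/bin:$PATH"

# Verify the local builds are the ones that will be picked up.
which iree-compile     # should print a path under the local IREE build
which torch-mlir-opt   # should print a path under the local torch-mlir build
```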

By default, the test runner doesn't use torch-mlir directly. If you'd like to use a local build of torch-mlir, make sure torch-mlir-opt is on your path and use the run.py flag --torchtolinalg to enable running the frontend passes through torch-mlir-opt.
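For example, assuming torch-mlir-opt is already on your PATH:

```shell
# Run a single test and route the frontend passes through torch-mlir-opt.
python ./run.py -v -t ModelName --torchtolinalg
```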

Get the failing model's name and run:

```shell
python ./run.py -v -t ModelName
```

After running the test, the test-run/ModelName/detail/ directory should contain detailed error logs for stage failures. To rerun a stage locally, you can copy and paste the corresponding script from the test-run/ModelName/commands/ directory.
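For instance, a rerun might look like this (compile.sh is a hypothetical script name; use whichever scripts actually appear in the commands directory for your test):

```shell
# Inspect the detailed per-stage error logs.
ls test-run/ModelName/detail/

# Rerun one stage exactly as the harness invoked it.
bash test-run/ModelName/commands/compile.sh   # hypothetical script name
```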

For onnx/models/

CPU Compilation Failures

Last updated based on this run report: https://github.com/nod-ai/e2eshark-reports/blob/main/2025-02-12/ci_reports_onnx/llvm-cpu/combined-reports/summary.md

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|
| 1 | CPU | onnx.IF | | 1 | KeypointRCNN_vaiq_int8 | | |
| 2 | CPU | onnx.Multinomial ui64 -> f32 | | 1 | migraphx_agentmodel__AgentModel | @zjgarvey | iree-org/iree#19556 |
| 3 | CPU | onnx.LSTM | | 1 | sequencer2d_l | @zjgarvey | add to basic opt |
| 4 | CPU | onnx.Split shape missing | | 1 | migraphx_bert__bertsquad-12 | @zjgarvey | regressed? |
| 5 | CPU | 'linalg.generic' op write affecting operations on global resources are restricted to workgroup distributed contexts | | 1 | resnest50d_1s4x24d_vaiq | | |
| 6 | CPU | error: 'hal.tensor.barrier' op failed to verify that all of {sources, results} have same type | 820 | 1 | migraphx_onnx-model-zoo__gpt2-10 | @renxida @AmosLewis | will message xida |
| 7 | CPU | 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect | | 1 | retinanet_resnet50_fpn_vaiq_int8 | @vivekkhandelwal1 | |
| 8 | CPU | assertInVersionRange: Assertion version >= version_range.first && version <= version_range.second failed: Warning: invalid version | | 1 | migraphx_sdxl__unet__model | | |
| 9 | CPU | Assertion g.get() != nullptr failed: Warning: onnx version converter is unable to parse input model | | 1 | migraphx_sd__unet__model | | |
| 10 | CPU | Protobuf serialization failed | | 1 | maxvit_xlarge_tf_512.in21k_ft_in1k | | |

import and setup failures

setup failures:

- maxvit_xlarge_tf_512.in21k_ft_in1k

import failures:

- migraphx_sd__unet__model
- migraphx_sdxl__unet__model

After triage, add to table and assign:

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|

iree-compile

IREE project tracker: https://github.com/orgs/iree-org/projects/8/views/3

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|
| 1 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 106 | | | |
| 2 | GPU | 'arith.extui' op operand type 'i64' and result type 'i32' are cast incompatible | 19179 | 10 | | @pashu123 | |

iree runtime

| # | device | issue type | issue no | # models impacted | list of models | assignee | status |
|---|--------|------------|----------|-------------------|----------------|----------|--------|

numerics

| # | device | issue type | issue no | # models impacted | list of models | assignee |
|---|--------|------------|----------|-------------------|----------------|----------|
| 1 | CPU | numeric | need_to_analyze | 101 | modelList | |
| 2 | | [numeric] Numeric error for Conv operator with quantize/dequantize | 19416 | 50+ | | |

IREE EP only issues

iree-compile fails with `ElementsAttr does not provide iteration facilities for type 'mlir::Attribute'` on int8 models at the QuantizeLinear op.

low priority

- #828: Turbine Camp
- #797: Ops not in model

@zjgarvey (Collaborator) commented:

Can you update the model list links?

@jinchen62 (Contributor) commented:

Could you also attach the issue links you referred to, so we know whether we cover all model paths? Also, it seems this doesn't include #801, right?

@pdhirajkumarprasad (Author) commented:

@zjgarvey the model list contains the updated links only.

@jinchen62 Yes, so far the report is based on the ONNX models of the e2e shark test suite.

jinchen62 (Contributor) commented Aug 29, 2024

@pdhirajkumarprasad I think it would be helpful to attach more details of the error messages.

I feel like the onnx.Transpose failure in onnx-to-torch is the shape inference issue I was dealing with. I fixed it by setting the opset version to 21 with a locally built torch-mlir in the SHARK test suite (llvm/torch-mlir#3593). @zjgarvey, I realized this doesn't seem to work for the CI job, right? Any ideas?
