HF model tracker #899

Open
pdhirajkumarprasad opened this issue Jan 9, 2025 · 6 comments

pdhirajkumarprasad commented Jan 9, 2025

- Total no. of models: 545
- PASS: 307 → 408
- Numerics: 12 → 37
- Remaining failure stages: compilation, compiled_inference, setup and import

Detailed list


amd-vivekag commented Feb 13, 2025

Passing Summary

TOTAL TESTS = 544

| Stage | # Passing | % of Total | % of Attempted |
|---|---|---|---|
| Setup | 532 | 97.8% | 97.8% |
| IREE Compilation | 457 | 84.0% | 85.9% |
| Gold Inference | 451 | 82.9% | 98.7% |
| IREE Inference Invocation | 445 | 81.8% | 98.7% |
| Inference Comparison (PASS) | 406 | 74.6% | 91.2% |

Fail Summary

TOTAL TESTS = 544

| Stage | # Failed at Stage | % of Total |
|---|---|---|
| Setup | 12 | 2.2% |
| IREE Compilation | 75 | 13.8% |
| Gold Inference | 6 | 1.1% |
| IREE Inference Invocation | 6 | 1.1% |
| Inference Comparison | 39 | 7.2% |
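
The two summaries are consistent with a simple cascade: each stage only attempts the tests that passed the previous stage, "% of Attempted" divides by that stage's attempted count, and each stage's fail count is the drop between consecutive stages. A minimal sketch that reproduces the numbers above (plain Python, not the actual report generator):

```python
# Sketch: derive the summary-table percentages from the stage pass counts.
# Not the actual SHARK-TestSuite report code; just the apparent arithmetic.
TOTAL = 544
passing = [
    ("Setup", 532),
    ("IREE Compilation", 457),
    ("Gold Inference", 451),
    ("IREE Inference Invocation", 445),
    ("Inference Comparison (PASS)", 406),
]

attempted = TOTAL
for stage, n in passing:
    pct_total = 100 * n / TOTAL          # "% of Total" column
    pct_attempted = 100 * n / attempted  # "% of Attempted" column
    failed = attempted - n               # "# Failed at Stage" column
    print(f"{stage}: {n} passing ({pct_total:.1f}% / {pct_attempted:.1f}%), {failed} failed here")
    attempted = n  # the next stage only attempts what passed this one
```

For example, IREE Compilation comes out as 84.0% of total, 85.9% of attempted (457 of the 532 that passed Setup), with 75 failing at that stage, matching both tables.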

GIST containing all the failures: https://gist.github.com/amd-vivekag/377a7b141b40c118f880b2ced176f95c

The following issues are failing on CPU:

| # | Issue type | Issue message | Issue no. | # Models impacted | Models | Assignee | Status |
|---|---|---|---|---|---|---|---|
| 1 | setup | ImportError: Loading an AWQ quantized model requires the auto-awq library (`pip install autoawq`) | 918 | 2 | hf_Midnight-Miqu-70B-v1.5-4bit, hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | | |
| 3 | setup | IndexError: index out of range in self | 920 | 1 | hf_ruRoPEBert-e5-base-2k | | |
| 5 | setup | importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes | 922 | 1 | hf_Meta-Llama-3.1-8B-Instruct-bnb-4bit | | |
| 6 | setup | RuntimeError: Error(s) in loading state_dict for DebertaV2ForMultipleChoice | 923 | 1 | hf_fine-tuned-MoritzLaurer-deberta-v3-large-zeroshot-v2.0-arceasy | | |
| 7 | setup | TypeError: DisableCompileContextManager.enter....() got an unexpected keyword argument 'dtype' | 924 | 1 | hf_Llama3-8B-1.58-100B-tokens-GGUF | | |
| 8 | setup | torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::bitwise_and' to ONNX opset version 14 is not supported | 925 | 1 | hf_Mistral-7B-Instruct-v0.2-GPTQ | | |
| 12 | import_model | Assertion `node->outputs().size() < 4` failed | #929 | 1 | hf_nfnet_l0.ra2_in1k | | |
| 13 | compilation | error: failed to legalize operation 'torch.operator' that was explicitly marked illegal (onnx.If return type issue) | #930 | 45 | hf_1_microsoft_deberta_V1.0, hf_1_microsoft_deberta_V1.1, hf_checkpoints_10_1_microsoft_deberta_V1.1_384, hf_checkpoints_1_16, hf_checkpoints_26_9_microsoft_deberta_21_9, hf_checkpoints_28_9_microsoft_deberta_V2, hf_checkpoints_28_9_microsoft_deberta_V4, hf_checkpoints_28_9_microsoft_deberta_V5, hf_checkpoints_29_9_microsoft_deberta_V1, hf_checkpoints_30_9_microsoft_deberta_V1.0_384, hf_checkpoints_3_14, hf_content, hf_deberta-base, hf_deberta_finetuned_pii, hf_deberta-large-mnli, hf_Debertalarg_model_multichoice_Version2, hf_deberta-v2-base-japanese, hf_deberta-v2-base-japanese-char-wwm, hf_deberta-v3-base, hf_deberta-v3-base-absa-v1.1, hf_deberta-v3-base_finetuned_ai4privacy_v2, hf_deberta-v3-base-injection, hf_DeBERTa-v3-base-mnli-fever-anli, hf_deberta-v3-base-squad2, hf_deberta-v3-base-zeroshot-v1.1-all-33, hf_deberta-v3-large, hf_deberta-v3-large_boolq, hf_deberta-v3-large-squad2, hf_deberta-v3-large_test, hf_deberta-v3-large_test_9e-6, hf_deberta-v3-small, hf_deberta-v3-xsmall, hf_llm-mdeberta-v3-swag, hf_mdeberta-v3-base, hf_mDeBERTa-v3-base-mnli-xnli, hf_mdeberta-v3-base-squad2, hf_mDeBERTa-v3-xnli-ft-bs-multiple-choice, hf_Medical-NER, hf_mxbai-rerank-base-v1, hf_mxbai-rerank-xsmall-v1, hf_nli-deberta-v3-base, hf_output, hf_piiranha-v1-detect-personal-information, hf_splinter-base, hf_splinter-base-qass | | |
| 14 | compilation | error: failed to legalize unresolved materialization from ('i64') to ('index') that remained live after conversion | iree-org/iree#18899 | 3 | hf_deeplabv3-mobilevit-small, hf_deeplabv3-mobilevit-xx-small, hf_mobilevit-small | | |
| 15 | compilation | error: 'flow.dispatch.workgroups' op value set has 3 dynamic dimensions but only 2 dimension values are attached | iree-org/iree#20154 | 3 | hf_beit-base-patch16-224-pt22k, hf_beit-base-patch16-224-pt22k-ft22k, hf_pedestrian_gender_recognition | | |
| 16 | compilation | error: expected sizes to be non-negative, but got -1 | iree-org/iree#19501 | 7 | hf_swin_base_patch4_window7_224.ms_in22k_ft_in1k, hf_swin-tiny-patch4-window7-224, hf_yolos-base, hf_yolos-fashionpedia, hf_yolos-small, hf_yolos-small-finetuned-license-plate-detection, hf_yolos-small-rego-plates-detection | | |
| 17 | compilation | error: 'stream.async.dispatch' op has invalid Read access range | iree-org/iree#20155 | 1 | hf_dpt-large-ade | | |
| 18 | compilation | error: 'iree_linalg_ext.pack' op write affecting operations on global resources are restricted to workgroup distributed contexts | iree-org/iree#20156 | 1 | hf_distilhubert | | |
| 19 | compilation | error: expected offsets to be non-negative, but got -1 | iree-org/iree#19935 | 1 | hf_pnasnet5large.tf_in1k | | |
| 23 | native_inference | [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: pixel_values for the following indices | #941 | 1 | hf_mobilenet_v1_0.75_192 | | |
| 24 | native_inference | [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node | #942 | 1 | hf_eva_large_patch14_196.in22k_ft_in22k_in1k | | |
| 26 | compiled_inference | :0: FAILED_PRECONDITION; onnx.Expand input has a dim that is not statically 1 | #944 | 2 | hf_phobert-base-finetuned, hf_phobert-large-finetuned | | |

The following issues have been resolved:

| # | Issue type | Issue message | Issue no. | # Models impacted | Models | Assignee | Status |
|---|---|---|---|---|---|---|---|
| 2 | setup | requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url | 919 | 3 | hf_Multiple_Choice, hf_multiple_choice_model, hf_Multiple_Choice_EN | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#456 |
| 4 | setup | Unknown task: fill-mask | 921 | 2 | hf_multi-qa-mpnet-base-cos-v1, hf_all-mpnet-base-v1 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#456 |
| 9 | import_model | Killed due to OOM | #926 | 1 | hf_StableBeluga2 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 10 | import_model | assertNonNull: Assertion `g.get() != nullptr` failed | #927 | 5 | hf_esm2_t36_3B_UR50D, hf_Phi-3.5-mini-instruct, hf_Phi-3-mini-128k-instruct, hf_Phi-3-mini-4k-instruct, hf_zephyr-7b-beta | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 11 | import_model | assertInVersionRange: Assertion `version >= version_range.first && version <= version_range.second` failed | #928 | 8 | hf_llama-7b, hf_oasst-sft-4-pythia-12b-epoch-3.5, hf_Qwen2.5-1.5B-Instruct, hf_Qwen2.5-7B-Instruct, hf_Qwen2-7B-Instruct, hf_TinyLlama-1.1B-Chat-v1.0, hf_vicuna-7b-v1.5, hf_wasmai-7b-v1 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 20 | construct_inputs | ValueError: Asking to pad but the tokenizer does not have a padding token | #938 | 4 | hf_distilgpt2, hf_gpt2, hf_llama-68m, hf_tiny-random-mistral | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#451 |
| 21 | construct_inputs | name 'tokens' is not defined | #939 | 1 | hf_wavlm-base-plus | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#442 |
| 22 | native_inference | IndexError: tuple index out of range | #940 | 14 | hf_bart-base, hf_gpt2-small-spanish, hf_ivila-row-layoutlm-finetuned-s2vl-v2, hf_opt-125m, hf_Qwen1.5-0.5B-Chat, hf_Qwen2-0.5B, hf_Qwen2.5-0.5B-Instruct, hf_really-tiny-falcon-testing, hf_tiny-dummy-qwen2, hf_tiny-Qwen2ForCausalLM-2.5, hf_tiny-random-GemmaForCausalLM, hf_tiny-random-LlamaForCausalLM, hf_tiny-random-mt5, hf_tiny-random-Phi3ForCausalLM | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#447 |
| 25 | compiled_inference | INVALID_ARGUMENT; function expected fewer input values; parsing input input.bin | #943 | 4 | hf_ko-sroberta-multitask, hf_robertuito-sentiment-analysis, hf_sbert_large_nlu_ru, hf_sentence-bert-base-ja-mean-tokens-v2 | @amd-vivekag | Fixed in PR: nod-ai/SHARK-TestSuite#453 |
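
Issue 20 ("Asking to pad but the tokenizer does not have a padding token") is the classic GPT-2-family symptom: those tokenizers ship without a pad token. The common workaround is to reuse an existing special token as the pad token (with transformers, typically `tokenizer.pad_token = tokenizer.eos_token`). A minimal stdlib-only sketch of the pattern, using a hypothetical stand-in class rather than a real tokenizer; whether the referenced PR applies exactly this fix is an assumption:

```python
# Stand-in illustrating the usual fix for "Asking to pad but the tokenizer
# does not have a padding token". With transformers the workaround is
# typically `tokenizer.pad_token = tokenizer.eos_token`.
class TinyTokenizer:
    """Hypothetical GPT-2-like tokenizer: has an EOS token but no pad token."""

    def __init__(self):
        self.eos_token = "</s>"
        self.pad_token = None

    def pad(self, batch, length):
        # Mirrors the transformers error raised when padding without a pad token.
        if self.pad_token is None:
            raise ValueError(
                "Asking to pad but the tokenizer does not have a padding token"
            )
        return [seq + [self.pad_token] * (length - len(seq)) for seq in batch]


tok = TinyTokenizer()
tok.pad_token = tok.eos_token  # the common one-line workaround
print(tok.pad([["a", "b"], ["c"]], length=2))  # → [['a', 'b'], ['c', '</s>']]
```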


zjgarvey commented Feb 13, 2025

I assume the most recent run is on CPU? Can you share the detail table in a gist? Can you also post the IREE version?

@amd-vivekag

> I assume the most recent run is on CPU? Can you share the detail table in a gist? Can you also post the IREE version?

Yes, these were run on CPU. The GPU run had around 40 more failures. I'm using the following IREE version:

```
IREE (https://iree.dev):
  IREE compiler version 3.2.0rc20250206 @ f3bef2de123f08b4fc3b0ce691494891bd6760d0
  LLVM version 20.0.0git
  Optimized build
```

Here is the link to the detailed table:
https://gist.github.com/amd-vivekag/377a7b141b40c118f880b2ced176f95c

@pdhirajkumarprasad

Here is the latest status on HF models: https://gist.github.com/pdhirajkumarprasad/784eee989d6935d1074c217de2040477. We should focus on the 6/7 issues mentioned there.

@amd-vivekag, please list the issue numbers for the issues mentioned on the above page.

@zjgarvey, we need to focus on these; let's try to get to a clean state by next week so that we are in good shape w.r.t. HF models.


vmnit commented Mar 25, 2025

Latest failure summary: the current number of HF failures by stage:

```
 66 compilation
  2 compiled_inference
  1 import_model
  4 native_inference
 36 Numerics
  7 setup
```

Here,

- 2 extra native_inference failures come from an onnxruntime segmentation fault in the latest release, 1.21.0 (they pass with 1.20.1). Issue created: microsoft/onnxruntime#24144
- 5 extra compilation failures (2 failing since 3.3.0rc20250319 and 3 failing since 3.3.0rc20250312)

Failing since iree-base-compiler v3.3.0rc20250319 (git commit: iree-org/iree@fba3a7c):

- hf_detr-layout-detection
- hf_detr-resnet-50-panoptic

Issue: iree-org/iree#20379

Failing since iree-base-compiler v3.3.0rc20250312 (PR: iree-org/iree#20159):

- hf_table-transformer-detection
- hf_table-transformer-detection-custom-ale
- hf_vit_base_patch32_224.augreg_in21k_ft_in1k

Issue: iree-org/iree#20277
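
The per-stage counts at the top of this comment are in the shape `sort | uniq -c` emits, so a tally like that can be regenerated from a two-column "test / failing-stage" listing with a standard pipeline. A small self-contained sketch; the file name and report layout are assumptions, not the actual SHARK-TestSuite report format:

```shell
# Build a tiny example two-column report (hypothetical file name/layout),
# then tally failures per stage the same way `uniq -c` formats counts.
printf 'hf_a compilation\nhf_b Numerics\nhf_c compilation\n' > report.txt
awk '{print $2}' report.txt | sort | uniq -c
```

The second column is extracted, sorted so identical stages are adjacent, and counted, yielding one "count stage" line per failure stage.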

@amd-vivekag

Passing Summary

TOTAL TESTS = 541

| Stage | # Passing | % of Total | % of Attempted |
|---|---|---|---|
| Setup | 534 | 98.7% | 98.7% |
| IREE Compilation | 467 | 86.3% | 87.5% |
| Gold Inference | 465 | 86.0% | 99.6% |
| IREE Inference Invocation | 463 | 85.6% | 99.6% |
| Inference Comparison (PASS) | 427 | 78.9% | 92.2% |

Fail Summary

TOTAL TESTS = 541

| Stage | # Failed at Stage | % of Total |
|---|---|---|
| Setup | 7 | 1.3% |
| IREE Compilation | 67 | 12.4% |
| Gold Inference | 2 | 0.4% |
| IREE Inference Invocation | 2 | 0.4% |
| Inference Comparison | 36 | 6.7% |

Test Run Detail

Test was run with the following arguments:

```
Namespace(device='local-task', backend='llvm-cpu', target_chip='x86_64-linux-gnu', iree_compile_args=None, mode='cl-onnx-iree', torchtolinalg=False, stages=None, skip_stages=None, benchmark=False, load_inputs=False, groups='all', test_filter=None, testsfile='hf_tests.txt', tolerance=None, verbose=True, rundirectory='test-run', no_artifacts=False, cleanup='0', report=True, report_file='reports/hf_all_tests_543.md', get_metadata=True)
```

| Test | Exit Status | Mean Benchmark Time (ms) | Notes |
|---|---|---|---|
| hf_1_microsoft_deberta_V1.0 | compilation | None | |
| hf_1_microsoft_deberta_V1.1 | compilation | None | |
| hf_bart-large-mnli | Numerics | None | |
| hf_beit-base-patch16-224-pt22k | compilation | None | |
| hf_beit-base-patch16-224-pt22k-ft22k | compilation | None | |
| hf_checkpoints_10_1_microsoft_deberta_V1.1_384 | compilation | None | |
| hf_checkpoints_1_16 | compilation | None | |
| hf_checkpoints_26_9_microsoft_deberta_21_9 | compilation | None | |
| hf_checkpoints_28_9_microsoft_deberta_V2 | compilation | None | |
| hf_checkpoints_28_9_microsoft_deberta_V4 | compilation | None | |
| hf_checkpoints_28_9_microsoft_deberta_V5 | compilation | None | |
| hf_checkpoints_29_9_microsoft_deberta_V1 | compilation | None | |
| hf_checkpoints_30_9_microsoft_deberta_V1.0_384 | compilation | None | |
| hf_checkpoints_3_14 | compilation | None | |
| hf_content | compilation | None | |
| hf_deberta-base | compilation | None | |
| hf_deberta-large-mnli | compilation | None | |
| hf_deberta-v2-base-japanese | compilation | None | |
| hf_deberta-v2-base-japanese-char-wwm | compilation | None | |
| hf_deberta-v3-base | compilation | None | |
| hf_deberta-v3-base-absa-v1.1 | compilation | None | |
| hf_deberta-v3-base-injection | compilation | None | |
| hf_DeBERTa-v3-base-mnli-fever-anli | compilation | None | |
| hf_deberta-v3-base-squad2 | compilation | None | |
| hf_deberta-v3-base-zeroshot-v1.1-all-33 | compilation | None | |
| hf_deberta-v3-base_finetuned_ai4privacy_v2 | compilation | None | |
| hf_deberta-v3-large | compilation | None | |
| hf_deberta-v3-large-squad2 | compilation | None | |
| hf_deberta-v3-large_boolq | compilation | None | |
| hf_deberta-v3-large_test | compilation | None | |
| hf_deberta-v3-large_test_9e-6 | compilation | None | |
| hf_deberta-v3-small | compilation | None | |
| hf_deberta-v3-xsmall | compilation | None | |
| hf_deberta_finetuned_pii | compilation | None | |
| hf_Debertalarg_model_multichoice_Version2 | compilation | None | |
| hf_deeplabv3-mobilevit-small | compilation | None | |
| hf_deeplabv3-mobilevit-xx-small | compilation | None | |
| hf_densenet121.ra_in1k | Numerics | None | |
| hf_detr-doc-table-detection | Numerics | None | |
| hf_detr-layout-detection | compilation | None | |
| hf_detr-resnet-101 | Numerics | None | |
| hf_detr-resnet-101-dc5 | Numerics | None | |
| hf_detr-resnet-50 | Numerics | None | |
| hf_detr-resnet-50-dc5 | Numerics | None | |
| hf_detr-resnet-50-finetuned-10k-cppe5 | Numerics | None | |
| hf_detr-resnet-50-panoptic | compilation | None | |
| hf_detr-resnet-50-sku110k | Numerics | None | |
| hf_diagram_detr_r50_finetuned | Numerics | None | |
| hf_distilhubert | compilation | None | |
| hf_ditr-e15 | Numerics | None | |
| hf_dpt-large-ade | compilation | None | |
| hf_ese_vovnet19b_dw.ra_in1k | Numerics | None | |
| hf_eva_large_patch14_196.in22k_ft_in22k_in1k | native_inference | None | |
| hf_fine-tuned-MoritzLaurer-deberta-v3-large-zeroshot-v2.0-arceasy | setup | None | |
| hf_inception_resnet_v2.tf_in1k | Numerics | None | |
| hf_inception_v3.tf_adv_in1k | Numerics | None | |
| hf_inception_v3.tv_in1k | Numerics | None | |
| hf_Llama3-8B-1.58-100B-tokens-GGUF | setup | None | |
| hf_llm-mdeberta-v3-swag | compilation | None | |
| hf_mdeberta-v3-base | compilation | None | |
| hf_mDeBERTa-v3-base-mnli-xnli | compilation | None | |
| hf_mdeberta-v3-base-squad2 | compilation | None | |
| hf_mDeBERTa-v3-xnli-ft-bs-multiple-choice | compilation | None | |
| hf_Medical-NER | compilation | None | |
| hf_Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | setup | None | |
| hf_Meta-Llama-3.1-8B-Instruct-bnb-4bit | setup | None | |
| hf_Midnight-Miqu-70B-v1.5-4bit | setup | None | |
| hf_Mistral-7B-Instruct-v0.2-GPTQ | setup | None | |
| hf_mobilenet_v1_0.75_192 | native_inference | None | |
| hf_mobilevit-small | compilation | None | |
| hf_mxbai-rerank-base-v1 | compilation | None | |
| hf_mxbai-rerank-xsmall-v1 | compilation | None | |
| hf_nfnet_l0.ra2_in1k | import_model | None | |
| hf_nli-deberta-v3-base | compilation | None | |
| hf_output | compilation | None | |
| hf_pedestrian_gender_recognition | compilation | None | |
| hf_phobert-base-finetuned | compiled_inference | None | |
| hf_phobert-large-finetuned | compiled_inference | None | |
| hf_piiranha-v1-detect-personal-information | compilation | None | |
| hf_pix2text-table-rec | Numerics | None | |
| hf_pnasnet5large.tf_in1k | compilation | None | |
| hf_Qwen2.5-1.5B-Instruct | Numerics | None | |
| hf_resnet-18 | Numerics | None | |
| hf_resnet-50 | Numerics | None | |
| hf_resnet101.a1h_in1k | Numerics | None | |
| hf_resnet18.a1_in1k | Numerics | None | |
| hf_resnet34.a1_in1k | Numerics | None | |
| hf_resnet50.a1_in1k | Numerics | None | |
| hf_resnext50_32x4d.fb_swsl_ig1b_ft_in1k | Numerics | None | |
| hf_ruRoPEBert-e5-base-2k | setup | None | |
| hf_splinter-base | compilation | None | |
| hf_splinter-base-qass | compilation | None | |
| hf_swin-tiny-patch4-window7-224 | compilation | None | |
| hf_swin_base_patch4_window7_224.ms_in22k_ft_in1k | compilation | None | |
| hf_table-transformer-detection | compilation | None | |
| hf_table-transformer-detection-custom-ale | compilation | None | |
| hf_table-transformer-structure-recognition | Numerics | None | |
| hf_table-transformer-structure-recognition-v1.1-all | Numerics | None | |
| hf_table-transformer-structure-recognition-v1.1-pub | Numerics | None | |
| hf_tf_efficientnet_b0.ns_jft_in1k | Numerics | None | |
| hf_tf_efficientnetv2_s.in21k | Numerics | None | |
| hf_tf_mobilenetv3_large_minimal_100.in1k | Numerics | None | |
| hf_tf_mobilenetv3_small_minimal_100.in1k | Numerics | None | |
| hf_vgg16.tv_in1k | Numerics | None | |
| hf_vgg19.tv_in1k | Numerics | None | |
| hf_vit_base_patch32_224.augreg_in21k_ft_in1k | compilation | None | |
| hf_wavlm-base-plus | Numerics | None | |
| hf_wide_resnet50_2.racm_in1k | Numerics | None | |
| hf_xcit_tiny_24_p8_384.fb_dist_in1k | Numerics | None | |
| hf_yolos-base | compilation | None | |
| hf_yolos-fashionpedia | compilation | None | |
| hf_yolos-small | compilation | None | |
| hf_yolos-small-finetuned-license-plate-detection | compilation | None | |
| hf_yolos-small-rego-plates-detection | compilation | None | |
