Describe the issue
I'm running on Windows ARM64 with a Snapdragon(R) X Elite Z1E80100 NPU, using onnxruntime-qnn. The model runs on CPU in 9 seconds; on the NPU, however, it hangs at inference time and never returns results. Session creation produces no warnings about operations falling back to CPU, but I do see multiple different graphs being created: with logging set to verbose, "Completed stage: Graph preparation" and "Completed stage: Graph Transformations and optimizations" each appear multiple times in the output. Please let me know how to debug further.
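For reference, a minimal sketch of how the session is set up (the model path and provider options below are placeholders, not my exact script; `backend_path` points at the QNN HTP backend DLL and `log_severity_level = 0` enables the verbose output quoted above):

```python
# Hypothetical minimal setup: QNN EP on the HTP (NPU) backend with verbose
# logging, so graph partitioning and EP assignment messages are printed.
qnn_options = {
    "backend_path": "QnnHtp.dll",   # HTP backend; placeholder location
    "profiling_level": "basic",     # emit QNN profiling events for debugging
}

try:
    import onnxruntime as ort

    so = ort.SessionOptions()
    so.log_severity_level = 0       # 0 = VERBOSE
    session = ort.InferenceSession(
        "model.onnx",               # placeholder path (model cannot be shared)
        sess_options=so,
        providers=["QNNExecutionProvider"],
        provider_options=[qnn_options],
    )
except Exception:
    # onnxruntime-qnn not installed, or model/backend unavailable on this machine
    session = None
```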
The final output with verbose logging is here:
To reproduce
I cannot provide the model, but I include my Python inference script here:
Urgency
The issue is relatively urgent as the deadline is approaching.
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-qnn 1.21.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Other / Unknown
Execution Provider Library Version
No response