QNN as ONNXruntime backend hangs while executing graph #24166

Open
ktadgh opened this issue Mar 25, 2025 · 1 comment
Labels
ep:QNN issues related to QNN execution provider

ktadgh commented Mar 25, 2025

Describe the issue

I'm running on Windows ARM64 with a Snapdragon(R) X Elite Z1E80100 NPU, using onnxruntime-qnn. The model runs on CPU in 9 seconds, but on the NPU it hangs at inference time and never returns results. I get no warnings during session creation about operations falling back to CPU. However, multiple graphs are being created: with logging set to verbose, I can see "Completed stage: Graph preparation" and "Completed stage: Graph Transformations and optimizations" several times in the output. Please let me know how to debug further.
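To rule out silent CPU fallback, the session can also be created with the documented "session.disable_cpu_ep_fallback" config key, which makes session creation fail instead of silently partitioning nodes onto CPU. A minimal sketch (the model path is a placeholder):

import onnxruntime as ort

so = ort.SessionOptions()
# Raise an error at session creation if any node would fall back to CPU.
so.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    so,
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "QnnHtp.dll"}],
)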

The final output with verbose logging is here:


2025-03-25 11:26:00.7450828 [I:onnxruntime:, graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply] GraphTransformer MemcpyTransformer modified: 0 with status: OK
2025-03-25 11:26:00.7481471 [V:onnxruntime:, session_state.cc:1148 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Node placements
2025-03-25 11:26:00.7516884 [V:onnxruntime:, session_state.cc:1151 onnxruntime::VerifyEachNodeIsAssignedToAnEp] All nodes placed on [QNNExecutionProvider]. Number of nodes: 1
2025-03-25 11:26:00.7589148 [V:onnxruntime:, session_state.cc:128 onnxruntime::SessionState::CreateGraphInfo] SaveMLValueNameIndexMapping
2025-03-25 11:26:00.7638620 [V:onnxruntime:, session_state.cc:174 onnxruntime::SessionState::CreateGraphInfo] Done saving OrtValue mappings.
2025-03-25 11:26:00.7678138 [I:onnxruntime:, allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner] Use DeviceBasedPartition as default
2025-03-25 11:26:00.7819933 [I:onnxruntime:, session_state_utils.cc:276 onnxruntime::session_state_utils::SaveInitializedTensors] Saving initialized tensors.
2025-03-25 11:26:00.8575228 [I:onnxruntime:, session_state_utils.cc:427 onnxruntime::session_state_utils::SaveInitializedTensors] Done saving initialized tensors
2025-03-25 11:26:00.8622200 [V:onnxruntime:, qnn_execution_provider.cc:778 onnxruntime::QNNExecutionProvider::CreateComputeFunc::<lambda_fd0a0b74617d3dc5e2747221c3e6ca82>::operator ()] compute_info.create_state_func context->node_name: QNNExecutionProvider_QNN_2838676038692915326_1_0
2025-03-25 11:26:00.8788589 [I:onnxruntime:, inference_session.cc:2106 onnxruntime::InferenceSession::Initialize] Session successfully initialized.
Session created successfully.
created <onnxruntime.capi.onnxruntime_inference_collection.InferenceSession object at 0x000001AFAEBAEA10>
beginning
2025-03-25 11:26:00.9468939 [V:onnxruntime:, qnn_model.cc:187 onnxruntime::qnn::QnnModel::ExecuteGraph] QnnModel::ExecuteGraphs
2025-03-25 11:26:00.9499969 [V:onnxruntime:, qnn_model.cc:206 onnxruntime::qnn::QnnModel::ExecuteGraph] model_input = input index = 0
2025-03-25 11:26:00.9536475 [V:onnxruntime:, qnn_model.cc:210 onnxruntime::qnn::QnnModel::ExecuteGraph] Qnn tensor size: 54525952Ort tensor size: 54525952
2025-03-25 11:26:00.9577749 [V:onnxruntime:, qnn_model.cc:225 onnxruntime::qnn::QnnModel::ExecuteGraph] model_output = output index = 0
2025-03-25 11:26:00.9618491 [V:onnxruntime:, qnn_model.cc:230 onnxruntime::qnn::QnnModel::ExecuteGraph] Qnn tensor size: 12582912Ort tensor size: 12582912
2025-03-25 11:26:00.9645911 [V:onnxruntime:, qnn_model.cc:240 onnxruntime::qnn::QnnModel::ExecuteGraph] Start execute QNN graph:QNNExecutionProvider_QNN_2838676038692915326_1_0
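session.run never returns after that last line, so the process has to be killed manually. A small watchdog sketch (the helper below is my own, not ONNX Runtime API) at least turns the hang into a reported timeout:

import threading

def run_with_timeout(session, inputs, timeout_s=120):
    # Run inference on a daemon thread so a hung session.run surfaces
    # as a TimeoutError instead of blocking the script forever.
    result = {}

    def worker():
        result["output"] = session.run(None, inputs)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if "output" not in result:
        raise TimeoutError(f"session.run did not return within {timeout_s}s")
    return result["output"]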

To reproduce

I cannot provide the model, but here is my Python inference script:

import os
import time

import numpy as np
import onnx
import onnxruntime as ort

os.environ['ORT_LOGGING_LEVEL'] = 'VERBOSE'

img = np.random.rand(1, 3, 1024, 1024)

providers = ['QNNExecutionProvider']
print(ort.get_available_providers())

sess_options = ort.SessionOptions()
sess_options.enable_profiling = False
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
# sess_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")
# sess_options.add_session_config_entry("session.htp_performance_mode", "high_performance")
# sess_options.add_session_config_entry("session.profiling_level", "detailed")
# sess_options.add_session_config_entry("session.profiling_file_path", "C:/Users/tadgh/onnx-workshop/profile.csv")
sess_options.log_severity_level = 0  # verbose ORT logging
print("creating session")

# Validate the model before handing it to ONNX Runtime.
onnx_model = onnx.load("C:/Users/tadgh/onnx-workshop/transformer_new_qdq_int8.onnx")
onnx.checker.check_model(onnx_model)
print("Model is valid.")

session = ort.InferenceSession(
    "C:/Users/tadgh/onnx-workshop/transformer_new_qdq_int8.onnx",
    sess_options,
    providers=providers,
    provider_options=[{
        'disable_fallback': 'false',
        'backend_path': 'QnnHtp.dll',
        'htp_arch': '73',
        'enable_htp_fp16_precision': '1',
        'enable_htp_shared_memory_allocator': '0',
    }],
)
print('created', session)

# Run inference with ONNX Runtime.
input_data = img.astype(np.float32)
ort_inputs = {session.get_inputs()[0].name: input_data}

times = []
for _ in range(10):
    try:
        print('beginning')
        start = time.time()
        onnx_output = session.run(None, ort_inputs)
        end = time.time()
        times.append(end - start)
        print(f'completed in {end - start:.2f} seconds')
        print('Output shape:', [o.shape for o in onnx_output])
    except Exception as e:
        print(f"Error during inference: {e}")

# Drop the first three warm-up runs before averaging.
times = np.array(times[3:])
print(session.end_profiling())
print(f'Average time elapsed: {times.mean()} \n std. dev.: {times.std()}')
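One caveat on the commented-out entries above: if I read the ONNX Runtime QNN EP documentation correctly, htp_performance_mode, profiling_level, and profiling_file_path are QNN provider options rather than session config entries, so to take effect they would be passed like this (a sketch reusing the same paths):

session = ort.InferenceSession(
    "C:/Users/tadgh/onnx-workshop/transformer_new_qdq_int8.onnx",
    sess_options,
    providers=['QNNExecutionProvider'],
    provider_options=[{
        'backend_path': 'QnnHtp.dll',
        'htp_performance_mode': 'high_performance',
        'profiling_level': 'detailed',
        'profiling_file_path': 'C:/Users/tadgh/onnx-workshop/profile.csv',
    }],
)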

Urgency

The issue is relatively urgent as the deadline is approaching.

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-qnn 1.21.0

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Other / Unknown

Execution Provider Library Version

No response

github-actions bot added the ep:QNN (issues related to QNN execution provider) label Mar 25, 2025
jywu-msft (Member) commented:

It's difficult to help debug without a repro case.
Do you encounter this on any model that is shareable?
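Even a tiny synthetic QDQ model that reproduces the hang would help; for example, something along these lines built with onnx.helper (shapes and values here are arbitrary, not taken from the original model):

import numpy as np
import onnx
from onnx import TensorProto, helper

# Arbitrary tiny QDQ graph: quantize the input, dequantize it and an
# int8 weight initializer, then run a single Conv.
nodes = [
    helper.make_node("QuantizeLinear", ["x", "x_scale", "x_zp"], ["x_q"]),
    helper.make_node("DequantizeLinear", ["x_q", "x_scale", "x_zp"], ["x_dq"]),
    helper.make_node("DequantizeLinear", ["w_q", "w_scale", "w_zp"], ["w_dq"]),
    helper.make_node("Conv", ["x_dq", "w_dq"], ["y"]),
]
initializers = [
    helper.make_tensor("x_scale", TensorProto.FLOAT, [], [0.02]),
    helper.make_tensor("x_zp", TensorProto.INT8, [], [0]),
    helper.make_tensor("w_q", TensorProto.INT8, [8, 3, 3, 3],
                       np.random.randint(-127, 127, (8, 3, 3, 3)).astype(np.int8).flatten()),
    helper.make_tensor("w_scale", TensorProto.FLOAT, [], [0.01]),
    helper.make_tensor("w_zp", TensorProto.INT8, [], [0]),
]
graph = helper.make_graph(
    nodes, "qdq_repro",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 32, 32])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 8, 30, 30])],
    initializer=initializers,
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "qdq_repro.onnx")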
