Describe the issue
I'm running on Windows ARM64 with a Snapdragon(R) X Elite Z1E80100 NPU, using onnxruntime-qnn. The model runs on CPU in 9 seconds; on the NPU, however, it hangs at inference time and never returns results. Session creation produces no warnings about operations falling back to CPU, but I do see multiple different graphs being created: with logging set to verbose, "Completed stage: Graph preparation" and "Completed stage: Graph Transformations and optimizations" each appear multiple times in the output. Please let me know how to debug further.
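For reference, a minimal sketch of how the session is set up (the model path and provider options below are placeholders, not my exact script; `backend_path` points at the QNN HTP backend DLL and `log_severity_level = 0` enables the verbose output quoted above):

```python
# Hypothetical minimal setup: QNN EP on the HTP (NPU) backend with verbose
# logging, so graph partitioning and EP assignment messages are printed.
qnn_options = {
    "backend_path": "QnnHtp.dll",   # HTP backend; placeholder location
    "profiling_level": "basic",     # emit QNN profiling events for debugging
}

try:
    import onnxruntime as ort

    so = ort.SessionOptions()
    so.log_severity_level = 0       # 0 = VERBOSE
    session = ort.InferenceSession(
        "model.onnx",               # placeholder path (model cannot be shared)
        sess_options=so,
        providers=["QNNExecutionProvider"],
        provider_options=[qnn_options],
    )
except Exception:
    # onnxruntime-qnn not installed, or model/backend unavailable on this machine
    session = None
```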
The final output with verbose logging is here:
To reproduce
I cannot provide the model, but I include my Python inference script here:
Urgency
The issue is relatively urgent as the deadline is approaching.
Platform
Windows
OS Version
11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-qnn 1.21.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Other / Unknown
Execution Provider Library Version
No response