[Docs]: NPU Plugin high level design diagram #27512
Comments
@pereanub, could you please comment?
Hello! You are correct: the CompilerAdapter also uses the Level Zero API and the Level Zero graph extension API to interact with the driver. As you found, CompilerAdapter uses pfnCreate2 with ZE_GRAPH_FORMAT_NGRAPH_LITE when compiling a model, and pfnCreate2 with ZE_GRAPH_FORMAT_NATIVE when importing a precompiled model.

The confusion in the diagram is caused by the name of our backend ("LevelZero"). This is the plugin component that binds an OpenVINO infer request to Level Zero primitives such as command queues and command lists, and executes the model on the device using those primitives. Historically, the NPU plugin supported multiple backends; among them, the one capable of interacting with a Level Zero driver was called "LevelZero". Since we currently support only Level Zero drivers, we could simplify this naming in the future and update the diagram as well.

We will try to avoid such confusion going forward. Thank you for your feedback!
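The format selection described above can be sketched as follows. This is an illustrative toy, not the plugin's actual code; the enum names mirror the Level Zero graph extension values mentioned in this thread, but the function and its parameter are hypothetical.

```python
# Illustrative sketch only: models the format-selection logic described
# above. The string constants mirror Level Zero graph extension enum names;
# graph_format_for() is a hypothetical helper, not real plugin code.

ZE_GRAPH_FORMAT_NGRAPH_LITE = "ZE_GRAPH_FORMAT_NGRAPH_LITE"  # serialized IR, compiled by the driver
ZE_GRAPH_FORMAT_NATIVE = "ZE_GRAPH_FORMAT_NATIVE"            # precompiled blob, imported directly

def graph_format_for(is_precompiled_blob: bool) -> str:
    """Pick the format passed to pfnCreate2, per the comment above."""
    if is_precompiled_blob:
        return ZE_GRAPH_FORMAT_NATIVE      # import path: no compilation step
    return ZE_GRAPH_FORMAT_NGRAPH_LITE     # compile path: driver compiles the IR
```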
@PatrikStepan Thanks so much for your reply. So if I run blob-format files directly with OpenVINO + the NPU plugin, such as the blob file in Intel/sd-1.5-controlnet-scribble-quantized, they can run directly, whereas if I use OpenVINO IR model files, the NPU compiler needs to perform serialization and deserialization. Is this interpretation correct? Additionally, I have two questions to verify with you:
Yes, your interpretation is correct.
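The two loading paths confirmed above can be modeled with a small self-contained sketch. Nothing here calls real OpenVINO or driver APIs; it only illustrates the control flow under the assumption stated in the thread: an IR model is serialized and compiled into a blob, while a precompiled blob is deserialized and loaded directly.

```python
# Toy model of the two loading paths. compile_ir() and import_blob() are
# hypothetical stand-ins, not OpenVINO APIs.
import pickle

def compile_ir(ir_model: dict) -> bytes:
    """IR path: serialize the model, then 'compile' it into a blob."""
    serialized = pickle.dumps(ir_model)   # stand-in for IR serialization
    return b"BLOB" + serialized           # stand-in for driver compilation

def import_blob(blob: bytes) -> dict:
    """Blob path: load a precompiled blob directly, skipping compilation."""
    assert blob.startswith(b"BLOB")
    return pickle.loads(blob[4:])         # stand-in for deserialization
```

The round trip (`import_blob(compile_ir(model)) == model`) captures why running a prebuilt blob skips the serialization/compilation work needed for IR inputs.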
Are the prebuilt ELF files in the NPU plugin open source? They seem to contain some non-linear operators.

The blob file generated by the NPU driver includes only the prebuilt kernels used by that model.
@PatrikStepan Clear explanation. So the ELF file will definitely be included in the blob file generated by the compiler, but as a SW_kernel file, the ELF can only be pre-built into the compiler by the NPU compiler developers, right? If I, as a user or third-party developer, need to add a new SW_kernel to the NPU compiler, is there a way to do this?
Documentation link
https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/README.md
Description
I have some questions about the high-level architecture diagram in the README.md that shows the OpenVINO NPU design:
I think it should use Level Zero interfaces to load pre-compiled models, similar to the execution part on the right side.
In reality, compilation and execution sometimes operate sequentially. OpenVINO NPU can load OpenVINO IR models, compile them and pass them to the NPU driver for execution, or it can directly load pre-compiled blob models. I noticed that Level Zero's ze_graph can load pre-compiled models - is this one of the messages that the architecture diagram is trying to convey?
Based on the code provided, we can see that ze_graph supports loading pre-compiled models through the ZE_GRAPH_FORMAT_NATIVE format, and the graph descriptor allows loading both pre-compiled blobs and IR models.
This suggests that Level Zero provides interfaces for both compilation and execution phases, though the architecture diagram may be simplifying the relationship between these components.
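To make the descriptor-based loading concrete, here is a hedged sketch that mirrors (approximately) the shape of the Level Zero graph descriptor discussed above. The field names follow the ze_graph_desc naming convention, but this dataclass and the make_desc() helper are illustrative assumptions, not bindings to the real C struct.

```python
# Hedged illustration: approximates the graph-descriptor fields discussed
# above. GraphDesc and make_desc() are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class GraphDesc:
    format: str       # e.g. "ZE_GRAPH_FORMAT_NATIVE" or "ZE_GRAPH_FORMAT_NGRAPH_LITE"
    input_size: int   # byte size of the payload handed to the driver
    p_input: bytes    # serialized IR or precompiled blob

def make_desc(payload: bytes, precompiled: bool) -> GraphDesc:
    """Build a descriptor for either loading path from one payload."""
    fmt = "ZE_GRAPH_FORMAT_NATIVE" if precompiled else "ZE_GRAPH_FORMAT_NGRAPH_LITE"
    return GraphDesc(format=fmt, input_size=len(payload), p_input=payload)
```

Because the same descriptor carries either payload kind, distinguished only by the format field, a single creation entry point can serve both the compile and import paths, which is consistent with the diagram simplifying the two into one component.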