Documentation link
https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/README.md
Description
I have some questions about the high-level architecture diagram in the README.md that shows the OpenVINO NPU design:
- Why doesn't the compilation part on the left side of the architecture diagram call Level Zero interfaces? Can the compiled model be executed directly by the NPU driver?
  I would expect it to use Level Zero interfaces to load pre-compiled models, similar to the execution part on the right side.
- Do the left and right sides of the architecture diagram represent the compilation and execution steps respectively?
  In practice, compilation and execution often run back to back. OpenVINO NPU can load an OpenVINO IR model, compile it, and pass it to the NPU driver for execution, or it can load a pre-compiled blob directly. I noticed that Level Zero's ze_graph can load pre-compiled models. Is this one of the points the architecture diagram is trying to convey?
Based on the code below, ze_graph supports loading pre-compiled models through the ZE_GRAPH_FORMAT_NATIVE format:
typedef enum _ze_graph_format_t
{
    ZE_GRAPH_FORMAT_NATIVE = 0x1,       ///< Format is pre-compiled blob (elf, flatbuffers)
    ZE_GRAPH_FORMAT_NGRAPH_LITE = 0x2,  ///< Format is ngraph lite IR
} ze_graph_format_t;
And the graph descriptor allows loading both pre-compiled blobs and IR models:
typedef struct _ze_graph_desc_t
{
    ze_structure_type_graph_ext_t stype;  ///< [in] type of this structure
    void* pNext;                          ///< [in,out][optional] must be null or a pointer to an extension-specific
    ze_graph_format_t format;             ///< [in] Graph format passed in with input
    size_t inputSize;                     ///< [in] Size of input buffer in bytes
    const uint8_t* pInput;                ///< [in] Pointer to input buffer
    const char* pBuildFlags;              ///< [in][optional] Null terminated string containing build flags. Options:
                                          ///< - '--inputs_precisions="<arg>:<precision> <arg2>:<precision> ..."'
                                          ///<   '--outputs_precisions="<arg>:<precision> <arg2>:<precision> ..."'
                                          ///<   - Set input and output arguments precision. Supported precisions:
                                          ///<     FP64, FP32, FP16, BF16, U64, U32, U16, U8, U4, I64, I32, I16, I8, I4, BIN
                                          ///< - '--inputs_layouts="<arg>:<layout> <arg2>:<layout> ..."'
                                          ///<   '--outputs_layouts="<arg>:<layout> <arg2>:<layout> ..."'
                                          ///<   - Set input and output arguments layout. Supported layouts:
                                          ///<     NCHW, NHWC, NCDHW, NDHWC, OIHW, C, CHW, HW, NC, CN
                                          ///< - '--config PARAM="VALUE" PARAM2="VALUE" ...'
                                          ///<   - compile options string passed directly to compiler
} ze_graph_desc_t;
This suggests that Level Zero provides interfaces for both compilation and execution phases, though the architecture diagram may be simplifying the relationship between these components.
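To make the two cases concrete, here is a minimal sketch in C of how I read the descriptor: the same ze_graph_desc_t is filled either with a pre-compiled blob or with IR plus build flags. It is only an illustration under assumptions. The two typedefs are re-declared from the snippets above so the example compiles on its own, the ze_structure_type_graph_ext_t stand-in and its value are placeholders, and make_graph_desc, desc_for_blob and desc_for_ir are hypothetical helpers that do not exist in the driver headers. The graph-creation entry point that would consume the descriptor is not shown and is not called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real structure-type enum. The enumerator name and value
   are placeholders (assumption), not taken from the driver headers. */
typedef enum _ze_structure_type_graph_ext_t
{
    HYPOTHETICAL_STYPE_GRAPH_DESC = 0x1
} ze_structure_type_graph_ext_t;

/* Re-declared from the snippet quoted above. */
typedef enum _ze_graph_format_t
{
    ZE_GRAPH_FORMAT_NATIVE = 0x1,      /* pre-compiled blob */
    ZE_GRAPH_FORMAT_NGRAPH_LITE = 0x2  /* ngraph lite IR */
} ze_graph_format_t;

/* Re-declared from the snippet quoted above (comments shortened). */
typedef struct _ze_graph_desc_t
{
    ze_structure_type_graph_ext_t stype;
    void* pNext;
    ze_graph_format_t format;
    size_t inputSize;
    const uint8_t* pInput;
    const char* pBuildFlags;
} ze_graph_desc_t;

/* Hypothetical helper: fills every field of the descriptor. The caller keeps
   ownership of `data` and `build_flags`; only the pointers are stored. */
ze_graph_desc_t make_graph_desc(ze_graph_format_t format,
                                const uint8_t* data, size_t size,
                                const char* build_flags)
{
    ze_graph_desc_t desc;

    assert(data != NULL && size > 0);

    desc.stype = HYPOTHETICAL_STYPE_GRAPH_DESC;
    desc.pNext = NULL;               /* no extension structure chained */
    desc.format = format;
    desc.inputSize = size;
    desc.pInput = data;
    desc.pBuildFlags = build_flags;  /* optional, may be NULL */
    return desc;
}

/* Pre-compiled blob: nothing is left to compile, so no build flags are
   passed. That flags are unnecessary here is my assumption. */
ze_graph_desc_t desc_for_blob(const uint8_t* blob, size_t size)
{
    return make_graph_desc(ZE_GRAPH_FORMAT_NATIVE, blob, size, NULL);
}

/* IR input: the driver-side compiler would receive the build flags. */
ze_graph_desc_t desc_for_ir(const uint8_t* ir, size_t size,
                            const char* build_flags)
{
    return make_graph_desc(ZE_GRAPH_FORMAT_NGRAPH_LITE, ir, size, build_flags);
}
```

If this reading is right, calling desc_for_blob(buffer, size) covers the "load a pre-compiled blob" path and desc_for_ir(buffer, size, flags) covers the "compile IR in the driver" path, with only the format and pBuildFlags fields differing between them.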
Issue submission checklist
- I'm reporting a documentation issue. It's not a question.