[Docs]: NPU Plugin high level design diagram #27512

Closed
@junruizh2021

Documentation link

https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/README.md

Description

I have some questions about the high-level architecture diagram in the README.md that shows the OpenVINO NPU design:

  • Why doesn't the compilation part on the left side of the architecture diagram call Level Zero interfaces? Can the compiled model be executed directly by the NPU driver?

I think it should use Level Zero interfaces to load pre-compiled models, similar to the execution part on the right side.

  • Do the left and right sides of the architecture diagram represent compilation and execution steps respectively?

In practice, compilation and execution can run back to back in a single flow. The OpenVINO NPU plugin can load an OpenVINO IR model, compile it, and pass the result to the NPU driver for execution, or it can load a pre-compiled blob directly. I noticed that Level Zero's ze_graph can load pre-compiled models. Is this one of the points the architecture diagram is trying to convey?

The Level Zero graph extension definitions below show that ze_graph supports loading pre-compiled models through the ZE_GRAPH_FORMAT_NATIVE format:

typedef enum _ze_graph_format_t
{
    ZE_GRAPH_FORMAT_NATIVE = 0x1,                   ///< Format is pre-compiled blob (elf, flatbuffers)
    ZE_GRAPH_FORMAT_NGRAPH_LITE = 0x2,              ///< Format is ngraph lite IR

} ze_graph_format_t;

And the graph descriptor allows loading both pre-compiled blobs and IR models:

typedef struct _ze_graph_desc_t
{
    ze_structure_type_graph_ext_t stype;            ///< [in] type of this structure
    void* pNext;                                    ///< [in,out][optional] must be null or a pointer to an extension-specific
    ze_graph_format_t format;                       ///< [in] Graph format passed in with input
    size_t inputSize;                               ///< [in] Size of input buffer in bytes
    const uint8_t* pInput;                          ///< [in] Pointer to input buffer
    const char* pBuildFlags;                        ///< [in][optional] Null terminated string containing build flags. Options:
                                                    ///< - '--inputs_precisions="<arg>:<precision> <arg2>:<precision> ..."'
                                                    ///<   '--outputs_precisions="<arg>:<precision> <arg2>:<precision> ..."'
                                                    ///<   - Set input and output arguments precision. Supported precisions:
                                                    ///<     FP64, FP32, FP16, BF16, U64, U32, U16, U8, U4, I64, I32, I16, I8, I4, BIN
                                                    ///< - '--inputs_layouts="<arg>:<layout> <arg2>:<layout> ..."'
                                                    ///<   '--outputs_layouts="<arg>:<layout> <arg2>:<layout> ..."'
                                                    ///<   - Set input and output arguments layout. Supported layouts:
                                                    ///<     NCHW, NHWC, NCDHW, NDHWC, OIHW, C, CHW, HW, NC, CN
                                                    ///< - '--config PARAM="VALUE" PARAM2="VALUE" ...'
                                                    ///<   - compile options string passed directly to compiler
} ze_graph_desc_t;

This suggests that Level Zero provides interfaces for both compilation and execution phases, though the architecture diagram may be simplifying the relationship between these components.
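
To make the point concrete, here is a minimal sketch of how the same descriptor could describe either input kind. It re-declares the two types quoted above so that it compiles on its own. The `ze_structure_type_graph_ext_t` value is a placeholder, and `make_graph_desc`, `desc_for_precompiled_blob`, and `desc_for_ir` are hypothetical helpers written for this illustration; none of them are part of the driver or plugin API. Passing no build flags for a native blob is also an assumption (the field is marked optional).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real structure-type enum; this value is a placeholder (assumption). */
typedef enum _ze_structure_type_graph_ext_t {
    ZE_STRUCTURE_TYPE_GRAPH_DESC_PLACEHOLDER = 0x1
} ze_structure_type_graph_ext_t;

/* Mirrors the enum quoted in this issue. */
typedef enum _ze_graph_format_t {
    ZE_GRAPH_FORMAT_NATIVE = 0x1,      /* pre-compiled blob */
    ZE_GRAPH_FORMAT_NGRAPH_LITE = 0x2  /* ngraph lite IR */
} ze_graph_format_t;

/* Mirrors the descriptor quoted in this issue. */
typedef struct _ze_graph_desc_t {
    ze_structure_type_graph_ext_t stype;
    void* pNext;
    ze_graph_format_t format;
    size_t inputSize;
    const uint8_t* pInput;
    const char* pBuildFlags;
} ze_graph_desc_t;

/* Hypothetical helper: fills a descriptor for the given format and input buffer. */
static ze_graph_desc_t make_graph_desc(ze_graph_format_t format,
                                       const uint8_t* input,
                                       size_t size,
                                       const char* build_flags)
{
    ze_graph_desc_t desc;

    assert(input != NULL && size > 0);

    desc.stype = ZE_STRUCTURE_TYPE_GRAPH_DESC_PLACEHOLDER;
    desc.pNext = NULL;              /* no extension structure chained */
    desc.format = format;
    desc.inputSize = size;
    desc.pInput = input;
    desc.pBuildFlags = build_flags; /* optional, may be NULL */
    return desc;
}

/* Pre-compiled blob path: the input is passed as ZE_GRAPH_FORMAT_NATIVE.
 * No build flags are supplied here (assumption, see lead-in). */
static ze_graph_desc_t desc_for_precompiled_blob(const uint8_t* blob, size_t size)
{
    return make_graph_desc(ZE_GRAPH_FORMAT_NATIVE, blob, size, NULL);
}

/* IR path: the input is passed as ZE_GRAPH_FORMAT_NGRAPH_LITE together with
 * a build-flags string such as the '--config ...' option quoted above. */
static ze_graph_desc_t desc_for_ir(const uint8_t* ir, size_t size, const char* build_flags)
{
    return make_graph_desc(ZE_GRAPH_FORMAT_NGRAPH_LITE, ir, size, build_flags);
}
```

In both cases the caller ends up with a `ze_graph_desc_t`; only `format` and `pBuildFlags` differ. That is why the question above asks whether the compilation side of the diagram also goes through Level Zero.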

Issue submission checklist

  • I'm reporting a documentation issue. It's not a question.
