
Open-source ONNX Runtime EP for XDNA 2 NPU without VAIP — benchmarks + source #356

@Manuelreyesbravo


iron-ep: First open-source VitisAI Execution Provider

Repo: https://github.com/Manuelreyesbravo/iron-ep

I built a fully open-source replacement for AMD's proprietary libonnxruntime_vitisai_ep.so. It runs directly on the XDNA 2 NPU using the IRON + MLIR-AIE + Peano + XRT stack, with no VAIP and no closed-source runtime.

What it does

It implements the compile_onnx_model_vitisai_ep_v4 VitisAI EP interface, so ONNX Runtime loads it transparently in place of the proprietary library. It claims MatMul, MatMulInteger, and Gemm nodes and executes them on the NPU.
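A rough sketch of what "claiming" means here: the EP partitions the graph, taking the op types listed above and leaving every other node to another execution provider (typically ONNX Runtime's CPU EP). The real EP walks the graph through the VitisAI C API (e.g. graph_nodes_unsafe) and also inspects dtypes; the `Node` class and `claim_nodes` function below are hypothetical stand-ins that only look at op type.

```python
from dataclasses import dataclass

# Op types the post says iron-ep claims for the NPU.
NPU_OPS = {"MatMul", "MatMulInteger", "Gemm"}


@dataclass
class Node:
    """Hypothetical minimal stand-in for an ONNX graph node."""
    name: str
    op_type: str


def claim_nodes(nodes):
    """Split nodes into (claimed for the NPU, left for other EPs), preserving order."""
    claimed = [n for n in nodes if n.op_type in NPU_OPS]
    fallback = [n for n in nodes if n.op_type not in NPU_OPS]
    return claimed, fallback


graph = [Node("q_proj", "MatMulInteger"), Node("act", "Gelu"), Node("out", "Gemm")]
claimed, fallback = claim_nodes(graph)
print([n.name for n in claimed])   # ['q_proj', 'out']
print([n.name for n in fallback])  # ['act']
```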

Benchmark results (AMD Ryzen AI 9 HX 375, Fedora 43)

INT8 MatMulInteger, whole_array backend (16 AIE cores):

Shape (M×K×N)     NPU (ms)   CPU (ms)   Speedup
256×1024×1024        1.041      3.082     2.96×
512×2048×2048        5.553     16.081     2.90×
1024×4096×4096      35.836    119.240     3.33×
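To put the timings in throughput terms, the snippet below converts each benchmark row to GOPS. It assumes the shape column is M×K×N and counts 2·M·K·N operations per matmul (one multiply and one add per MAC); neither convention is stated in the post.

```python
# (M, K, N, npu_ms, cpu_ms), copied from the benchmark table; M×K×N order is assumed.
ROWS = [
    (256, 1024, 1024, 1.041, 3.082),
    (512, 2048, 2048, 5.553, 16.081),
    (1024, 4096, 4096, 35.836, 119.240),
]


def gops(m, k, n, ms):
    """Giga-operations per second, counting 2*M*K*N ops (one multiply + one add per MAC)."""
    return 2 * m * k * n / (ms * 1e-3) / 1e9


for m, k, n, npu_ms, cpu_ms in ROWS:
    print(f"{m}x{k}x{n}: NPU {gops(m, k, n, npu_ms):.1f} GOPS, "
          f"CPU {gops(m, k, n, cpu_ms):.1f} GOPS, speedup {cpu_ms / npu_ms:.2f}x")
```

Under those assumptions the NPU reaches roughly 516, 773, and 959 GOPS on the three shapes, and the computed speedups match the table.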

The 1024×4096×4096 case is the size of a full hidden-dimension (4096×4096) matmul in a 7B LLM at batch 1024. All results are bit-exact against a CPU int32 reference.
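One way to write the CPU int32 reference that "bit-exact" is checked against, following the ONNX MatMulInteger definition: subtract zero points, then multiply and accumulate in int32. This is a sketch assuming NumPy and scalar zero points (the operator also allows per-row/per-column zero points); `matmul_integer_ref` and `is_bit_exact` are illustrative names, not functions from iron-ep.

```python
import numpy as np


def matmul_integer_ref(a, b, a_zero_point=0, b_zero_point=0):
    """CPU reference for ONNX MatMulInteger with scalar zero points.

    Computes (A - a_zero_point) @ (B - b_zero_point) with int32 accumulation.
    """
    # Widen to int32 before subtracting and multiplying so 8-bit math cannot wrap.
    a32 = a.astype(np.int32) - np.int32(a_zero_point)
    b32 = b.astype(np.int32) - np.int32(b_zero_point)
    return a32 @ b32


def is_bit_exact(npu_out, a, b, a_zero_point=0, b_zero_point=0):
    """True if npu_out is int32 and equals the reference element for element."""
    ref = matmul_integer_ref(a, b, a_zero_point, b_zero_point)
    return npu_out.dtype == np.int32 and np.array_equal(npu_out, ref)


rng = np.random.default_rng(0)
a = rng.integers(-128, 127, size=(8, 64), dtype=np.int8, endpoint=True)
b = rng.integers(-128, 127, size=(64, 16), dtype=np.int8, endpoint=True)
npu_out = matmul_integer_ref(a, b)  # stand-in; a real check would read the NPU result here
print(is_bit_exact(npu_out, a, b))  # True
```

Exact integer equality (rather than a tolerance) is the right test here because int8 × int8 products summed in int32 have a single correct answer, provided the accumulation does not overflow int32.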

How it works

  1. Graph analyzer claims MatMul/MatMulInteger/Gemm nodes and reads their dtype via node_arg_get_element_type (slot 43 in the VitisAI API)
  2. Kernel cache drives MLIR-AIE: generates MLIR → Peano compiles mm.cc → aiecc links xclbin
  3. XRT runner dispatches to hardware
  4. Adaptive backend: single_core for M<256, whole_array (4×4=16 cores) for M≥256
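Steps 2 and 4 can be sketched together as a memoized compile keyed on backend, shape, and dtype. The function names, the cache key, and the xclbin naming scheme are assumptions for illustration; the actual MLIR → Peano → aiecc pipeline is replaced by a placeholder that only records the key.

```python
from functools import lru_cache

# Threshold from the post: single_core for M < 256, whole_array (4x4 = 16 cores) for M >= 256.
WHOLE_ARRAY_MIN_M = 256

COMPILE_LOG = []  # records each (hypothetical) compile, to show cache hits skip it


def select_backend(m: int) -> str:
    """Pick the AIE backend from the row count M of the MatMul."""
    return "single_core" if m < WHOLE_ARRAY_MIN_M else "whole_array"


@lru_cache(maxsize=None)
def get_kernel(backend: str, m: int, k: int, n: int, dtype: str) -> str:
    """Return an xclbin path for this key, compiling only on a cache miss.

    Placeholder: the real pipeline generates MLIR, compiles mm.cc with Peano,
    and links an xclbin with aiecc; here we only log the key and return a name.
    """
    COMPILE_LOG.append((backend, m, k, n, dtype))
    return f"{backend}_{dtype}_{m}x{k}x{n}.xclbin"


def dispatch(m: int, k: int, n: int, dtype: str) -> str:
    """Choose a backend for this shape and fetch (or build) its kernel."""
    backend = select_backend(m)
    return get_kernel(backend, m, k, n, dtype)


dispatch(512, 2048, 2048, "int8")  # cache miss: "compiles"
dispatch(512, 2048, 2048, "int8")  # cache hit: reuses the xclbin
print(len(COMPILE_LOG))  # 1
```

Keying on the full shape means each distinct (M, K, N) pays the compile cost once; whether iron-ep pads or buckets shapes to cut down on recompiles is not stated in the post.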

Why I'm posting here

Would love feedback on:

  • Whether the VitisAI EP slot API (node_arg_get_element_type, graph_nodes_unsafe, etc.) is stable across versions, or if there's a more stable interface
  • If there's interest in upstreaming this or collaborating on a community-maintained open EP
  • Any upcoming XDNA 2 features (quantized attention, Flash Attention on NPU) that could inform the roadmap

Happy to provide more details or run specific benchmarks on request.
