Open-source ONNX Runtime EP for XDNA 2 NPU without VAIP — benchmarks + source #356
iron-ep: First open-source VitisAI Execution Provider
Repo: https://github.com/Manuelreyesbravo/iron-ep
I built a fully open-source replacement for AMD's proprietary libonnxruntime_vitisai_ep.so that runs directly on the XDNA 2 NPU using IRON + MLIR-AIE + Peano + XRT — no VAIP, no closed-source runtime.
What it does
Implements the compile_onnx_model_vitisai_ep_v4 VitisAI EP interface so ONNX Runtime loads it transparently. Claims and executes MatMul, MatMulInteger, and Gemm nodes on the NPU.
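To make the node-claiming step concrete, here is a minimal sketch of how a graph partitioner like this can decide which nodes to offload. The `Node` structure and function names are illustrative assumptions, not the actual iron-ep internals; the dtype codes are the real ONNX `TensorProto` element types.

```python
# Hypothetical sketch of the EP's graph-partitioning step. The Node
# structure and claim_nodes() are illustrative, not iron-ep's real API.
from dataclasses import dataclass

SUPPORTED_OPS = {"MatMul", "MatMulInteger", "Gemm"}
# ONNX TensorProto element-type codes: FLOAT = 1, UINT8 = 2, INT8 = 3
SUPPORTED_DTYPES = {1, 2, 3}

@dataclass
class Node:
    name: str
    op_type: str
    input_dtype: int  # as returned by node_arg_get_element_type

def claim_nodes(graph_nodes):
    """Return the nodes the EP offloads to the NPU; everything else
    falls back to ONNX Runtime's CPU kernels."""
    return [n for n in graph_nodes
            if n.op_type in SUPPORTED_OPS and n.input_dtype in SUPPORTED_DTYPES]

graph = [
    Node("mm0", "MatMulInteger", 3),  # int8 matmul -> claimed
    Node("gemm0", "Gemm", 1),         # float Gemm  -> claimed
    Node("relu0", "Relu", 1),         # unsupported op -> CPU fallback
]
print([n.name for n in claim_nodes(graph)])  # -> ['mm0', 'gemm0']
```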
Benchmark results (AMD Ryzen AI 9 HX 375, Fedora 43)
INT8 MatMulInteger — whole_array backend (16 AIE cores):
| Shape | NPU (ms) | CPU (ms) | Speedup |
|---|---|---|---|
| 256×1024×1024 | 1.041 | 3.082 | 2.96× |
| 512×2048×2048 | 5.553 | 16.081 | 2.90× |
| 1024×4096×4096 | 35.836 | 119.240 | 3.33× |
The 1024×4096×4096 case corresponds to a full hidden-dimension matmul of a 7B-class LLM at batch size 1024. Results are bit-exact against a CPU int32 reference.
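For reference, the int32 CPU baseline behind the bit-exactness claim follows ONNX MatMulInteger semantics: zero points are subtracted before multiplying, and accumulation happens in int32. A minimal pure-Python version (shapes and values here are just illustrative):

```python
# Pure-Python reference for ONNX MatMulInteger semantics. Python ints
# never overflow, so this is exact by construction and serves as the
# int32 reference an NPU result can be compared against bit-for-bit.
def matmul_integer(A, B, a_zero=0, b_zero=0):
    """A: MxK int8 rows, B: KxN int8 rows; returns the MxN int32 result.
    Per the ONNX spec, zero points are subtracted before the multiply."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum((A[m][k] - a_zero) * (B[k][n] - b_zero) for k in range(K))
             for n in range(N)] for m in range(M)]

A = [[1, -2], [3, 4]]        # 2x2 int8 operand
B = [[5, 6], [-7, 8]]        # 2x2 int8 operand
print(matmul_integer(A, B))  # -> [[19, -10], [-13, 50]]
```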
How it works
- Graph analyzer claims MatMul/MatMulInteger/Gemm nodes and reads dtypes via node_arg_get_element_type (slot 43 in the VitisAI API)
- Kernel cache drives MLIR-AIE: generates MLIR → Peano compiles mm.cc → aiecc links the xclbin
- XRT runner dispatches to hardware
- Adaptive backend: single_core for M<256, whole_array (4×4 = 16 cores) for M≥256
Why I'm posting here
Would love feedback on:
- Whether the VitisAI EP slot API (node_arg_get_element_type, graph_nodes_unsafe, etc.) is stable across versions, or if there's a more stable interface
- If there's interest in upstreaming this or collaborating on a community-maintained open EP
- Any upcoming XDNA 2 features (quantized attention, Flash Attention on NPU) that could inform the roadmap
Happy to provide more details or run specific benchmarks on request.