V8 now supports AVX-VNNI instructions: `i32x4.dot_i8x16_i7x16_adds` can be compiled to `vpdpbusd` on x64 devices, which speeds up applications that use this opcode.

XNNPACK already has QC8/QS8 GEMM/IGEMM microkernels that use the relaxed SIMD dot product, but they are limited to a particular implementation of `i32x4.dot_i8x16_i7x16_adds` (CheckWAsmSDOT). We would also need microkernels targeting the VNNI-style `i32x4.dot_i8x16_i7x16_adds`. Our performance tests with `vpdpbusd` on `end2end_bench`, using a PoC, show large improvements in the following cases.
| d8/end2end_bench | Reduction in execution time |
| --- | --- |
| QC8MobileNetV1/T:1/real_time | -45.60% |
| QC8MobileNetV2/T:1/real_time | -30.50% |
| QS8MobileNetV1/T:1/real_time | -45.40% |
| QS8MobileNetV2/T:1/real_time | -30.30% |
Does XNNPACK have plans to add new microkernels for the VNNI implementation of the Wasm relaxed integer dot product? We can provide a patch if needed.