V8 now supports AVX-VNNI instructions: `i32x4.dot_i8x16_i7x16_adds` can be compiled to `vpdpbusd` on x64 devices, which speeds up applications that use this opcode.

XNNPACK already has QC8/QS8 GEMM/IGEMM microkernels that use the relaxed SIMD dot product, but they are limited to a particular implementation of `i32x4.dot_i8x16_i7x16_adds` (CheckWAsmSDOT). We would also need microkernels targeting the VNNI-style `i32x4.dot_i8x16_i7x16_adds`. Our performance tests with `vpdpbusd` on `end2end_bench`, using a PoC, show large improvements in the following cases.
| d8/end2end_bench | Reduction in execution time |
| --- | --- |
| QC8MobileNetV1/T:1/real_time | -45.60% |
| QC8MobileNetV2/T:1/real_time | -30.50% |
| QS8MobileNetV1/T:1/real_time | -45.40% |
| QS8MobileNetV2/T:1/real_time | -30.30% |
Does XNNPACK have plans to add new microkernels for the VNNI implementation of the Wasm relaxed integer dot product? We can provide a patch if needed.