MatMulInteger: per-row a_zero_point of shape [M] rejected despite ONNX spec allowing it #27897
Description
Describe the issue
Summary
ORT's MatMulInteger unconditionally rejects a_zero_point tensors with shape [M] (per-row quantization), even though the ONNX spec explicitly allows it.
ONNX Spec Reference
From the MatMulInteger spec:
a_zero_point: Zero point tensor for input 'A'. It's optional and default value is 0. It could be a scalar or N-D tensor. Scalar refers to per tensor quantization whereas N-D refers to per row quantization. If the input is 2D of shape [M, K] then zero point tensor may be an M element vector [zp_1, zp_2, …, zp_M].
Context
I discovered this while implementing MatMulInteger support in the TVM Relax ONNX frontend (apache/tvm#18951). During testing, I found that ORT rejects valid ONNX spec inputs for a_zero_point, which prevented me from using ORT as a reference for per-row zero-point test cases.
Reproduction
import numpy as np
import onnx
import onnxruntime
from onnx import TensorProto, helper
A = np.array([[10, 10], [20, 20]], dtype=np.uint8)
B = np.array([[1, 2], [3, 4]], dtype=np.int8)
a_zp = np.array([10, 20], dtype=np.uint8) # shape [M=2], per-row
A_info = helper.make_tensor_value_info("A", TensorProto.UINT8, [2, 2])
B_info = helper.make_tensor_value_info("B", TensorProto.INT8, [2, 2])
out = helper.make_tensor_value_info("output", TensorProto.INT32, None)
zp_init = helper.make_tensor("a_zero_point", TensorProto.UINT8, [2], a_zp.tolist())
node = helper.make_node("MatMulInteger", ["A", "B", "a_zero_point"], ["output"])
graph = helper.make_graph([node], "test", [A_info, B_info], [out], initializer=[zp_init])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 10)])
session = onnxruntime.InferenceSession(model.SerializeToString())
session.run([], {"A": A, "B": B})  # raises the RUNTIME_EXCEPTION shown below
[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION :
matmul_integer.cc:63 IsScalarOr1ElementVector(a_zero_point) was false.
MatmulInteger : input1 zero point must be a scalar or 1D tensor of size 1
Expected
ORT should accept the [M]-shaped a_zero_point and compute (A - a_zero_point[:, None]) @ B per the ONNX spec.
Urgency
It's a spec compliance bug, not a crash or security vulnerability. The library still works; it just doesn't implement the full spec for this one operator input shape.
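Until per-row zero points are accepted, one possible workaround is to run the matmul with no zero point and correct the result afterwards, using the identity (A - zp[:, None]) @ B = A @ B - zp[:, None] * colsum(B). The sketch below checks that identity in NumPy only; the helper name `per_row_zero_point_correction` and the sample values are assumptions for illustration, and `raw` stands in for whatever an engine returns when a_zero_point is omitted.

```python
import numpy as np


def per_row_zero_point_correction(raw_product, B, a_zero_point):
    """Hypothetical helper: turn A @ B (zero point 0) into (A - zp[:, None]) @ B."""
    # Column sums of B, shape [N], kept in int32.
    col_sums = B.sum(axis=0, dtype=np.int32)
    zp32 = a_zero_point.astype(np.int32)
    # Row i of the product is reduced by zp_i times each column sum.
    return raw_product - zp32[:, None] * col_sums[None, :]


A = np.array([[12, 15], [23, 20]], dtype=np.uint8)
B = np.array([[1, 2], [3, 4]], dtype=np.int8)
a_zp = np.array([10, 20], dtype=np.uint8)

# Stand-in for the output of a MatMulInteger call without a_zero_point.
raw = A.astype(np.int32) @ B.astype(np.int32)
corrected = per_row_zero_point_correction(raw, B, a_zp)

# Direct evaluation of the per-row formula, for comparison.
direct = (A.astype(np.int32) - a_zp.astype(np.int32)[:, None]) @ B.astype(np.int32)
assert np.array_equal(corrected, direct)
```

This only sidesteps the shape check for testing purposes; it does not change the fact that the operator rejects a spec-valid input.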
Platform
Linux
OS Version
Ubuntu 24.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.24.4
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA, Default CPU
Execution Provider Library Version
CUDA 12.0