MatMulInteger: per-row a_zero_point of shape [M] rejected despite ONNX spec allowing it

### Describe the issue

## Summary
ORT's MatMulInteger unconditionally rejects `a_zero_point` tensors with shape `[M]` (per-row quantization), even though the ONNX spec explicitly allows it.

### ONNX Spec Reference
From the [MatMulInteger spec](https://onnx.ai/onnx/operators/onnx__MatMulInteger.html):
> a_zero_point: Zero point tensor for input 'A'. It's optional and default value is 0. It could be a scalar or N-D tensor. Scalar refers to per tensor quantization whereas N-D refers to per row quantization. If the input is 2D of shape [M, K] then zero point tensor may be an M element vector [zp_1, zp_2, …, zp_M].

## Context
I discovered this while implementing `MatMulInteger` support in the TVM Relax ONNX frontend (apache/tvm#18951). During testing, I found that ORT rejects valid ONNX spec inputs for `a_zero_point`, which prevented me from using ORT as a reference for per-row zero-point test cases.

## Reproduction
```python
import numpy as np
import onnx
import onnxruntime
from onnx import TensorProto, helper

A = np.array([[10, 10], [20, 20]], dtype=np.uint8)
B = np.array([[1, 2],   [3, 4]],  dtype=np.int8)
a_zp = np.array([10, 20], dtype=np.uint8)  # shape [M=2], per-row

A_info  = helper.make_tensor_value_info("A", TensorProto.UINT8, [2, 2])
B_info  = helper.make_tensor_value_info("B", TensorProto.INT8,  [2, 2])
out     = helper.make_tensor_value_info("output", TensorProto.INT32, None)
zp_init = helper.make_tensor("a_zero_point", TensorProto.UINT8, [2], a_zp.tolist())

node  = helper.make_node("MatMulInteger", ["A", "B", "a_zero_point"], ["output"])
graph = helper.make_graph([node], "test", [A_info, B_info], [out], initializer=[zp_init])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 10)])

session = onnxruntime.InferenceSession(model.SerializeToString())
session.run([], {"A": A, "B": B})  # raises
```

## Error
```
[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : 
matmul_integer.cc:63 IsScalarOr1ElementVector(a_zero_point) was false.
MatmulInteger : input1 zero point must be a scalar or 1D tensor of size 1
```

## Expected
ORT should accept the `[M]`-shaped `a_zero_point` and compute `(A - a_zero_point[:, None]) @ B`, with the operands widened to int32, per the ONNX spec.
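
For reference, here is a minimal NumPy sketch of the result I would expect for the reproduction inputs. It assumes both operands are widened to int32 before the zero point is subtracted, so the `uint8` subtraction cannot wrap around. The function name `matmul_integer_per_row_reference` is hypothetical and not part of ORT or ONNX.

```python
import numpy as np


def matmul_integer_per_row_reference(a, b, a_zero_point):
    """Hypothetical reference: per-row a_zero_point of shape [M] for a 2-D A of shape [M, K]."""
    # Widen first; subtracting in uint8 would wrap around for negative results.
    a32 = a.astype(np.int32) - a_zero_point.astype(np.int32)[:, None]
    return a32 @ b.astype(np.int32)


A = np.array([[10, 10], [20, 20]], dtype=np.uint8)
B = np.array([[1, 2], [3, 4]], dtype=np.int8)
a_zp = np.array([10, 20], dtype=np.uint8)  # shape [M=2], per-row

expected = matmul_integer_per_row_reference(A, B, a_zp)
print(expected)  # [[0 0], [0 0]]: each row of A equals its own zero point

# A less degenerate input, to show the zero point really is applied per row.
A2 = np.array([[11, 12], [22, 24]], dtype=np.uint8)
print(matmul_integer_per_row_reference(A2, B, a_zp))  # [[7 10], [14 20]]
```

The reproduction inputs produce an all-zero output, so the second input is probably a better test case for telling per-row handling apart from per-tensor handling.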

### Urgency

Low. This is a spec compliance bug, not a crash or a security vulnerability. The library still works; it just does not implement the full spec for this one operator input shape.

### Platform

Linux

### OS Version

Ubuntu 24.04

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.24.4

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

CUDA, Default CPU

### Execution Provider Library Version

CUDA 12.0
