
[TorchOnnxToTorch] Add block_size support for onnx.DequantizeLinear #4505

Open

jtuyls wants to merge 1 commit into llvm:main from jtuyls:block-dequantize-linear

Conversation

jtuyls (Contributor) commented Mar 18, 2026

Lower block-quantized DequantizeLinear (opset 21+) to reshape → cast → sub(zero_point) → mul(scale) → reshape.
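
For reference, here is a minimal NumPy sketch (not part of this PR) of the block-dequantize semantics the lowering implements, using the same shapes as the lit test below (8x256 int8 input, 8x4 per-block scales, so block_size = 64 along axis 1):

```python
import numpy as np

def block_dequantize(x, scale, zero_point, axis=1, block_size=64):
    # reshape: split the quantized axis into (num_blocks, block_size)
    shape = list(x.shape)
    num_blocks = shape[axis] // block_size
    blocked = x.reshape(shape[:axis] + [num_blocks, block_size] + shape[axis + 1:])
    # cast -> sub(zero_point) -> mul(scale); block params broadcast over each block
    s = np.expand_dims(scale, axis=axis + 1)        # (8, 4, 1)
    zp = np.expand_dims(zero_point, axis=axis + 1)  # (8, 4, 1)
    y = (blocked.astype(np.float32) - zp.astype(np.float32)) * s
    # reshape back to the original shape
    return y.reshape(x.shape)

x = np.random.randint(-128, 128, size=(8, 256), dtype=np.int8)
scale = np.random.rand(8, 4).astype(np.float32)
zero_point = np.zeros((8, 4), dtype=np.int8)
assert block_dequantize(x, scale, zero_point).shape == (8, 256)
```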

@jtuyls jtuyls requested review from rsuderman, sahas3 and zjgarvey and removed request for sahas3 March 18, 2026 08:37
sahas3 (Member) left a comment:

Thanks for the change @jtuyls. The changes look good to me, just some suggestions. I am not very familiar with the ONNX code path, so I'll defer approval to other reviewers.

Comment thread: lib/Conversion/TorchOnnxToTorch/DefaultDomainAtoF.cpp (outdated)
Comment thread: lib/Conversion/TorchOnnxToTorch/DefaultDomainAtoF.cpp
Comment thread: lib/Conversion/TorchOnnxToTorch/DefaultDomainAtoF.cpp (outdated)

// Block quantization: signed int8 input (si8→f32)
// CHECK-LABEL: @test_dequantizelinear_blocked_si8
func.func @test_dequantizelinear_blocked_si8(%arg0: !torch.vtensor<[8,256],si8>, %arg1: !torch.vtensor<[8,4],f32>, %arg2: !torch.vtensor<[8,4],si8>) -> !torch.vtensor<[8,256],f32> attributes {torch.onnx_meta.ir_version = 10 : si64, torch.onnx_meta.opset_version = 21 : si64} {
Member:

Can some e2e tests be added as well for numerical equivalence?

jtuyls (Contributor, Author):

I didn't find good infrastructure for testing ONNX ops end-to-end. The existing infra relies on torch's ONNX export, but that path doesn't create ONNX dequantize layers with a block size. Adding infra for this would be out of scope for this PR imo, and I think the current lit tests verify the conversion well.

Member:

Thanks for the clarification. Are dequantize layers with a block size produced when an ONNX model is quantized in ONNX directly?

jtuyls (Contributor, Author):

Yes, that's one source; they can also come from rewrites or other export paths.
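
For context, a model like this can be built directly with the ONNX helper API. A hedged sketch (not from this PR), mirroring the shapes in the lit test above:

```python
import onnx
from onnx import TensorProto, helper

# block_size on DequantizeLinear is an opset-21 attribute.
node = helper.make_node(
    "DequantizeLinear",
    inputs=["x", "scale", "zero_point"],
    outputs=["y"],
    axis=1,
    block_size=64,
)
graph = helper.make_graph(
    [node],
    "blocked_dequantize",
    inputs=[
        helper.make_tensor_value_info("x", TensorProto.INT8, [8, 256]),
        helper.make_tensor_value_info("scale", TensorProto.FLOAT, [8, 4]),
        helper.make_tensor_value_info("zero_point", TensorProto.INT8, [8, 4]),
    ],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [8, 256])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
onnx.checker.check_model(model)
```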

Commit: Lower block-quantized DequantizeLinear (opset 21+) to reshape → cast → sub(zero_point) → mul(scale) → reshape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>